Latin America: Researcher access to platform data, challenges to academic freedom and transparency
Social media monitoring is essential for research on online violence. Since 2020, InternetLab, in partnership with AzMina and Núcleo Jornalismo, has been conducting MonitorA, an observatory of gender-based political violence on social media. MonitorA involves collecting and analyzing comments on Facebook, Instagram, YouTube, and Twitter/X during electoral periods in Brazil to understand how online gender-based violence operates against female candidates. In addition, it allows us to assess the impacts of online violence on the quality of Brazilian democracy and women's participation in politics.
However, the continuity of this research has faced setbacks due to what we call a data blackout, caused by changes and restrictions in access to the APIs of different platforms. Few platforms remain open for research: CrowdTangle, Meta’s research platform, was closed in 2024; X charges exorbitant fees; and TikTok’s API is available only in the United States and Europe, and is unreliable even there. As a result, the historical series of MonitorA has become unviable, leaving no comparable data for the analysis of political violence in Brazil.
Global disparities in data access and academic collaboration
The case of MonitorA is not unique. Other research projects have been discontinued or compromised due to data access policies for researchers. Although some of these policies affect researchers worldwide, the impacts are different and more significant in countries in the Global South due to the absence of alternatives beyond the APIs provided by the platforms. In some cases, researchers from the Global North gain access to data through alternative means, such as qualitative methods via interviews or informal conversations with big tech employees, or through partnerships with platforms, not relying exclusively on access to public APIs.
There are numerous records of partnerships between US and European universities and technology companies aimed specifically at academic research, in which social media platforms offer benefits such as data sharing and dialogue between the company and researchers. Such partnerships were not found at peer universities in Latin America, Africa or Asia. For example, Meta used to support universities in North America, such as New York University, Arizona State University, and Ryerson University, through research groups and laboratories under the Academic Partnerships and Data for Good at Meta project. Universities in Europe, including the London School of Economics, University of Catalunya, Mercator Research Center, and Max Planck Institute, were also involved in the same project.
This type of partnership between platforms and universities, or even direct contact with platforms, is much more restricted for researchers who are not affiliated with research centres in the Global North. Given this data blackout and the specific challenges faced by researchers from the Global South, InternetLab conducted three in-depth interviews and two focus groups with 14 Latin American researchers between April and June 2023 to understand the challenges encountered in the region regarding academic freedom and transparency on platforms.
Breaking down the issue: 7 main obstacles for researchers
These conversations revealed the following main obstacles and risks faced by Latin American researchers conducting research on platforms:
- Infrastructure and funding: structural difficulties in Latin America for the collection and processing of research data. Despite relatively high economic classifications, research investments are low compared to the Global North, leading to fewer publications and citations. Latin American researchers often rely on platform APIs and tools like CrowdTangle because they lack the resources for advanced data scraping and storage, making them vulnerable to policy changes by the platforms. This dependence not only hampers independent research but also fosters a competitive environment where data sharing is limited by contractual and security concerns.
- API policy changes and database erasure. Constant modifications in API policies (such as X’s policy changes or the closure of CrowdTangle) force researchers to adapt quickly, often outpacing academic timelines and hindering the collection of retroactive data. For instance, one researcher noted unpredictable shifts in Facebook’s data-sharing protocols, which compounded difficulties in storing and retrieving essential information. These issues, combined with limited funding and infrastructure, place Latin American scholars at a considerable disadvantage compared to their Global North counterparts.
- Filters and quantity of databases made available. According to our interviews, platforms filter and limit the data available through their APIs, leaving researchers without complete datasets or the ability to request modifications to the filtering process. Furthermore, API access keys restrict the amount of data that can be retrieved, deepening dependency on the platforms’ willingness to provide data. Once again, although these limitations are not exclusive to Latin American researchers, they have a greater impact on the region when we consider the other obstacles.
- The quality of the data made available and the possibility of cross-referencing data from different platforms. Platforms often filter their data for APIs based on commercial interests instead of research purposes. This results in incomplete or biased datasets that limit cross-platform research. Moreover, the diverse formats and aggregation methods employed by different platforms hinder cross-platform analysis, while language barriers further complicate access to key documents like terms of use.
- Legal liability: treatment of personal data and strategies employed by researchers to ensure compliance with legal requirements regarding data processing. The legal requirements regarding data processing vary from country to country. In countries that do have a set of data protection rules, the interviews showed that researchers are concerned about how to comply with such laws, specifically how to ensure proper anonymization and handle data so that the individuals being researched are not identified. One point of concern among our participants is that these risks are not limited to researchers who may violate data protection laws. Rather, they primarily affect the users of the platforms involved in the research, who may have their privacy violated. Additionally, some countries do not have a data protection framework, leaving both researchers and platform users in a limbo of uncertainty.
- Personal risk: political violence and psychological harm. Interviews pointed out that researchers face significant risks related to physical and psychological well-being, political pressure, and exposure to violent content. For instance, those studying messaging groups or extremist communities encounter threats, retaliation, and potential doxxing. Exposure to discriminatory and violent material also impacts researchers' mental health, particularly those from marginalized groups. Additionally, scholars report concerns over government or law enforcement attempts to access research data, raising ethical and security challenges. These factors highlight the precarious conditions under which research on sensitive topics is conducted.
- Lack of ethical standards for research on social media. Researchers mentioned facing ethical dilemmas about collecting and handling certain types of data, such as personal data and private data. For instance, they are affected by the lack of protocols for collecting information, as well as the shortage of methods for researching in messaging groups, as these are not considered public data and obtaining free and informed consent from users can be challenging.
Researchers from the region agree that the limitations on accessing data in Latin America are more extensive than in the United States and Europe. As shown, the reasons for these contrasts between Global North and South are multiple and interconnected, involving economic inequalities across regions, structural aspects of universities in different areas, as well as limitations and variations in the ways of engaging with platforms.
Where do we go from here?
Two conclusions can be drawn from this scenario. First, the development and expansion of evidence-based academic research in Latin America on social media platforms require coordination among all the involved actors: universities, researchers themselves, and the platforms. Universities, research centers, and ethics committees have to update themselves on the challenges and nuances of conducting research on platforms, as the methods and necessary support differ from those used in research involving individuals or in the natural and exact sciences. Researchers also recognize the need to consolidate protocols and establish best practices for data protection and privacy safeguards to better request and handle the datasets they have access to.
Second, regarding platforms, beyond the need to formulate and implement new transparency practices specifically designed for the Global South, there is a demand to improve existing ones. Strengthening the transparency framework for researchers involves identifying barriers and strategies to mitigate them, developing training and capacity-building, standardizing the availability of data across platforms, and increasing engagement and presence in the region.