How to access data for social media monitoring: availability, limitations, and outlook
with contributions from Richard Kuchta, Bea Saab, and Lena-Maria Boswald
Introduction: A matter of access
Social media is integral to modern communication and democratic discourse. It reveals us to ourselves, documenting our emotions, opinions, political stances, and behaviors, albeit often to an extreme and distorted extent. Since the rise of social media, researchers from varied disciplines and institutions have sought to study it and its impact on, or reflection of, society.
And yet researchers, particularly non-academic and independent researchers, have struggled to gain meaningful research access to platform data. This matters because the public interest is best served by a variety of researchers, whose analyses and results differ in length of study, research scope, context, and language, to name a few.
After years of debate, the European Union’s Digital Services Act (DSA) became applicable to very large online platforms (VLOPs) on August 25th, 2023. It begins the work of regulating Big Tech, particularly social media platforms. Among other imperatives, the DSA addresses the existing information asymmetry between the public and VLOPs by compelling these platforms to grant data access to researchers (see Article 40, Data access and scrutiny).
Importantly, VLOPs are no longer the arbiters of data access. Digital Services Coordinators (DSCs), designated by each Member State, are tasked with vetting independent researchers for access. This separation from platform control will allow for research from a critical standpoint on the platforms’ design and the occurrence of ‘systemic risks’ (DSA Article 34).
This opens more opportunities for civil society to participate in quantitative social media research with fewer barriers to access. By analyzing the engagement, sentiment, and reach of content on social platforms, non-academic researchers can gauge ongoing and evolving public sentiment and empower policymakers to make informed decisions by responding actively to changing social dynamics.
Availability: What can you access and how?
The avenues of access fall under two basic categories:
- through crowd listening tools, such as CrowdTangle provided for Meta products, and
- through application programming interfaces, or APIs. APIs require coding skills, but they give researchers more flexibility and precision in how they gather data.
All major social media platforms (Meta, Twitter/X, TikTok, YouTube, and Telegram) offer API access, though some have accessibility restrictions. TikTok, for instance, only offers an API for developers. A parallel TikTok API for researchers is only available in the United States, leaving European institutions in the dark.
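For readers new to APIs, the following is a minimal sketch of what "gathering data through an API" looks like in practice. It uses the public search endpoint of the YouTube Data API v3 as an illustration only. The function names `build_search_url` and `extract_videos` are hypothetical helpers, not part of any library, and the sketch assumes you have already obtained your own API key. It builds the request and flattens a response without making a network call.

```python
from urllib.parse import urlencode

# Public search endpoint of the YouTube Data API v3.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, published_after=None, max_results=25):
    """Return a search URL for public videos matching `query` (hypothetical helper)."""
    params = {
        "part": "snippet",          # ask for title, channel, and publication date
        "q": query,                 # free-text search term
        "type": "video",            # restrict results to videos
        "order": "date",            # newest first
        "maxResults": max_results,  # the API accepts values from 0 to 50
        "key": api_key,             # assumption: your own API key
    }
    if published_after:
        # RFC 3339 timestamp, e.g. "2023-08-25T00:00:00Z"
        params["publishedAfter"] = published_after
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_videos(response):
    """Flatten the JSON body of a search response into simple records."""
    rows = []
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        rows.append({
            "video_id": item.get("id", {}).get("videoId"),
            "title": snippet.get("title"),
            "channel": snippet.get("channelTitle"),
            "published_at": snippet.get("publishedAt"),
        })
    return rows
```

The URL returned by `build_search_url` can be fetched with any HTTP client, and the decoded JSON passed to `extract_videos`. Other platforms follow the same request-and-parse pattern, but their endpoints, parameters, and access conditions differ, so check each platform's documentation and terms before collecting data.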
Here is an overview of data access by each platform. See more details in DRI’s data access series.
Overview of data access by each platform
Limitations: What’s missing?
If the DSA is now in force, mandating access to data, everything is solved, right? Not so much. There are still limitations, ambiguity, and technicalities that need to be ironed out.
It is not yet clear how new researchers will be vetted. While the DSA sets out the overall direction, the practical implementation is left to what is called a ‘Delegated Act’. Civil society has responded with recommendations and public petitions. At DRI, ours include providing access to all public data, regular updates to APIs, granting access to non-academic researchers, not just developers, and better-informed Terms of Service adhering to the DSA.
In the meantime, many platforms still limit the type of data accessible through their APIs, even data that is public in nature. The primary blind spots are the comment sections under posts (which notoriously tend to hold more problematic or illegal content than the posts themselves), profiles set to ‘public’ on Facebook, and new features such as ‘Stories’. See below for a breakdown of the type of data you can access by platform.
Overview of the type of data accessible by platform
The biggest gap now comes from the newly minted X.com (the artist formerly known as Twitter). Earlier this year, X put its API access behind a prohibitively expensive paywall (the lowest tier, which is also the most limited in terms of quantity of data, costs 100 dollars per month). This year X has left a trail of non-compliance with both the impending DSA and the voluntary Code of Practice on Disinformation, in stark contrast to the changing tide. The platform will likely face fines from the EU for its actions.
Outlook: What can you do now (including in terms of advocacy)?
The Delegated Act detailing the new provisions for vetting and securing access is expected to be released in early 2024. In the meantime, social media research remains vital.
If you are interested in improving research access and social media research on global online political discourse, sign up for our newsletter, The Digital Drop.