Ethical considerations

Open Knowledge Foundation

Original post on 13.09.2024 by the Open Knowledge Foundation
last updated on 06.03.2025 by Cathleen Berger

About the Open Knowledge Foundation

The Open Knowledge Foundation (OKFN) is the world's ultimate reference in open digital infrastructures and the hub of the open movement. The organisation has been establishing open standards for the last 20 years and providing services, tools and training for governments, organisations and communities to adopt openness as a design principle. Its most popular product, CKAN, is the tool behind open data portals such as those of the US, Brazilian and Indian governments, and the UN refugee agency. OKFN is the technology arm of a wide range of institutions creating, managing and publishing open data.

We are present in more than 40 countries through the Open Knowledge Network, using advocacy, technology and training to unlock information, to create and share knowledge, enabling people to take action to achieve local impact and collaborate with like-minded communities worldwide. Some of OKFN's most historically important initiatives include the Global Open Data Index, Open Definition, Open Data Commons, School of Data and, most recently, Digital Public Infrastructure for Electoral Processes.

With the proliferation of social media platforms and their reach into many aspects of people's everyday lives, the unprecedented abundance of personal data available for analysis can be a disturbing reality. Empowering researchers in data science and the social sciences to work with this type of data ethically is therefore crucial to ensure responsible and respectful use of personal information in the digital age. This chapter briefly outlines a set of data ethics recommendations that address common challenges and help researchers mitigate the ethical risks of social media monitoring and data sharing projects.

7 principles for handling social media data ethically - Planning checklist

Anonymisation
Anonymise data to protect individuals' privacy by replacing personal identifiers with random codes.

Data Minimisation
Avoid collecting unnecessary or sensitive information by only collecting data directly relevant to your research purposes.

Data Retention
Delete data when it is no longer needed.

Safe Data Storage
Ensure proper encryption protocols are used during transmission and that collected data is stored securely.

Harm Mitigation
Consider the social, psychological and local legal consequences through which the results of your research could harm individuals or communities.

Advocacy
Use social media as a tool for responsible advocacy and minimise the collection and storage of data about your advocacy and campaigns.

Research Funding Transparency
If you conduct funded research, be transparent about the funding sources and potential conflicts of interest that could influence it. An improved level of transparency will likely increase the trustworthiness of the way proposals are handled and of the grant allocation system in general.
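
The first three checklist items (anonymisation, data minimisation and data retention) can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended implementation: the field names (`author_id`, `text`, `created_at`), the 180-day retention period and the helper names are all assumptions made for the example. Note that replacing identifiers with codes is usually called pseudonymisation; as long as the mapping exists, or the post text itself identifies someone, the data can still be personal data.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Hypothetical field names; adjust them to the data you actually collect.
ALLOWED_FIELDS = {"author_id", "text", "created_at"}
RETENTION = timedelta(days=180)  # assumed retention period, not a legal rule


class Pseudonymiser:
    """Replaces each personal identifier with a stable random code."""

    def __init__(self):
        # identifier -> random code; store this mapping securely or discard it
        self._codes = {}

    def code_for(self, identifier):
        if identifier not in self._codes:
            self._codes[identifier] = "user_" + secrets.token_hex(8)
        return self._codes[identifier]


def minimise(record):
    """Keep only the fields that the research question needs."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


def is_expired(record, now):
    """True when the record is older than the retention period."""
    return now - record["created_at"] > RETENTION


def prepare(records, pseudonymiser, now):
    """Apply retention, minimisation and pseudonymisation, in that order."""
    cleaned = []
    for record in records:
        if is_expired(record, now):
            continue  # drop data that is no longer needed
        slim = minimise(record)
        if "author_id" in slim:
            slim["author_id"] = pseudonymiser.code_for(slim["author_id"])
        cleaned.append(slim)
    return cleaned


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"author_id": "alice", "text": "hello", "location": "Berlin",
     "created_at": now - timedelta(days=10)},
    {"author_id": "bob", "text": "old post",
     "created_at": now - timedelta(days=400)},
]
print(prepare(sample, Pseudonymiser(), now))
```

Running the sketch keeps only the recent post, drops its `location` field and prints a random code in place of `alice`. The same author always receives the same code within one `Pseudonymiser`, so per-author analysis remains possible without storing the original handle in the research dataset.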

Context: Data availability, personal data, access rights

After two decades of a technological revolution resulting in unprecedented Internet connectivity and social media adoption, technology companies have become increasingly powerful and opaque in how they exercise such power. Although social media platforms have become critical to people’s lives, citizens have almost no means of requesting information and data about how these platforms operate, beyond the right to access data under GDPR.

Currently, there is no equivalent of access to information laws for requesting the information and data sets that shape our digital lives on social platforms, and the data released by social media companies is limited. Only occasionally, through research, whistleblowing, public hearings or legal discovery processes, do fragments of such information come to light.

When there is no accessible data, public interest research conducted by non-governmental organisations permits an understanding of how social media companies operate or how they shape specific topics, areas, or events. Social and digital activists often collect such data by open web scraping, crowdsourcing, or other innovative technology methods. However, such activities often confront them with legal questions and ethical dilemmas related to data collection. In this new context, researchers need to obtain and reuse social media data with ethical standards that prevent harm to individuals and communities.

A principle-based framework

In response to the dynamic nature of social media, this chapter introduces you to a principles-based approach to social media monitoring: a set of principles that, when applied, provides a framework and a living process for mitigating risks by responding promptly to ethical challenges as they emerge. Initially, the framework aims to understand risks and create a robust foundation for addressing the multilayered complexities of handling social media data ethically in various contexts. This includes transparency about data collection practices, privacy concerns, bias and discrimination, and cultural diversity. Recognising that ethical challenges are complex is key: when these principles are applied throughout the process, the research design itself helps to pinpoint ethical risks.

Next, the chapter proposes Key Risk Indicators (KRIs) for identifying potential harms at different stages of social media data projects; these indicators should be monitored regularly so that they trigger proactive responses. Conducting a risk control self-assessment and including corrective actions and adjustments to ethical frameworks and practices is essential to improve the safety and quality of social media data research. Finally, the framework provides an easy-to-understand principles table that can be applied to the real-time design, deployment and evaluation of projects involving social media monitoring. This framework can help researchers and activists working on social media monitoring clarify their implicit or explicit decisions when collecting data and help define an ethical approach to their work.

Ex-ante considerations of handling social media data ethically

When monitoring social media, many researchers collect data in real time first and only consider the ethical implications or potential risks once the dataset is complete. An easy justification for such behaviour is that the information is “out in the public” anyway and that its mass collection is valid, justified, and harmless. The first step towards more ethical data collection is challenging this implicit “scrape all the data” assumption that many data collectors embrace.

The following list is not exhaustive but sheds light on interconnected and multilayered risks that monitoring social media can present across different contexts, and where research is not clearly delineated. As a researcher you should think through these implications before you start your monitoring efforts:

Human rights considerations

Right to Oblivion or Right to be Forgotten

The right to oblivion, or the right to be forgotten, is a concept rooted in Article 12 of the Universal Declaration of Human Rights and also recognised by courts in the EU, Argentina and the Philippines. Article 12 asserts that no one shall be subjected to arbitrary interference with their privacy, so individuals should be able to request the removal of personal data that is no longer necessary. Social media monitoring can challenge this right, as collected data may persist indefinitely, potentially haunting individuals long after the information loses relevance.

Vulnerability

Vulnerable populations, including minors, marginalised communities, individuals with limited digital literacy and people suffering from abuse, can be disproportionately affected by social media monitoring. They may lack the resources and knowledge to challenge the exploitation of their data and make informed choices, including the right to challenge the very basis of monitoring. It is important to note that vulnerability highly depends on local context. A person can become vulnerable because of local laws or due to culture, and a person may be vulnerable in one country but not another. Local context always needs to be considered.

Protection of Personal Data

Failing to comply with data protection regulations, such as the EU’s General Data Protection Regulation (GDPR), regarding the processing of personal data can lead to data breaches and misuse of individuals' personal information.

Non-Discrimination

Discriminatory practices in social media data collection, or algorithmic bias, can lead to unequal treatment based on race, gender, or religion and reinforce existing inequalities affecting certain groups.

Social and collective harm

Data Extractivism

The mass extraction of social media data involves collecting data from individuals or organisations without their free and informed consent and can be likened to resource extraction. This extractive approach can lead to exploitation, as individuals become commodities in the data-driven economy, raising concerns around fairness and the concentration of power. The expression “data extractivism” establishes an analogy between information management and the mining industry, defining data as a raw material that can be extracted, commercialised, refined, processed, and transformed into other commodities with added value.

Power Dynamics

The concentration of data in the hands of a few can shift power dynamics in society and lead to data monopolies. This centralised control raises concerns about surveillance states and the potential for abuse of power, as well as the erosion of democratic principles, since data owners can significantly influence markets, decision-making, and even public discourse.

Economic Harm

The vast amount of data collected from social media users has great economic value in the digital economy, leading to concerns about economic harm to underprivileged groups. Economic harm from unethical or malicious social media monitoring practices can raise questions about fairness and equity.

First proposal: Develop social media monitoring Key Risk Indicators (KRI)

Is the data I am collecting ethical? There is no simple answer to this question, but you can adopt Key Risk Indicators (KRIs) as an easy way to identify potential harms. KRIs are specific metrics that serve as early warning signs in social media monitoring; researchers can define them when designing a project and use them to track the ethical risks associated with it.

Risk	Indicator	Impact
Opaque Data Usage	Full disclosure of documentation regarding data collection practices and purposes.	A low transparency rate may indicate an ethical risk related to the lack of openness about data usage.
Non-Compliance	Percentage of data where informed consent was obtained for collection.	A low compliance rate may indicate a potential ethical risk related to data collection practices.
Weak Data Accuracy	Qualitative analysis of data in terms of misinformation or manipulated content.	Weak data accuracy can indicate an ethical risk, especially when used in decision-making processes.
Violation of Data Rights	The number of disputes related to data ownership.	An increased ownership dispute rate can indicate ethical concerns about data rights.
Biased Data	Number of occurrences of bias detected in data collection and analysis.	Repeated detection of bias can indicate ethical concerns regarding fairness and discrimination.
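
As an illustration of how some of the indicators in the table above could be tracked, the sketch below computes a consent rate and compares three indicators against thresholds. Everything specific here is an assumption made for the example: the `consent` field on each record, the threshold values and the alert labels are hypothetical, and each project would need to define and document its own.

```python
# Assumed thresholds for illustration only; set and document your own.
THRESHOLDS = {
    "min_consent_rate": 0.9,       # share of records with informed consent
    "max_bias_findings": 0,        # bias occurrences found in a review
    "max_ownership_disputes": 0,   # open disputes about data ownership
}


def consent_rate(records):
    """Share of records (0.0 to 1.0) collected with informed consent."""
    if not records:
        return 0.0  # no data means nothing was collected with consent
    consented = sum(1 for r in records if r.get("consent") is True)
    return consented / len(records)


def triggered_kris(records, bias_findings, ownership_disputes,
                   thresholds=THRESHOLDS):
    """Return the labels of all indicators that cross their threshold."""
    alerts = []
    if consent_rate(records) < thresholds["min_consent_rate"]:
        alerts.append("non_compliance")
    if bias_findings > thresholds["max_bias_findings"]:
        alerts.append("biased_data")
    if ownership_disputes > thresholds["max_ownership_disputes"]:
        alerts.append("data_rights_violation")
    return alerts


# Example: 8 of 10 records have consent and one bias finding was logged.
records = [{"consent": True}] * 8 + [{"consent": False}] * 2
print(triggered_kris(records, bias_findings=1, ownership_disputes=0))
```

A check like this is only an early warning. A triggered indicator should prompt a review of the collection practice; it does not by itself determine whether a project is ethical.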

Second proposal: Adopt a principles-based approach for social media data handling

In a fast-changing scenario like the social media environment, new circumstances that are not contemplated can arise at any time, making it difficult to stay up to date when using a strict framework. Thus, a rules-based approach can quickly become outdated. A principles-based approach, however, can help you, as researchers, reduce the complexity of compliance. It also promotes collaboration between different stakeholders, which speeds up the process and empowers people to take ownership.

This set of key principles and ethical standards is aimed at guiding organisations and researchers in monitoring social media platform data to ensure its responsible and ethical use, respect user privacy, and promote transparency.

Principle	Issue	Process
Transparency	Am I being transparent about the purpose of social media monitoring and the data types I am collecting?	Inform users about how their data may be used and who has access to it. Apply FAIR Principles to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets.
Compliance	Am I respecting the relevant laws and regulations, including data privacy and intellectual property laws?	Ensure that your monitoring practices comply with contextualised laws and allow users to opt out of being monitored.
Data Accuracy	Am I ensuring that the information gathered from social media is credible and verifiable?	Avoid spreading misinformation by establishing an information-checking process before considering the data in the research. Data integrity and availability should also be addressed.
Explicit Consent	Am I seeking explicit permissions from users when collecting data that might be considered sensitive?	Determine how the data will be used, and respect users' decisions regarding data usage and sharing.
Data Minimisation and Deletion	Am I ensuring that only personal data that is directly related to the research is collected?	Limit the collection of personal information to what is directly relevant and necessary to accomplish a specified purpose. Data that is no longer needed should be deleted.
Bias Detection	Am I ensuring that social media monitoring and analysis do not lead to discrimination based on race, gender, religion or political views?	Foster diversity and inclusion within your monitoring team. If using algorithms for data analysis, make them transparent and auditable.
Community Feedback	Am I being open to feedback from the community regarding potential monitoring activities?	Actively seek input by providing accessible channels for users to voice their concerns, suggestions, and questions. Incorporate feedback when necessary.
Accountability	Am I regularly reviewing and auditing the monitoring practices to ensure compliance with ethical principles?	Establish accountability mechanisms by assigning responsibility for social media monitoring within the organisation.
Openness	Am I ensuring people can access and reuse all the tools and materials from the social media monitoring project?	Comply with the Open Definition principles and provide an open licence for other people to access and reuse all the resources.
Beware of Vulnerabilities	Am I recognising the potential harm social media monitoring can cause vulnerable individuals?	Only collect, share, or analyse data related to vulnerable individuals' experiences when it serves a legitimate and beneficial purpose. Pseudonymise or anonymise the data as much as possible to prevent harm.
Periodic Review	Am I regularly reviewing monitoring practices to align with user behaviour and platform policy evolution over time?	Regularly adapt monitoring practices to align with ethical standards that relate to the current reality.

References and further resources

Mehtab Khan and Alex Hanna. The Subjects and Stages of AI Dataset Development: A Framework for Dataset Accountability. September 13, 2022.
Adriana Alvarado Garcia, Marisol Wong-Villacres, Milagros Miceli, Benjamín Hernández, Christopher A Le Dantec. Mobilizing Social Media Data: Reflections of a Researcher Mediating between Data and Organization. CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. April 2023. Article No.: 866, Pages 1–19. https://doi.org/10.1145/3544548.3580916
Connie Moon Sehat and Tarunima Prabhakar, with Aleksei Kaminski. Ethical Approaches to Closed Messaging Research. Considerations in Democratic Contexts. March 15, 2021.
Sara Mannheimer and Elizabeth A. Hull. Sharing selves: Developing an ethical framework for curating social media data. 12th International Digital Curation Conference (IDCC), Edinburgh, Scotland, 20-23 February 2017.
Dr. Leanne Townsend and Prof. Claire Wallace. Social Media Research: A Guide to Ethics. This work was supported by the Economic and Social Research Council [grant number ES/M001628/1] and was carried out at The University of Aberdeen, 2016.
Horbach SPJM, Tijdink JK, Bouter L. 2022. Research funders should be more transparent: a plea for open applications. R. Soc. Open Sci. 9: 220750. https://doi.org/10.1098/rsos.22075
Celis Bueno and Schultz (2021). Data Extractivism. In: Celis Bueno and Schultz, Imaginación maquínica. http://imaginacionmaquinica.cl/data-extractivism
