What types of data can be collected on social media?
with contributions from Emma Morlock, Antje Relitz, and Zoé Wolter
Introduction: A structured overview of the four different types of social media data
In today's digital world, social media platforms are overflowing with data, making them a goldmine for researchers in many fields. As of 2023, more than 4.9 billion people around the globe are using social media, which is over 60% of the world's population [Source].
A comprehensive understanding of these platforms and the types of data they generate can significantly enhance the ability to analyse user behaviour, societal trends and communication patterns. This chapter aims to provide a structured overview of four different types of data available from social media:
- Content Data: This includes all kinds of media created and shared by users, like text, images, videos, and audio. It gives us insights into how people express themselves and communicate.
- Interaction Data: This involves metrics that capture how users engage with content, including reactions, shares, and overall reach, revealing patterns of engagement and influence.
- Metadata: This consists of contextual information about the content, such as timestamps, geolocation data, and technical specifications, which help us better understand the context and usage of the content.
- User Data: This category covers information about the users themselves, including demographic details, profile information, and account characteristics, which help to analyse user behaviour and segmentation.
By organising these data types, we hope to highlight potential research applications and encourage researchers to make the most of social media data in their work. For the sensitivities and the legal and ethical considerations attached to the various types of data, see the corresponding chapters in the section How to get started. For how to access the available data, see the section How to access data on platforms.
Content Data
Content data is the main type of information generated on social media platforms. It includes the actual material that users create and share, and can be broadly grouped into three main types: textual content, visual content, and audio content. Each type provides unique insights into user preferences, cultural trends, and engagement patterns.
Textual Content
Textual content covers all written communication, including status updates, tweets, captions, comments, and hashtags. This type of data is essential for understanding user opinions, sentiment, and trending topics. Natural language processing (NLP) techniques are often used to derive meaning from large amounts of textual data, allowing for sentiment analysis, topic modelling, and tracking language trends. Hashtags and keywords also play an important role in monitoring how specific themes or movements gain traction across networks.
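As a minimal sketch of the hashtag monitoring mentioned above, the following counts how often each hashtag appears across a set of posts. The sample posts, the regular expression, and the function name `count_hashtags` are illustrative assumptions, not part of any platform's API; real text would come from a platform export or API and usually needs more careful tokenisation.

```python
import re
from collections import Counter

# Hypothetical sample posts, used only for illustration.
posts = [
    "Loving the new park! #urbanlife #green",
    "Council meeting tonight #UrbanLife",
    "Rain again... #weather",
]

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def count_hashtags(texts):
    """Return a Counter of lower-cased hashtags across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


hashtag_counts = count_hashtags(posts)
print(hashtag_counts.most_common(2))
```

Lower-casing the tags treats `#urbanlife` and `#UrbanLife` as the same theme, which is a design choice that may or may not suit a given research question.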
Visual Content
Visual content includes images, graphics, memes, and videos shared on platforms like Instagram, TikTok, or Facebook. It serves as a rich source of insight into cultural trends, identity representation, and the visual language of social media. Visuals can express emotions and social statements in ways that text alone cannot. To analyse visual content effectively, specialised tools like image recognition software and machine learning algorithms are utilised to identify patterns in imagery, colour usage, and facial expressions.
Audio Content
Audio data, such as voice messages, podcasts, or music clips, are becoming more prominent with the rise of platforms like Clubhouse and TikTok. Analysing audio allows us to explore speech patterns, voice tones, and the popularity of music and sound trends across networks. The growth of voice-driven platforms opens up new avenues to examine conversational dynamics, cultural expressions through sound, and the effects of audio content on social engagement.
Interaction Data
Interaction data captures the various ways users engage with content and each other on social media platforms. Examining this data allows for insights into user behaviour, how information spreads, and community dynamics. Interaction data can be divided into three main categories: Content Engagement, User-to-User Engagement, and Content Redistribution.
Content Engagement
Content Engagement looks at how users interact with posts, such as through views, comments, and reactions. Views indicate passive consumption and provide metrics for content visibility and reach. By looking at view counts, it's possible to understand audience size and exposure to content. Comments let users share feedback or join discussions, providing qualitative data that reveals public sentiment and the nature of conversations around certain topics. Reactions - in the form of likes or emojis - offer quick emotional responses to content and serve as indicators of popularity and immediate impact. By examining reaction patterns, insights into user sentiment and emotional engagement can be gathered.
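One way to relate these metrics to each other is a simple engagement rate: active responses (comments and reactions) divided by views. The sketch below is an assumption made for illustration; the `PostMetrics` fields and the formula are hypothetical, and platforms and researchers define engagement rates in different ways.

```python
from dataclasses import dataclass


@dataclass
class PostMetrics:
    """Hypothetical per-post counts; field names are illustrative."""
    views: int
    comments: int
    reactions: int


def engagement_rate(m: PostMetrics) -> float:
    """Comments plus reactions per view; 0.0 for a post with no views."""
    if m.views == 0:
        return 0.0
    return (m.comments + m.reactions) / m.views


post = PostMetrics(views=2000, comments=40, reactions=160)
rate = engagement_rate(post)
print(f"{rate:.1%}")
```

Normalising by views makes posts with very different reach comparable, at the cost of hiding absolute audience size.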
User-to-User Engagement
User-to-User Engagement emphasises the interactions between individuals, reflecting the relationships formed on social media. Following shows sustained interest and connection, with follower counts revealing potential influence and network size. Analysing follower dynamics provides insight into audience-building strategies and influencer networks. Mentions, or tagging other users, foster discussions and boost content visibility, and can be analysed to understand social interactions and user connectivity. Direct Messages (DMs) facilitate private conversations, which are crucial for understanding personal relationships, although they pose privacy challenges for research. Group participation allows users to connect in communities based on shared interests, shedding light on collective behaviour and group dynamics.
Content Redistribution
Content Redistribution looks at how users help spread content across networks. Shares enable users to pass on posts to their followers, significantly increasing visibility. Analysing share patterns can reveal the viral potential of content and how information diffuses across networks. Retweets and reposts are specific forms of sharing that extend the reach of content without changing it. In TikTok reaction videos, creators react to an existing TikTok video that is shown in parallel. This provides a new opportunity for instant reaction and sharing. Studying the frequency of these actions provides insights into how content spreads organically and its overall impact.
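To illustrate how diffusion can be measured, the sketch below treats reshares as edges from the resharing user to the user they reshared from, and computes how many hops each user is from the original poster. The records, the user names, and the function `cascade_depths` are hypothetical. It also assumes the data records whom each user reshared from, which many platforms do not expose.

```python
from collections import defaultdict, deque

# Hypothetical reshare records: (resharing_user, user_they_reshared_from).
reshares = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "bob"),
    ("erin", "dave"),
]


def cascade_depths(edges, root):
    """Breadth-first walk from the original poster.

    Returns a dict mapping each reached user to their hop count from root.
    """
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)

    depths = {root: 0}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        for child in children[user]:
            if child not in depths:  # ignore repeat reshares by the same user
                depths[child] = depths[user] + 1
                queue.append(child)
    return depths


depths = cascade_depths(reshares, "alice")
print(max(depths.values()))  # length of the longest reshare chain
```

A deep cascade suggests content travelling through chains of users, while a shallow, wide one suggests a single account broadcasting to many followers.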
Metadata
While content data focuses on the actual substance of social media posts, metadata offers contextual information about that content, including when, where, and how it was created or interacted with.
Temporal Data
Temporal metadata captures time-related information, such as when posts are created, liked, or shared. This data is useful for analysing trends over time, like peak activity hours, content lifespan, and the timing of viral trends.
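A minimal sketch of a peak-activity analysis: count posts per hour of day and pick the busiest hour. The timestamps are invented and the function name `posts_per_hour` is hypothetical. The sketch assumes ISO 8601 timestamps that are already in one consistent time zone; real exports often need time zone conversion first.

```python
from collections import Counter
from datetime import datetime

# Hypothetical creation timestamps, assumed to share one time zone.
timestamps = [
    "2023-05-01T08:15:00",
    "2023-05-01T20:05:00",
    "2023-05-02T20:45:00",
    "2023-05-03T20:30:00",
    "2023-05-03T08:50:00",
]


def posts_per_hour(iso_timestamps):
    """Count posts by hour of day (0-23)."""
    return Counter(datetime.fromisoformat(ts).hour for ts in iso_timestamps)


hourly = posts_per_hour(timestamps)
peak_hour, peak_count = hourly.most_common(1)[0]
print(peak_hour, peak_count)
```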
Spatial Data
Spatial metadata involves location information linked to posts or user interactions, obtained through geotags or inferred from user activity. For researchers, spatial data is vital for exploring geographic trends, regional sentiments, and localised behaviour patterns.
Technical Data
Technical metadata provides insights into the devices, operating systems, and platforms used to access social media. This information helps understand access patterns, such as whether mobile or desktop devices dominate usage, and how different browsers might influence user behaviour. It also helps in identifying bots or automated accounts based on patterns of access and engagement.
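As one crude illustration of an access pattern that might hint at automation, the sketch below measures how regular the gaps between an account's posts are, using the coefficient of variation (standard deviation divided by mean). The function name and the sample values are hypothetical. This is an assumed heuristic for illustration, not an established bot detector: scheduled posts by humans look regular, and sophisticated bots randomise their timing.

```python
from statistics import mean, pstdev


def interval_regularity(post_times):
    """Coefficient of variation of gaps between consecutive posts.

    post_times: posting times in seconds (e.g. Unix timestamps).
    Returns None when there are too few posts to judge.
    Values near 0 mean almost perfectly regular posting.
    """
    ordered = sorted(post_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


clockwork = interval_regularity([0, 600, 1200, 1800])  # posts every 10 minutes
irregular = interval_regularity([0, 120, 5000, 5300])  # bursty, uneven posting
print(clockwork, irregular)
```

In practice a signal like this would be combined with other technical and behavioural indicators rather than used on its own.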
User Data
User data encompasses the information that social media platforms collect about individual users. This data helps create personalised user experiences and allows for analysing demographic trends, user behaviours, and network structures. Unlike content or interaction data, user data is specifically focused on the characteristics of individuals who are engaging with the platforms.
Demographic Data
Demographic data consists of basic information like age, gender, language, and ethnicity. Social media platforms often gather this data during account creation or through user interactions. For researchers, demographic data is crucial for understanding how different groups engage with content, their preferences, and how specific demographics can influence trends or discussions.
Profile Information
Profile information covers the details users opt to share publicly, including bios, interests, locations, and affiliations. Platforms like LinkedIn provide detailed professional profiles, while others like X/Twitter offer short bios. This data helps analyse how users present their identities, their social connections, and the types of networks formed based on shared interests or occupations.
Account Details
Account details refer to the technical and behavioural attributes tied to user accounts, such as account creation date, activity frequency, and follower count. This data helps differentiate between new and established users and active and passive users, identify influencers or high-impact accounts, and track the evolution of user behaviour over time.
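The distinctions above can be sketched as a small labelling function. The function name `describe_account` and the thresholds (90 days for a "new" account, four posts in 30 days for an "active" one) are arbitrary assumptions chosen for illustration; appropriate cut-offs depend on the platform and the research question.

```python
from datetime import date


def describe_account(created, posts_last_30_days, today,
                     new_account_days=90, active_posts=4):
    """Label an account by tenure and activity using assumed thresholds."""
    age_days = (today - created).days
    tenure = "new" if age_days < new_account_days else "established"
    activity = "active" if posts_last_30_days >= active_posts else "passive"
    return tenure, activity


# Hypothetical account: created in 2020, one post in the last 30 days.
label = describe_account(date(2020, 1, 15), 1, today=date(2023, 6, 1))
print(label)
```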
Conclusion: These categories create a robust framework that facilitates the exploration of online human behaviour
In summary, social media data encompasses a wide range of categories that provide valuable insights into user behaviour, interactions, and content dynamics.
Content data, including textual, visual, and audio elements, forms the basis for understanding user-generated contributions and trends. Each content type offers unique opportunities for analysis, helping to explore themes, sentiments, and user engagement.
Interaction data reflects the various ways users engage with content and with one another on social media platforms. By examining this data, valuable insights can be gained into patterns of engagement, the effectiveness of content, and the dynamics of user relationships within digital communities.
Metadata adds further context to the analysis with temporal, spatial, and technical information. This additional layer is essential for uncovering patterns over time, understanding geographic influences, and addressing the technical aspects of content creation and sharing.
User data, which includes demographic data, profile information, and account details, offers insights into the characteristics and preferences of users, allowing for a more nuanced understanding of audience engagement.
Together, these categories create a robust framework that facilitates the exploration of behaviour patterns, sentiment, and influence across social media platforms. As new analytical techniques and technologies continue to emerge, there is great potential for valuable insights into social behaviour and communication in the digital age.