
Guidance to Get the Data You Want: How to Create Appropriate Filter Rules and Queries

Level required: No Code / Beginner
Platform: X
Andreas Neumeier
SPARTA | University of the German Federal Armed Forces
Jasmin Riedl
SPARTA | University of the German Federal Armed Forces
Wiebke Drews
SPARTA | University of the German Federal Armed Forces
Original post on 13.09.2024 by Neumeier et al.

Using the X (Twitter) API as a data source can be extremely valuable for scientific studies. When it comes to collecting data from X (Twitter), the rules you define to create your data queries play a critical role. They affect the quality and relevance of the data you receive.

Building a query or rule

If you are using the Search Tweets endpoint (Recent Search and Full Search), the filter is called a query. And if you are using the Filtered Stream endpoint, it is called a rule.

Best Practices for Creating Queries and Rules

Here are some tips and best practices for creating effective rules for querying the X (Twitter) API:

  1. Define clear goals: Before you start querying, know what information you need. Do you want to collect tweets about a specific topic, from a specific location, or at a specific time? The clearer your goals, the more efficient your query will be.

  2. Keyword research: Take the time to research relevant keywords, phrases, and hashtags. This will help you focus your data request.

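  3. Use Boolean logic: You can combine terms with AND, OR, and NOT logic to make your search criteria more specific. In queries and rules, a space between terms acts as AND, OR is written out, and NOT is expressed with a leading hyphen. For example, "climate change" research returns tweets that contain both the phrase climate change and the word research.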

  4. Consider alternate spellings: When focusing on keywords or hashtags, consider alternate spellings, abbreviations, or misspellings.

  5. Use exclusions: To filter unwanted results from your query, you can exclude certain words or phrases.

  6. Use geolocation features: If your research interest is geographically limited, use the geolocation operators of the X (Twitter) API to collect tweets from specific regions. (Note: Only a small number of tweets contain geo-information.)

  7. Set time limits: If relevant, you can limit your query to a specific time period.

  8. Test and tweak: Start with a broad query and narrow it down incrementally. Review the results and adjust your query as needed.

  9. Consider usage limits: The X (Twitter) API has usage limits. Make sure you understand them and plan your queries accordingly.

Query limitations

Depending on your access level, the length of queries or the number of rules may be limited.
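
Because the concrete limits depend on the access level, a pre-flight check can take them as parameters. The following is a minimal sketch: find_limit_problems is a hypothetical helper, and the limits used in the usage lines (5 rules, 512 characters) are placeholder assumptions that should be replaced with the values documented for your access level.

```python
from typing import List


def find_limit_problems(rules: List[str], max_rules: int, max_length: int) -> List[str]:
    """Return human-readable problems; an empty list means all limits are met."""
    problems = []
    if len(rules) > max_rules:
        problems.append(f"{len(rules)} rules exceed the limit of {max_rules}")
    for rule in rules:
        if len(rule) > max_length:
            problems.append(f"rule has {len(rule)} characters, limit is {max_length}")
    return problems


# Placeholder limits (assumptions, not official values).
rules = ["(create OR creation) lang:en -divine -is:retweet", "#science"]
print(find_limit_problems(rules, max_rules=5, max_length=512))
```

Running the check before submitting rules makes limit violations visible early instead of surfacing as API errors.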

Operator types: standalone and conjunction-required

Operators fall into two types: standalone operators and conjunction-required operators. Standalone operators can be used on their own or in conjunction with other operators.

The following query utilizes the #hashtag operator, which is a standalone operator.

#science

Conjunction-required operators presuppose the use of at least one standalone operator in the query. Otherwise, they would be too broad in scope and the query would generate an excessive number of Tweets. The following examples do not contain standalone operators and thus are not legitimate queries.

has:mentions
has:media OR is:verified

If we add a standalone operator, such as the phrase "research results", the query becomes valid. For example:

"research results" has:mentions (has:media OR has:links)

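The distinction between the two operator types can be checked before a query is sent. The sketch below is a simplified illustration, not the validation the API itself performs: has_standalone_operator is a hypothetical helper, and it assumes that every operator starting with is:, has:, or lang: is conjunction-required, as listed in the operator table of this guide. Negated terms are skipped because a negated operator cannot carry a query on its own.

```python
import re

# Assumption: these prefixes cover the conjunction-required operators.
CONJUNCTION_REQUIRED_PREFIXES = ("is:", "has:", "lang:")


def has_standalone_operator(query: str) -> bool:
    """Return True if the query contains at least one non-negated standalone operator."""
    # Quoted phrases stay together; parentheses are dropped; the rest splits on spaces.
    tokens = re.findall(r'-?"[^"]*"|[^\s()]+', query)
    for token in tokens:
        if token == "OR" or token.startswith("-"):
            continue
        if token.lower().startswith(CONJUNCTION_REQUIRED_PREFIXES):
            continue
        return True
    return False


for query in ["#science", "has:mentions", "has:media OR is:verified",
              '"research results" has:mentions (has:media OR has:links)']:
    print(query, "->", has_standalone_operator(query))
```

The check is deliberately coarse: it only answers whether at least one standalone operator is present, not whether the query is otherwise well-formed.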
Boolean operators and grouping

Multiple keywords can be combined using Boolean logic, which either expands the scope of the query (OR) or narrows it (AND, NOT).

AND logic: Queries containing keywords combined by Boolean AND logic will yield only Tweets containing all the keywords. Spaces between keywords are implicitly interpreted as AND logic. For example, research results #ScientificBreakthrough will only match Tweets containing the words research and results as well as the hashtag #ScientificBreakthrough.
OR logic: Combining keywords with the OR operator expands the scope of the query. It will find every Tweet containing at least one of the given keywords. For example, human OR resources OR #meme will retrieve all Tweets that include at least one of the terms human, resources, or the hashtag #meme.
NOT logic (negation): To negate a keyword or operator, add a hyphen (-) directly before it. For instance, the query science #meme -informatics will match posts containing both #meme and science, excluding those that also contain the term informatics. A frequently used example is -is:retweet, which excludes Retweets and allows matches for original Tweets, Quote Tweets, and replies. While all operators can be negated, a negated operator cannot be used on its own.
Grouping: Operators can be grouped with parentheses. For instance, (research results) OR (#meme has:images) will yield Tweets that either contain the terms research and results, or contain an image and the #meme hashtag. ANDs are evaluated first, then ORs.

When using both AND and OR functionalities together, the order of operations is as follows:

  • Operators that are linked by AND logic are combined as the first step.
  • Afterward, operators connected through OR logic are applied.

For example:

  • dog OR cat mouse will be interpreted as dog OR (cat mouse)
  • horse cow OR sheep will be interpreted as (horse cow) OR sheep

To remove any vagueness and ensure accurate evaluation of your rule, use parentheses to group terms together when necessary.

For example:

  • (dog OR cat) mouse
  • horse (cow OR sheep)
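
The precedence rule above (AND before OR) can be made visible with a small parser that adds the implied parentheses. This is an illustrative sketch only: explicit_grouping and its helpers are hypothetical names, the parser handles bare keywords, operators, OR, and parentheses, and it does not handle quoted phrases containing spaces or malformed input.

```python
import re
from typing import List, Tuple, Union

Node = Union[str, Tuple[str, list]]


def tokenize(query: str) -> List[str]:
    # Parentheses become their own tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", query)


def parse_or(tokens: List[str], pos: int) -> Tuple[Node, int]:
    """Parse AND-groups separated by OR (lowest precedence)."""
    children = []
    while True:
        child, pos = parse_and(tokens, pos)
        children.append(child)
        if pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
        else:
            break
    return (children[0] if len(children) == 1 else ("OR", children)), pos


def parse_and(tokens: List[str], pos: int) -> Tuple[Node, int]:
    """Parse space-separated terms, which are implicitly ANDed (binds tighter than OR)."""
    children = []
    while pos < len(tokens) and tokens[pos] not in ("OR", ")"):
        if tokens[pos] == "(":
            child, pos = parse_or(tokens, pos + 1)
            pos += 1  # step over the matching ")"
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (children[0] if len(children) == 1 else ("AND", children)), pos


def render(node: Node, top: bool = True) -> str:
    if isinstance(node, str):
        return node
    op, children = node
    separator = " OR " if op == "OR" else " "
    text = separator.join(render(child, top=False) for child in children)
    return text if top else f"({text})"


def explicit_grouping(query: str) -> str:
    node, _ = parse_or(tokenize(query), 0)
    return render(node)


print(explicit_grouping("dog OR cat mouse"))    # dog OR (cat mouse)
print(explicit_grouping("horse cow OR sheep"))  # (horse cow) OR sheep
```

Queries that already carry explicit parentheses, such as (dog OR cat) mouse, pass through unchanged, which is a quick way to confirm that a rule is grouped as intended.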

Punctuation, diacritics, and case sensitivity

Characters that include accents or diacritics are handled just like regular characters, without being regarded as word separators. For instance, a rule with the keyword jalapeños would solely identify Tweets containing the exact term jalapeños, without considering matches like jalape, jalapen, or os.

All operators are case-insensitive. For example, the query science returns the same results as SCIENCE or Science.

Rules containing accents or diacritics lead to distinct behaviors between the Filtered Stream and Search endpoints.

Filtered Stream

When you define a keyword or hashtag rule that includes accents or diacritics, it will match only Tweets containing the word with exactly those accents or diacritics. It will not match Tweets that use the same letters without the accents or diacritics.

For example, rules with the keyword Résumé or hashtag #jalapeños will match Tweets that contain Résumé or #jalapeños because they include the accents or diacritic. These rules will not match Tweets that contain Resume or #jalapenos.

Search Tweets

Unlike the Filtered Stream endpoint, the Search endpoints are insensitive to accents and diacritics. Queries containing them therefore return Tweets both with and without the accents or diacritics. For example, the queries Résumé or #jalapeños will yield Tweets containing Résumé or #jalapeños as well as those containing Resume or #jalapenos.
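
The difference between the two endpoints can be modeled locally. The sketch below only imitates the behavior described in this section and is not how X implements matching: stream_like_match compares whole tokens exactly (apart from case), while search_like_match strips accents on both sides first. All function names are hypothetical.

```python
import re
import unicodedata
from typing import List


def tokens(text: str) -> List[str]:
    """Split text into lowercase word tokens; accented letters stay inside the word."""
    normalized = unicodedata.normalize("NFC", text)
    return re.findall(r"\w+", normalized.lower())


def fold_accents(text: str) -> str:
    """Remove accents and diacritics, e.g. 'Résumé' becomes 'Resume'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stream_like_match(keyword: str, tweet: str) -> bool:
    # Accent-sensitive, whole-token comparison.
    return unicodedata.normalize("NFC", keyword).lower() in tokens(tweet)


def search_like_match(keyword: str, tweet: str) -> bool:
    # Accent-insensitive: fold both the keyword and the Tweet text.
    return fold_accents(keyword).lower() in tokens(fold_accents(tweet))


tweet = "I love jalapenos!"
print(stream_like_match("jalapeños", tweet))  # accent-sensitive comparison
print(search_like_match("jalapeños", tweet))  # accent-insensitive comparison
```

Because both functions compare whole tokens, a fragment such as jalapen does not match a Tweet containing jalapeños in either model.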

Quote Tweet matching behavior

Filtered Stream

Operators will apply to both the content present in the initial Tweet that was quoted and the content present within the Quote Tweet.

Search Tweets

Operators will not find matches in the content of the original Tweet that was quoted, but they will match content present in the Quote Tweet.

Iteratively building a rule

Test your rule early and often

Achieving accurate results with a rule on the first attempt is uncommon. Due to the sheer volume and variety of Tweets, it is rarely evident in advance which exact Tweets a rule will return.

In the process of creating a rule, it is therefore essential to frequently test it using the stream endpoint to observe the data it retrieves. You should also consider testing it using one of the Search Tweet endpoints, provided that the operators you use are also compatible with that endpoint.

Below, we start with the following simple rule and refine it based on the output it generates:

create OR creation

Use results to narrow the rule

While testing the rule, it is essential to review the returned Tweets to verify if they contain the anticipated and desired data. It is recommended to start with a broad rule that typically generates a large set of Tweets. Afterwards this rule can be refined to exclude unwanted results.

Because the existing rule returned Tweets in multiple languages, the following refinement limits the results to Tweets in English.

(create OR creation) lang:en

The test resulted in several Tweets praising divine creation. We are going to remove them from the results by adding the negated keyword operator -divine. Furthermore, we do not want to include retweets. We can reach that goal by adding the negated -is:retweet operator.

(create OR creation) lang:en -divine -is:retweet

Adjust for inclusion where needed

If the query does not return certain Tweets that you know exist, you might need to widen your rule by eliminating operators that could potentially lead to the exclusion of the desired data.

We noticed that some Tweets on the same topic are missing from our search results because they use different terms with a similar or identical meaning. To cover those Tweets, we can add those terms to the rule:

(create OR creation OR making OR founding) lang:en -divine -is:retweet
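
The refinement steps above can be reproduced programmatically, which helps when a rule is revised often. The following is a minimal sketch under stated assumptions: build_rule, search_url, and stream_rule_payload are hypothetical helpers, the two endpoint URLs and the max_results parameter reflect the X API v2 as we understand it and should be verified against the current API reference, and no request is actually sent (a real request additionally needs a bearer token).

```python
import json
from typing import Iterable, List, Optional
from urllib.parse import urlencode

# Assumed X API v2 endpoints; verify against the current API reference.
RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"
STREAM_RULES_URL = "https://api.twitter.com/2/tweets/search/stream/rules"


def build_rule(any_of: List[str], language: Optional[str] = None,
               exclude: Iterable[str] = (), drop_retweets: bool = False) -> str:
    """Compose a rule: OR-group of keywords, then narrowing operators."""
    group = " OR ".join(any_of)
    extras = []
    if language:
        extras.append(f"lang:{language}")
    extras.extend(f"-{word}" for word in exclude)
    if drop_retweets:
        extras.append("-is:retweet")
    if not extras:
        return group
    if len(any_of) > 1:
        group = f"({group})"  # parentheses keep the ORs together
    return " ".join([group] + extras)


def search_url(rule: str, max_results: int = 10) -> str:
    """URL for testing the rule as a query on the recent search endpoint."""
    return RECENT_SEARCH_URL + "?" + urlencode({"query": rule, "max_results": max_results})


def stream_rule_payload(rule: str, tag: str) -> str:
    """JSON body for adding the rule to the filtered stream (POST to STREAM_RULES_URL)."""
    return json.dumps({"add": [{"value": rule, "tag": tag}]})


rule = build_rule(["create", "creation", "making", "founding"],
                  language="en", exclude=["divine"], drop_retweets=True)
print(rule)
print(search_url(rule))
print(stream_rule_payload(rule, tag="creation-en"))
```

Keeping the rule in one function call makes each iteration (adding a language filter, an exclusion, or a synonym) a one-line change that can be tested again immediately.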

Since X (Twitter) is a highly dynamic platform, your rules may need to be adapted as trends emerge and fade. We therefore recommend reviewing and adjusting your rules periodically.

Operators

  • Essential: Available with all access levels.
  • Elevated: Available when using a Project with Elevated, Academic Research, or Enterprise access.
  • Certain operators have an alternate name or alias that can be used.

Available for Search Tweets and Filtered Stream

Operator | Type | Availability | Description
keyword | Standalone | Essential | Matches a keyword within the body of a Tweet. This is a tokenized match, meaning that your keyword string will be matched against the tokenized text of the Tweet body. Tokenization splits words based on punctuation, symbols, and Unicode basic plane separator characters.

For example, a Tweet with the text “I like coca-cola” would be split into the following tokens: I, like, coca, cola. These tokens would then be compared to the keyword string used in your query. To match strings containing punctuation (for example coca-cola), symbol, or separator characters, you must wrap your keyword in double-quotes.

Example: pepsi OR cola OR "coca cola"
emoji | Standalone | Essential | Matches an emoji within the body of a Tweet. Similar to a keyword, emojis are a tokenized match, meaning that your emoji will be matched against the tokenized text of the Tweet body.

Note that if an emoji has a variant, you must wrap it in double quotes to add to a query.

Example: (😃 OR 😡) 😬
"exact phrase match" | Standalone | Essential | Matches the exact phrase within the body of a Tweet.

Example: ("Twitter API" OR #v2) -"recent search"
# | Standalone | Essential | Matches any Tweet containing a recognized hashtag, if the hashtag is a recognized entity in a Tweet.

This operator performs an exact match, NOT a tokenized match, meaning the rule #thanku will match posts with the exact hashtag #thanku, but not those with the hashtag #thankunext.

Example: #thankunext #fanart OR @arianagrande
@ | Standalone | Essential | Matches any Tweet that mentions the given username, if the username is a recognized entity (including the @ character).

Example: (@twitterdev OR @twitterapi) -@twitter
$ | Standalone | Elevated | Matches any Tweet that contains the specified ‘cashtag’ (where the leading character of the token is the ‘$’ character).

Note that the cashtag operator relies on Twitter’s ‘symbols’ entity extraction to match cashtags, rather than trying to extract the cashtag from the body itself.

Example: $twtr OR @twitterdev -$fb
from: | Standalone | Essential | Matches any Tweet from a specific user.
The value can be either the username (excluding the @ character) or the user’s numeric user ID.

You can only pass a single username/ID per from: operator.

Example: from:twitterdev OR from:twitterapi -from:twitter
to: | Standalone | Essential | Matches any Tweet that is in reply to a particular user.
The value can be either the username (excluding the @ character) or the user’s numeric user ID.

You can only pass a single username/ID per to: operator.

Example: to:twitterdev OR to:twitterapi -to:twitter
url: | Standalone | Essential | Performs a tokenized match on any validly-formatted URL of a Tweet.

This operator can match on the contents of both the url and expanded_url fields. For example, a Tweet containing "You should check out Twitter Developer Labs: https://t.co/c0A36SWil4" (with the short URL redirecting to https://developer.twitter.com) will match both of the following rules:

from:TwitterDev url:"https://developer.twitter.com"
(because it will match the contents of entities.urls.expanded_url)

from:TwitterDev url:"https://t.co"
(because it will match the contents of entities.urls.url)

Tokens and phrases containing punctuation or special characters should be double-quoted (for example, url:"/developer"). Similarly, to match on a specific protocol, enclose in double-quotes (for example, url:"https://developer.twitter.com").
retweets_of: | Standalone | Essential | Matches Tweets that are Retweets of the specified user. The value can be either the username (excluding the @ character) or the user’s numeric user ID.

You can only pass a single username/ID per retweets_of: operator.

Example: retweets_of:twitterdev OR retweets_of:twitterapi
in_reply_to_tweet_id: | Standalone | Essential | Available alias: in_reply_to_status_id:
Matches on replies to the specified Tweet.

Example: in_reply_to_tweet_id:1539382664746020864
retweets_of_tweet_id: | Standalone | Essential | Available alias: retweets_of_status_id:
Matches on explicit (or native) Retweets of the specified Tweet. Note that the Tweet ID used should be the ID of an original Tweet and not a Retweet.

Example: retweets_of_tweet_id:1539382664746020864
quotes_of_tweet_id: | Standalone | Essential | Available alias: quotes_of_status_id:
Matches on Quote Tweets of the specified Tweet. Note that the Tweet ID used should be the ID of an original Tweet and not a Quote Tweet.

Example: quotes_of_tweet_id:1539382664746020864
context: | Standalone | Essential | Matches Tweets with a specific domain id/entity id pair. To learn more about this operator, please visit our page on annotations.

You can only pass a single domain/entity per context: operator.

context:domain_id.entity_id

However, you can combine multiple domain/entities using the OR operator:
(context:47.1139229372198469633 OR context:11.1088514520308342784)
Examples:
context:10.799022225751871488
(domain_id.entity_id returns Tweets matching that specific domain-entity pair)
entity: | Standalone | Essential | Matches Tweets with a specific entity string value. To learn more about this operator, please visit our page on annotations.
Please note that this is only available with recent search.
You can only pass a single entity: operator.
entity:"string declaration of entity/place"

Examples: entity:"Michael Jordan" OR entity:"Barcelona"
conversation_id: | Standalone | Essential | Matches Tweets that share a common conversation ID. A conversation ID is set to the Tweet ID of a Tweet that started a conversation. As Replies to a Tweet are posted, even Replies to Replies, the conversation_id is added to its JSON payload.

You can only pass a single conversation ID per conversation_id: operator.

Example: conversation_id:1334987486343299072 (from:twitterdev OR from:twitterapi)
list: | Standalone | Elevated | Matches Tweets posted by users who are members of a specified list.
For example, if @twitterdev and @twitterapi were members of List 123, and you included list:123 in your query, your response will only contain Tweets that have been published by those accounts. You can find List IDs by using the List lookup endpoint.
Please note that you can only use a single list: operator per query, and you can only specify a single List per list: operator.
Example: list:123
place: | Standalone | Elevated | Matches Tweets tagged with the specified location or Twitter place ID. Multi-word place names (“New York City”, “Palo Alto”) should be enclosed in quotes.

You can only pass a single place per place: operator.

Note: See the GET geo/search standard v1.1 endpoint for how to obtain Twitter place IDs.

Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet.

Example: place:"new york city" OR place:seattle OR place:fd70c22040963ac7
place_country: | Standalone | Elevated | Matches Tweets where the country code associated with a tagged place/location matches the given ISO alpha-2 character code.

You can find a list of valid ISO codes on Wikipedia.

You can only pass a single ISO code per place_country: operator.

Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet.

Example: place_country:US OR place_country:MX OR place_country:CA
point_radius: | Standalone | Elevated | Matches against the place.geo.coordinates object of the Tweet when present, and in X (Twitter), against a place geo polygon, where the place polygon is fully contained within the defined region.

point_radius:[longitude latitude radius]

Units of radius supported are miles (mi) and kilometers (km)

Radius must be less than 25mi

Longitude is in the range of ±180

Latitude is in the range of ±90

All coordinates are in decimal degrees

Rule arguments are contained within brackets, space delimited
You can only pass a single geo polygon per point_radius: operator.

Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet.

Example: point_radius:[2.355128 48.861118 16km] OR point_radius:[174.761070 -41.287336 20mi]
bounding_box: | Standalone | Elevated | Available alias: geo_bounding_box:
Matches against the place.geo.coordinates object of the Tweet when present, and in X (Twitter), against a place geo polygon, where the place polygon is fully contained within the defined region.

bounding_box:[west_long south_lat east_long north_lat]

west_long south_lat represent the southwest corner of the bounding box where west_long is the longitude of that point, and south_lat is the latitude.
east_long north_lat represent the northeast corner of the bounding box, where east_long is the longitude of that point, and north_lat is the latitude.
Width and height of the bounding box must be less than 25mi
Longitude is in the range of ±180
Latitude is in the range of ±90
All coordinates are in decimal degrees.
Rule arguments are contained within brackets, space delimited.

You can only pass a single geo polygon per bounding_box: operator.

Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet.

Example: bounding_box:[-105.301758 39.964069 -105.178505 40.09455]
is:retweet | Conjunction required | Essential | Matches on Retweets that match the rest of the specified rule. This operator looks only for true Retweets (for example, those generated using the Retweet button). Quote Tweets will not be matched by this operator.

Example: data @twitterdev -is:retweet
is:reply | Conjunction required | Essential | Delivers only explicit replies that match a rule. Can also be negated to exclude replies that match a query from delivery.

Note: This operator is also available with the filtered stream endpoint. When used with filtered stream, this operator matches on replies to an original Tweet, replies in quoted Tweets, and replies in Retweets.

Example: from:twitterdev is:reply
is:quote | Conjunction required | Essential | Returns all Quote Tweets, also known as Tweets with comments.

Example: "sentiment analysis" is:quote
is:verified | Conjunction required | Essential | Delivers only Tweets whose authors are verified by Twitter.

Example: #nowplaying is:verified
-is:nullcast | Conjunction required | Elevated | Removes Tweets created for promotion only on ads.twitter.com that have a "source":"Twitter for Advertisers (legacy)" or "source":"Twitter for Advertisers".
This operator must be negated.

For more info on Nullcasted Tweets, see our page on Tweet availability.

Example: "mobile games" -is:nullcast
has:hashtags | Conjunction required | Essential | Matches Tweets that contain at least one hashtag.

Example: from:twitterdev -has:hashtags
has:cashtags | Conjunction required | Elevated | Matches Tweets that contain a cashtag symbol (with a leading ‘$’ character, for example $tag).

Example: #stonks has:cashtags
has:links | Conjunction required | Essential | This operator matches Tweets which contain links and media in the Tweet body.

Example: from:twitterdev announcement has:links
has:mentions | Conjunction required | Essential | Matches Tweets that mention another Twitter user.

Example: #nowplaying has:mentions
has:media | Conjunction required | Essential | Available alias: has:media_link
Matches Tweets that contain a media object, such as a photo, GIF, or video, as determined by Twitter. This will not match on media created with Periscope, or Tweets with links to other media hosting sites.

Example: (kittens OR puppies) has:media
has:images | Conjunction required | Essential | Matches Tweets that contain a recognized URL to an image.

Example: #meme has:images
has:video_link | Conjunction required | Essential | Available alias: has:videos
Matches Tweets that contain native Twitter videos, uploaded directly to Twitter. This will not match on videos created with Periscope, or Tweets with links to other video hosting sites.

Example: #icebucketchallenge has:video_link
has:geo | Conjunction required | Elevated | Matches Tweets that have Tweet-specific geolocation data provided by the Twitter user. This can be either a location in the form of a Twitter place, with the corresponding display name, geo polygon, and other fields, or in rare cases, a geo lat-long coordinate.

Note: Operators matching on place (Tweet geo) will only include matches from original tweets. Retweets do not contain any place data.

Example: recommend #paris has:geo -bakery
lang: | Conjunction required | Essential | Matches Tweets that have been classified by Twitter as being of a particular language (if, and only if, the Tweet has been classified). It is important to note that each Tweet is currently only classified as being of one language, so AND’ing together multiple languages will yield no results.

You can only pass a single BCP 47 language identifier per lang: operator.

Note: if no language classification can be made the provided result is ‘und’ (for undefined).

Example: recommend #paris lang:en

The list below represents the currently supported languages and their corresponding BCP 47 language identifier:

Amharic: am | German: de | Malayalam: ml | Slovak: sk
Arabic: ar | Greek: el | Maldivian: dv | Slovenian: sl
Armenian: hy | Gujarati: gu | Marathi: mr | Sorani Kurdish: ckb
Basque: eu | Haitian Creole: ht | Nepali: ne | Spanish: es
Bengali: bn | Hebrew: iw | Norwegian: no | Swedish: sv
Bosnian: bs | Hindi: hi | Oriya: or | Tagalog: tl
Bulgarian: bg | Latinized Hindi: hi-Latn | Panjabi: pa | Tamil: ta
Burmese: my | Hungarian: hu | Pashto: ps | Telugu: te
Croatian: hr | Icelandic: is | Persian: fa | Thai: th
Catalan: ca | Indonesian: in | Polish: pl | Tibetan: bo
Czech: cs | Italian: it | Portuguese: pt | Traditional Chinese: zh-TW
Danish: da | Japanese: ja | Romanian: ro | Turkish: tr
Dutch: nl | Kannada: kn | Russian: ru | Ukrainian: uk
English: en | Khmer: km | Serbian: sr | Urdu: ur
Estonian: et | Korean: ko | Simplified Chinese: zh-CN | Uyghur: ug
Finnish: fi | Lao: lo | Sindhi: sd | Vietnamese: vi
French: fr | Latvian: lv | Sinhala: si | Welsh: cy
Georgian: ka | Lithuanian: lt
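
The argument order and range constraints listed for the point_radius: and bounding_box: operators in the table above are easy to get wrong, in particular the longitude-before-latitude order. The following sketch builds both operators and checks the documented ranges first. The helper names are hypothetical; the size limit of the bounding box (width and height under 25mi) is not checked here because it requires a geodesic distance calculation.

```python
KM_PER_MILE = 1.609344


def check_coordinates(longitude: float, latitude: float) -> None:
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude {longitude} is outside ±180")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude {latitude} is outside ±90")


def point_radius(longitude: float, latitude: float, radius: float, unit: str = "km") -> str:
    """Build point_radius:[longitude latitude radius] after validating the arguments."""
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    check_coordinates(longitude, latitude)
    radius_in_miles = radius if unit == "mi" else radius / KM_PER_MILE
    if not 0 < radius_in_miles < 25:
        raise ValueError("radius must be positive and less than 25mi")
    return f"point_radius:[{longitude} {latitude} {radius}{unit}]"


def bounding_box(west_long: float, south_lat: float, east_long: float, north_lat: float) -> str:
    """Build bounding_box:[west_long south_lat east_long north_lat]; box size is not checked."""
    check_coordinates(west_long, south_lat)
    check_coordinates(east_long, north_lat)
    return f"bounding_box:[{west_long} {south_lat} {east_long} {north_lat}]"


print(point_radius(2.355128, 48.861118, 16, "km"))
print(bounding_box(-105.301758, 39.964069, -105.178505, 40.09455))
```

Passing latitude and longitude in the wrong order often produces a latitude outside ±90, which the check turns into an immediate ValueError rather than a rule that silently matches nothing.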

Only available for Filtered Stream

Operator | Type | Availability | Description
followers_count: | | Essential | Matches Tweets when the author has a followers count within the given range.
If a single number is specified, any number equal to or higher will match.

Example: followers_count:500

Additionally, a range can be specified to match any number in the given range.

Example: followers_count:1000..10000
tweets_count: | | Essential | Available alias: statuses_count:
Matches Tweets when the author has posted a number of Tweets that falls within the given range.
If a single number is specified, any number equal to or higher will match.

Example: tweets_count:1000

Additionally, a range can be specified to match any number in the given range.

Example: tweets_count:1000..10000
following_count: | | Essential | Available alias: friends_count:
Matches Tweets when the author has a friends count (the number of users they follow) that falls within the given range.
If a single number is specified, any number equal to or higher will match.

Example: following_count:500

Additionally, a range can be specified to match any number in the given range.

Example: following_count:1000..10000
listed_count: | | Essential | Available alias: user_in_lists_count:
Matches Tweets when the author is included in the specified number of Lists.
If a single number is specified, any number equal to or higher will match.

Example: listed_count:10

Additionally, a range can be specified to match any number in the given range.

Example: listed_count:10..100
url_title: | | Essential | Available alias: within_url_title:
Performs a keyword/phrase match on the expanded URL HTML title metadata.

Example: url_title:snow
url_description: | | Essential | Available alias: within_url_description:
Performs a keyword/phrase match on the expanded page description metadata.

Example: url_description:weather
url_contains: | | Essential | Matches Tweets with URLs that literally contain the given phrase or keyword. To search for patterns with punctuation in them (e.g. google.com), enclose the search term in quotes.
NOTE: This will match against the expanded URL as well.

Example: url_contains:photos
source: | | Essential | Matches any Tweet generated by the given source application. The value must be either the name of the application or the application’s URL. Cannot be used alone.

Example: source:"Twitter for iPhone"

Note: As a Twitter app developer, Tweets created programmatically by your application will have the source of your application Name and Website URL set in your app settings.
in_reply_to_tweet_id: | | Essential | Available alias: in_reply_to_status_id:
Deliver only explicit Replies to the specified Tweet.

Example: in_reply_to_tweet_id:1539382664746020864
retweets_of_tweet_id: | | Essential | Available alias: retweets_of_status_id:
Deliver only explicit (or native) Retweets of the specified Tweet. Note that the status ID used should be the ID of an original Tweet and not a Retweet.

Example: retweets_of_tweet_id:1539382664746020864
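
The count operators in the table above (followers_count:, tweets_count:, following_count:, listed_count:) share the same value syntax: a single number means "this value or higher", and low..high means a range. The sketch below imitates that matching locally. matches_count is a hypothetical helper, and treating both range bounds as inclusive is an assumption, since the description above does not state it.

```python
def matches_count(spec: str, value: int) -> bool:
    """Check a count against an operator value such as '500' or '1000..10000'."""
    if ".." in spec:
        low, high = (int(part) for part in spec.split("..", 1))
        return low <= value <= high  # assumption: bounds are inclusive
    return value >= int(spec)        # single number: equal to or higher


for followers in (499, 500, 5000, 20000):
    print(followers,
          matches_count("500", followers),
          matches_count("1000..10000", followers))
```

Applying such a check to already collected data is a quick way to estimate how much a count threshold would shrink the stream before the rule is deployed.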

Effects of operators

This table provides examples of how different operators behave when tweets are retweeted, replied to, or users are mentioned.

Scenario | Original tweet author | conversation_id
Tweet from user1 | user1 | user1 tweet
Retweet from user1 of a tweet from user2 | user2 | user2 tweet
Retweet from user2 of a tweet from user1 | user1 | user2 retweet
Reply from user1 to a tweet from user2 | user2 | user2 tweet
Reply from user2 to a tweet from user1 | user1 | user1 tweet
Reply from user3 to a reply of user2 to a tweet of user1 | user1 | user1 tweet
Reply from user3 to a reply from user1 to a tweet of user2 | user2 | user2 tweet
Retweet of user3 to the reply of user2 to the tweet of user1 | user1 | user1 tweet
Mention of user1 in a tweet of user2 | user2 | user2 tweet
Reply from user3 to a tweet of user2 where user1 is mentioned | user2 | user2 tweet
Retweets of a tweet where user1 is mentioned | user2 | user3 retweet

Rules compared against each scenario (the per-scenario match marks did not survive in this copy):
from:user1
from:user1 -is:retweet
from:user1 -is:reply
from:user1 -is:quote
to:user1
@user1
@user1 -is:retweet
retweets_of:user1
midterms user1 is candidate

Sources