In Appendix A of the public report “Beyond the Hashtags: #Ferguson, #Blacklivesmatter, and the Online Struggle for Offline Justice,” my coauthors and I promised to release our Twitter data publicly in 2017. The time has come to make good on that promise. Unfortunately, Twitter’s Terms of Service restricts users from publishing any Twitter data except tweet IDs. However, these IDs can be programmatically “hydrated,” which recreates the original dataset minus any tweets that have been deleted or removed from public view since the dataset was generated. This blog post contains all the original tweet IDs separated by day along with sample code for hydrating them.
Our Twitter dataset contains all 40,815,975 tweets matching at least one of the following 45 keywords that were posted between June 1, 2014 and May 31, 2015 and had not been deleted or protected as of July 2015:
The following zip file contains 365 text files, each of which contains the tweet IDs of all tweets posted on one of the days covered by our dataset.
All that is required to hydrate the data is a properly formatted set of instructions for the Twitter API. To do this I recommend the Python module twarc, for which I provide sample code below. Before you can use twarc, you’ll need to install it and then create a Twitter app if you haven’t already. Both are free and more or less instantaneous.
Here is the code (Python 3). The only change you’ll need to make is to enter the consumer key, consumer secret, access token, and access token secret from your Twitter app between the single quotes on lines 4-7.
from twarc import Twarc
import json

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

t = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)

# Hydrate each tweet ID and keep every returned tweet as a JSON string
data = []
with open('bth_ids_2014-06-01.txt') as infile:
    for tweet in t.hydrate(infile):
        data.append(json.dumps(tweet))

# Write one JSON object per line
with open('bth_data_2014-06-01.json', 'w') as outfile:
    outfile.write("\n".join(data) + '\n')
This will create a JSON file in your working directory containing the public tweets and associated metadata matching the IDs in the bth_ids_2014-06-01.txt file. You can repeat this process for whichever dates/files you’re interested in.
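If you want several days at once, the repetition can be scripted. The following is only a rough sketch, not part of the original analysis code. It assumes every ID file follows the bth_ids_YYYY-MM-DD.txt naming seen in the sample file name, and the helper names daily_id_files and hydrate_range are made up for illustration. It also reports how many IDs were requested and how many tweets came back for each day, since deleted or protected tweets will not be returned.

```python
import json
from datetime import date, timedelta
from pathlib import Path


def daily_id_files(start, end):
    """Yield (day, id_filename, data_filename) for each date from start to end, inclusive.

    Assumption: files are named bth_ids_YYYY-MM-DD.txt, as in the sample above.
    """
    day = start
    while day <= end:
        stamp = day.isoformat()
        yield day, f"bth_ids_{stamp}.txt", f"bth_data_{stamp}.json"
        day += timedelta(days=1)


def hydrate_range(hydrate, start, end, id_dir=".", out_dir="."):
    """Hydrate every available ID file in the date range, one output file per day.

    `hydrate` is any callable that takes an iterable of tweet IDs and yields
    tweet dicts, for example the hydrate method of a configured Twarc object.
    Returns a dict mapping each day to (ids_requested, tweets_recovered).
    """
    stats = {}
    for day, id_name, data_name in daily_id_files(start, end):
        id_path = Path(id_dir) / id_name
        if not id_path.exists():
            continue  # skip days whose ID file is not present locally
        ids = [line.strip() for line in id_path.read_text().splitlines() if line.strip()]
        recovered = 0
        # Write tweets as they arrive instead of holding a whole day in memory
        with (Path(out_dir) / data_name).open("w") as outfile:
            for tweet in hydrate(ids):
                outfile.write(json.dumps(tweet) + "\n")
                recovered += 1
        stats[day] = (len(ids), recovered)
    return stats
```

With the Twarc object t from the sample code, a call such as hydrate_range(t.hydrate, date(2014, 8, 9), date(2014, 8, 15)) would process one week of files. The gap between the two numbers returned for each day gives a rough sense of how many tweets have disappeared since the dataset was collected.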
Finally, if you use this data in any public writings, we ask that you please cite the original report.