I’m a new Ph.D.

Just a brief announcement—shortly before New Year’s I earned my Ph.D. I have also updated this site’s front page to indicate that I am now an assistant professor at the School of Communication at American University in Washington, DC. Thanks for all your support.

The MENA protests on Twitter: Some empirical data

If you’ve been following the online commentary about the ongoing protests in the Middle East and North Africa (MENA), you know there’s been plenty of speculation about how digital communication technologies have aided, hindered, or failed to influence events on the ground. Opining without systematic evidence is all well and good (indeed, when done well it yields testable predictions about real-world outcomes), but at some point actual data must be brought to bear on these questions. One question I have found both interesting and testable is: to what extent are social media used by individuals in Arabic countries experiencing political unrest? A corollary question is: to what extent do social media serve as a conversation platform for a broader Arabic online public during times of widespread unrest?

To begin to address these questions, I focus in this post on Twitter, both because its ostensible revolutionary power has been widely discussed and because data from it is fairly easy to collect and manipulate. Country-specific hashtags such as #egypt conveniently collect relevant tweets, and until recently it was possible to create and save public hashtag archives using free tools like TwapperKeeper. Unfortunately, on March 20, 2011, Twitter changed its terms of service to disallow public sharing of tweet archives. So, shortly before the change went into effect, I exported archives of several MENA-related hashtags from TwapperKeeper for analysis. The subset of the data presented in this post totals over 5 million tweets, with each entry including the author’s username, the full text of the tweet, the date and time posted, and other metadata. The entries do not, however, include the user’s location field, which I had to collect separately based on lists of unique users posting to each hashtag. Combining the chronologically ordered hashtag dataset with the location data allows me to plot in time series the number of tweets in each hashtag whose authors claimed to be in the country in question. A little additional filtering helps me capture the extent to which each hashtag was used by individuals located in other Arabic-speaking countries.
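As a rough illustration of the join described above, here is a minimal Python sketch that attaches a self-reported location to each tweet by username and then counts tweets per day. The field names (`from_user`, `created_at`), the timestamp format, and the sample records are all assumptions made up for illustration; they are not the actual TwapperKeeper export schema or real archive data.

```python
from collections import Counter
from datetime import datetime


def attach_locations(tweets, user_locations):
    """Return copies of the tweet records with a 'location' key added.

    tweets: iterable of dicts with (hypothetical) keys 'from_user' and 'created_at'.
    user_locations: dict mapping username -> self-reported location string.
    Users missing from the lookup get an empty location.
    """
    return [dict(t, location=user_locations.get(t["from_user"], "")) for t in tweets]


def daily_counts(tweets, time_format="%Y-%m-%d %H:%M:%S"):
    """Count tweets per calendar day, returned in chronological order.

    The timestamp format is an assumption, not the real export format.
    """
    counts = Counter()
    for t in tweets:
        day = datetime.strptime(t["created_at"], time_format).date()
        counts[day] += 1
    return dict(sorted(counts.items()))


# Made-up records for illustration only (not real archive data).
sample_tweets = [
    {"from_user": "user_a", "created_at": "2011-02-11 16:05:00"},
    {"from_user": "user_b", "created_at": "2011-02-11 18:30:00"},
    {"from_user": "user_a", "created_at": "2011-02-12 09:00:00"},
]
sample_locations = {"user_a": "Cairo, Egypt"}  # user_b has no known location

located = attach_locations(sample_tweets, sample_locations)
per_day = daily_counts(located)
```

Keeping the location lookup separate from the tweet records mirrors the two-step collection described above: locations are gathered once per unique user and then applied to every tweet that user posted.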

But we’ll get to that in a second. First, let’s have a look at total tweet counts over time for the TwapperKeeper archives of seven major MENA hashtags: #egypt, #libya, #sidibouzid (Tunisia), #feb14 (Bahrain), #morocco, #yemen, and #algeria. Each data line begins on the date of the archive’s earliest tweet. The total N of tweets represented in this chart is 5,888,641.


A couple of things jump out at me looking at this plot. First, Libya and Egypt clearly grabbed the lion’s share of the attention, attracting several hundred thousand more tweets on their respective peak days than the next most popular hashtag. Both peaks were pegged to significant events on the ground: Mubarak’s resignation in Egypt’s case, and the taking of Benghazi and a major speech by Saif al-Islam Gadhafi in Libya’s. The other hashtags register far less overall activity in comparison. One hypothesis is that tweet volume in different countries may be driven by the amount of newshole devoted (by CNN, NYT, al-Jazeera, etc.) to events in that country, but more data would be needed to verify that.

The next logical question here concerns where these authors are located. Are they primarily residents of the countries in which the events are unfolding, concerned observers from culturally and physically neighboring states, or international spectators (perhaps including diasporic populations) commenting from afar? Answering this question entailed creating an automated word filter that placed each user-provided location into one of four categories: 1) in the hashtag country; 2) in the greater Arabic region (defined as the following countries: Algeria, Bahrain, Djibouti, Egypt, Iran, Iraq, Jordan, Kuwait, Lebanon, Libya, Mauritania, Morocco, Oman, Palestinian Territories, Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tunisia, United Arab Emirates, and Yemen) [1]; 3) outside of both the hashtag country and the Arabic region; and 4) no given location. The filter counted both instances of the country name and cities in each country. (If anyone is interested in specific details on what the country filters included, let me know and I’ll write up another post on it.) The total N of tweets analyzed for all six countries profiled below is 3,142,621 (#libya is not ready yet for reasons I explain later), some of which overlap due to the presence of multiple hashtags.
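To make the four-way classification concrete, here is a minimal Python sketch of a word filter of this general kind. It is not the filter actually used for this post: the term lists are hypothetical and heavily abbreviated, the category labels (`local`, `regional`, `outside`, `none`) are my shorthand for the four categories above, and whole-word matching is an assumption about how such a filter might work.

```python
import re

# Hypothetical, heavily abbreviated term lists for illustration only;
# a real filter would cover every country in the region and many more cities.
COUNTRY_TERMS = {
    "egypt": ["egypt", "cairo", "alexandria"],
    "tunisia": ["tunisia", "tunis", "sidi bouzid"],
    "yemen": ["yemen", "sanaa"],
}


def matches_any(location, terms):
    """True if any term appears in the location as a whole word or phrase."""
    text = location.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)


def classify_location(location, hashtag_country, country_terms=COUNTRY_TERMS):
    """Place a self-reported location into one of four categories.

    'none'     -> no location given
    'local'    -> mentions the hashtag country or one of its cities
    'regional' -> mentions another listed country or one of its cities
    'outside'  -> a location was given but matched nothing in the lists
    """
    if not location or not location.strip():
        return "none"
    if matches_any(location, country_terms[hashtag_country]):
        return "local"
    for country, terms in country_terms.items():
        if country != hashtag_country and matches_any(location, terms):
            return "regional"
    return "outside"
```

A filter like this inherits the ambiguity of free-text locations: with the toy lists above, "Alexandria, VA" would be counted as local to Egypt, so real term lists need to be curated and spot-checked by hand.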

Between 25% and 40% of unique names in each hashtag lacked any location information. These include users who left the field blank, deleted their own accounts, or had their accounts suspended. Being essentially unclassifiable data, tweets by such users are excluded from the following charts.

First we’ll have a look at #egypt:

Here we can see a pattern that will recur throughout most of the hashtags: the major spikes are driven by individuals from outside of both the country and the broader Arabic region (who are almost certainly responding to media reports). It is only when outside attention dies down that local and regional voices even begin to achieve parity with their international peers. (Note that the conspicuous gaps between 1/27 and 1/29 and 2/4 and 2/8 are due to TwapperKeeper archival overload. Some of the other hashtag archives feature similar gaps. This missing data is frustrating, but what is present is valuable nevertheless.)

Next is #sidibouzid (Tunisia):

A pattern similar to Egypt prevails here, wherein outsiders usually dominate when the total N of tweets tops about 1,000.
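One way to check a threshold pattern like this is to tally, for each day, which location category posted the most tweets, and then compare high-volume days with low-volume days. The Python sketch below does that under several assumptions of mine: the input is a list of (day, category) pairs, the category labels are `local`, `regional`, `outside`, and `none`, daily totals are computed over located tweets only, and the default cutoff of 1,000 is simply the rough figure mentioned above.

```python
from collections import Counter, defaultdict


def daily_category_counts(records):
    """Tally tweets per day and location category.

    records: iterable of (day, category) pairs. Tweets with category
    'none' (no usable location) are dropped, as in the charts.
    """
    per_day = defaultdict(Counter)
    for day, category in records:
        if category == "none":
            continue
        per_day[day][category] += 1
    return per_day


def outsider_lead_rate(per_day, threshold=1000):
    """Share of days on which 'outside' is the largest category.

    Days are split into 'high' (total located tweets above the threshold)
    and 'low' (all other days). A bucket with no days maps to None.
    """
    tallies = {"high": [0, 0], "low": [0, 0]}  # [days outsiders lead, days in bucket]
    for counts in per_day.values():
        total = sum(counts.values())
        bucket = "high" if total > threshold else "low"
        leader = max(counts, key=counts.get)
        tallies[bucket][1] += 1
        if leader == "outside":
            tallies[bucket][0] += 1
    return {
        bucket: (lead / days if days else None)
        for bucket, (lead, days) in tallies.items()
    }
```

If outsiders really do dominate only on busy days, the `high` rate should sit well above the `low` rate; on real data the cutoff itself would be worth varying rather than fixing at one value.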


The #yemen archive again displays the by-now familiar pattern, with the major difference being that regional tweets often exceed local tweets. This is most likely due to Yemen’s low internet penetration (1.8%).

#algeria and #morocco are very similar, so I’ll present them next:

The final hashtag archive in this post is for #feb14 (Bahrain), which unfortunately is rather incomplete. But once again, outsiders outnumber locals and regionals at higher total Ns, while locals take over at lower Ns.

I am still working on creating a comparable chart for the #libya archive, but it is difficult to apply the country filters to such a large N of unique users. A preliminary analysis of its first five days (2/16-2/20) that I presented at the Theorizing the Web conference last month showed that as the total N of tweets increased, the proportion of tweets from Libya decreased. With a net penetration rate of only 5.5%, it would not be surprising to discover that the entire hashtag followed the established pattern.

What does it all mean?

The evidence from the hashtags analyzed here indicates that, at least in the early days of the Arab Spring, Twitter served primarily as a platform for communication by international observers about the events. There is also limited evidence of a pan-Arabic public conversation within these hashtags, but this is not their primary purpose. Both phenomena are definitely episodic and appear strongly event-driven. As in the Iranian protests of 2009, Twitter seems to fall into Aday et al.’s (2010) “external attention” category of new media roles.

Of course, this doesn’t necessarily mean that Twitter use is politically inconsequential. Attentive global citizens and diasporic populations could, for example, use it to promote action opportunities to sympathetic followers. They may also retweet content from local users liberally, thus amplifying the latter’s voices beyond what the above charts imply. For that matter, local users themselves may find these hashtags useful for sharing and verifying local news at times when they are not swamped by outsiders. Answering questions like these will require textual analysis, and it is unlikely that automated methods will suffice (except for the RT question). I’m envisioning lots of content analysis, translation from Arabic and French, and input from subject matter experts in my future…

A few caveats about these data are in order. First, they do not include all tweets posted to the hashtags for the given time periods. TwapperKeeper functions by drawing samples from the Twitter search API, so there is no way to know exactly how many tweets were posted without access to the definitive Twitter-hosted databases. Second, like any continuously running software program, TwapperKeeper can fail, as can be seen in the chart gaps above. The reason I chose to analyze #feb14 and not #bahrain is that the TK archive for the latter contains a two-week gap that included the “official” start date of the protest, February 14. Third, it is possible that other hashtags not analyzed here served different functions. Some MENA hashtags have Arabic titles, and it seems unlikely that these would fall under the external attention banner. I have archives of some of these for Egypt and am interested in collaborating with an Arabic-speaking expert to interpret them. Fourth, other social media services such as Facebook may serve different protest-related functions, depending on the country’s level of net penetration and service diffusion.

Then there is the question of whether the authors’ stated locations are accurate. Critics of my method will probably hasten to point out that many Twitter users changed their locations to “Iran” during the 2009 protests in that country. If this phenomenon occurred to any significant extent during the Arab Spring protests, it would significantly reduce the value of the current research enterprise. However, there are several reasons I doubt this to be the case. For one, to my knowledge there was no high-profile campaign to convince Twitter users to change their locations to any given Arab Spring country. With simultaneous protests ongoing in multiple countries, such a campaign (if it existed) would either have had to target one country or spread itself among more, either way diluting its overall impact. Also, my country filters included many city names, which outsiders would be unlikely to know offhand. Finally, if large numbers of international users had changed their locations to the protest countries, the filters probably would have identified far more users as local than they did. The fact that comparatively few users self-identified as local strongly suggests that the Iran strategy was not widespread in this case.

I am interested in your questions and suggestions about my methods and interpretations, so please let me know what you think in comments. If there are other analyses you’d like to see, I might be able to pull them together.


[1] Credit for this list of countries goes to Phil Howard, who recently published a book on digital communication technologies and politics in the Islamic world.

Causality, politics, and the net

Henry Farrell recently declared himself against studying the internet, and while that headline oversells his argument a bit, compelling turns of phrase are a large part of what gets good online conversations started. His basic thesis is not only that we should not study “the internet” as a system isolated from the rest of society, but also that we should trade analyses of specific online platforms (Facebook, Twitter, YouTube, etc.) for analyses of abstract causal mechanisms that contribute to various sociopolitical outcomes; some of these mechanisms may flourish upon those platforms, but they are almost certainly not limited to them. This perspective is more or less a direct application of one of the most fundamental normative stances of mainstream political science (among other branches of social science), namely that of causality as the gold standard of social research. (This position is not universal, as the existence of antipositivism attests.) I agree with Henry that causality is wonderful if you can demonstrate it, but think we need to get a bit more specific about exactly what we’re talking about before we venture too far.

Explaining my reservations will require shedding a bit of light on three interrelated questions. The first of these is: what do we mean by “causality” in this context? Secondly, what factors can and cannot be causes of political outcomes? Finally, what are the prospects of causal analysis of ICT-augmented politics?

The term “causality” carries multiple definitions in different contexts. For the purposes of this blog post, I intend the nomothetic and probabilistic sense of the term that is used widely throughout the social sciences. “Nomothetic” simply means covering a wide variety of cases, as opposed to “idiographic” causes which only apply to a single case. Babbie (2008) lists three widely-used criteria for nomothetic causality: correlation, time precedence, and nonspuriousness. (Alternative criteria for nomothetic causality are offered by Brady [2008] and Rubin [1980].) Correlation and time precedence should be self-explanatory for anyone with a passing familiarity with social science, and nonspuriousness simply means having eliminated most major alternative explanations and potential hidden variables. Probabilism refers to causes that increase the likelihood of a given outcome rather than guaranteeing it. Probabilistic causes are neither necessary nor sufficient, but their effects are robust enough for them to serve as meaningful predictors of the social outcome(s) in question.

It is difficult to see how ICTs by themselves could serve as causes of any given political outcome in this sense. Correlation is difficult to demonstrate because technologies are often associated with wildly divergent social outcomes in different social contexts (Markus & Robey, 1988). This is the main reason why there are very few scholarly technological determinists working today. Time precedence is also hard to straighten out in societies suffused with ICTs and proficient users. The question of which came first, political action or ICT use, will increasingly yield a single, unenlightening answer: the latter, as more and more people begin using digital technologies at early ages. Nonspuriousness presents probably the strongest objection of the three, as net skeptics have marshaled various alternative explanations for ostensibly net-driven political participation (e.g., Hindman, 2008; Margolis & Resnick, 2000).

But this point doesn’t really dent Henry’s argument at all, because he doesn’t posit technology as a cause. Rather, he focuses on social processes such as peer-to-peer information sharing and social influence as potential causes of political phenomena. It seems clear that these variables could in principle function as causes, but if they’re doing all the work, what do we need the internet for? One possibility is that online access or the use of specific services are effects rather than causes: this is the position of the normalization hypothesis, which holds that preexisting political interests cause political uses of ICTs. Another is that the role of technology is simply too complex to theorize as nomothetically as we might like, as Markus and Robey’s empirical review suggests.

In any event, the nomothetic approach requires that the social processes of interest retain their predictive power across a wide array of cases. Thus it is not enough that social influence, for example, might be linked with revolutionary activities in a few countries or situations—the two would need to be correlated, properly time-sequenced, and spuriousness-tested in many if not most cases to support a strong general theory of political action. Failing this rather lofty empirical standard, we might profitably settle for devising theories of smaller subsets of cases that are conceptually linked in some way. So it might be possible to develop theories of (for example) protest activity in advanced democracies, developing countries, or Islamic countries that include distinctive sets of roles for ICTs (e.g. Howard, 2010). But notice how far we have moved from macro-level theories that posit context-independent relationships between social processes and politics, whose breadth makes them unlikely to accrue consistent empirical support. By bounding our theoretical scopes with contextual qualifiers like culture, country, time period, and level of technological development, it is possible to develop mid-level theories that strike a balance between explaining It All and simply describing reality.

In sum: yes to mechanisms, but yes also to both tightly circumscribed contextual caveats and the possibility of significant roles for online platforms. In the end I think my position is compatible with a version of Henry’s; my primary concerns are about the scope of generalizability of causal claims. Big, broad, parsimonious theory is always attractive, but it may not always be possible.

Sorting through claims about the internet & revolutions, part 2


Welcome to Yet Another Blog Post About Politics, the MidEast, and The Internet, Part 2. I venture forth once more into this already oversaturated conversation for two reasons: one, I said I would, and two, this post is going to do something most others don’t, if you can believe it. Rather than making or debunking big claims about the net’s role or non-role in revolutions, I want to lay out a few evidential standards to start a conversation about how we might be able to test some of the major claims that’ve been making the rounds.

In the pre-internet era, doing this task justice would at the very least have required a treatment of scholarly-article length, or maybe even a short book. Attempting it in a blog post might therefore seem a bit foolhardy, but the blog medium offers a key advantage over the scholarly article and book, one which also compensates for my lack of comprehensive knowledge in this area: my very intelligent readers. My last post was fortunate enough to elicit some very high-profile praise, which subsequently attracted a level of global attention most academics seldom if ever enjoy. In the interests of not wasting this rare opportunity, I want to fully embrace the spirit of “publish, then filter” and nip consultation anxiety in the bud at the same time by asking everyone with relevant information on these issues to add their opinions, questions, and especially additional evidence in comments.

Those who read part 1 of this two-part series (thanks, by the way!) will be familiar with my four-category typology of claims about the net and revolutionary politics. As part 2, this blog post has two goals, the third and fourth of an analytical to-do list I introduced in part 1 (I really should have thought out the numbering better). In that post, I (1) introduced the typology and (2) teased out some of the key differences and similarities between each type; in this one, I plan to (3) articulate the kinds of evidence ideally required to substantiate each claim and (4) determine how well the existing evidence supports each claim. This fourth goal, represented in the “evidence we have” columns, is what will need the most help: I’ll draw on everything I know, but since you almost certainly know things I don’t, you can help fill in the blanks and correct me wherever I misinterpret something.

And now, a few tables:

The net as dissident’s advantage (DA)

Criteria req’d to test | The evidence we have | The evidence we need
Broad efficacy: Digital tools must substantively contribute to multiple revolutionary ends. | Quotes and trace data are suggestive, but suffer from sample bias: how do we know these folks are representative? | A systematic survey of participants asking about the significance of social media would eliminate this problem.
Superiority: They must be significantly superior to non-digital substitutes such that their absence would place dissidents at a noticeable disadvantage. | Stats showing the degree of control the state exercises over traditional media. For example, the Mubarak regime directly controlled or licensed most print, TV, and radio outlets. | Extensive state control of traditional media plus trace data from social media are fairly convincing here, because they show that participants are using the latter but not the former. The role of small-scale media like flyers and of f2f conversation could be explored in a survey that asked dissidents to compare the roles of social media to other forms of communication.
Robustness against subversion: They must be significantly more useful for revolutionary than counter-revolutionary purposes. | News reports quoting participants who claim that authorities are IDing, tracking, and retaliating against them using information gleaned from social media (in addition to the above) | Assuming we can prove internet surveillance is happening on a wide scale (through official documentation of protocols, insider testimony, or convergent pieces of circumstantial evidence), we would then need to show that such actions are ineffective in stifling revolutionary activity. This appears to have been the case in Tunisia and Egypt, but China may be a different story.
Context independence: All of the above criteria must hold true in most cases most of the time. | News reports/anecdotes/trace data from recent revolutionary events; more rigorous empirical studies of older events | Ideally, we would run all of the above analyses for every revolution since the turn of the century. The more cases that can be shown not to exhibit at least one of the above criteria, the weaker the DA claim becomes. At present, we simply don’t have enough info to render a conclusive judgment.

The net as public sphere platform (PS)

Criteria req’d to test | The evidence we have | The evidence we need
Scale: Networked publics—ongoing, distributed digital communication among citizens—must engage some significant proportion of the population.

  • Reader Coturnix adds: “the numbers of networked citizens is not sufficient – one also needs to know their geographic distribution, and their connectiveness with non-networked citizens,” with which I agree.
The key here is to discover the proportions of people who use the internet to communicate with others, as opposed to solely seeking out static information or purchasing goods and services. This is why internet penetration rates don’t cut it—PS is fundamentally about social uses of digital networks. Thus, we need Pew-quality usage data for every nation on the planet. We could use this to explore the required conditions (of size, depth of engagement, required topics) under which a country’s networked publics can successfully coordinate collective action.
Correlation: There must be a strong, non-spurious relationship between the strength (i.e. size plus activity) of a country’s networked public and the size and frequency of incidents of collective action aimed at political change. | None that I know of, but if you know differently, please enlighten me. Phil Howard apparently presents data of this type in his recent book The Digital Origins of Dictatorship and Democracy. (h/t to Clay Shirky) | Causation is probably impossible to establish definitively here. But we might be able to condense networked public strength down to a single continuous variable for insertion into an appropriate statistical model along with other possible influences (unemployment, per capita income, literacy rates, size of middle class, etc.) to get an idea of its relevance as a factor.

The net as citizen journalism platform (CJ)

Criteria req’d to test | The evidence we have | The evidence we need
Scale: Citizen journalism must actually be occurring on some consequential scale during the revolution in question. | A wealth of trace data from Twitter, Facebook, YouTube, etc., but little has been thoroughly analyzed due to the recency of the events | This data would need to be analyzed for evidence of citizen journalism (using qualitative and quantitative methods) as well as other revolutionary purposes.
Efficacy: Citizen journalism must have some measurable impact on the outcome of the revolution. | Anecdotes from news articles, Twitter, etc. | Surveys could help here: if the majority of a representative sample claimed to have learned important facts from citizen-generated news, that would constitute strong evidence. The role of CJ should also be compared with that of traditional media, and this will likely differ from nation to nation due to differences in both the strength of the native press and foreign journalists’ access to protest sites.

The net as revolutionary nicety (RN)

Criteria req’d to test | The evidence we have | The evidence we need
Low relevance: It must be shown that under most circumstances, uses of online tools do not significantly affect any consequential aspect of revolutionary politics. The best way to substantiate this claim would probably be to start with a comprehensive list of revolutionary activities (this one and this one are good places to start) and then show that the internet only contributed to the more marginal activities. A key example is external attention, which is sometimes cited as the most significant function of online tools in revolutionary contexts. Due to its overly broad scope, the case for RN weakens as the cases for the other three types strengthen. Further, its validity is highly contingent upon the ongoing development and availability of online tools with major revolutionary applications.

Theorizing the net and revolutions

I would like to close my little two-parter with two related observations about what people have been saying about the role of the net in revolutionary politics. First, one of the most obvious conclusions to draw from this exercise is that the bigger the claim, the bigger the evidential burden. People who wax optimistic about the net as dissident’s advantage find themselves in the position of having to show that it does in fact help more than hurt dissidents most of the time. In light of major differences between countries in variables such as net penetration, government use of net surveillance technologies, literacy rates, and military conscience, this is a difficult order to fill, and the passage of time will only make it more so. The same is true for the revolutionary-nicety crowd, who have the equally unenviable task of demonstrating the opposite. Big claims draw lots of attention, and that’s attractive to folks who are paid based on the appeal of their ideas, but I would argue based on the available data that technology and politics as a field is not overly amenable to broad theoretical generalization.

Now consider the much smaller evidential burdens of the middle two claims. Each is tractable enough to spur an active research agenda that can lead to interesting and valuable conclusions about net-enhanced revolutions. Of course, they won’t lead to eyeball-grabbing headlines like “Is X the Next Twitter Revolution?” or “Politics, Not Technology, Drives Revolutionary Change,” but they will leave readers with a better impression of how the internet is being used in various revolutionary contexts. (I thought of both of these headlines in less than a minute; small wonder that overworked newsrooms love these frames.) Ultimately, unless and until the empirical reality changes drastically, we must concede that the relationships between digital networks and politics are far too numerous and contradictory to summarize neatly in a headline-friendly phrase.

This point dovetails nicely with the second of my two conclusions. All of the claims I discuss above, along with many of the others you’ve likely been reading, can be rephrased as “the internet does/does not X” or “the internet’s role is X” in political revolutions. In the rush to make bold statements about the relationships between digital networks and politics, the roles of non-technological factors are too often either marginalized or ignored entirely. We would do well to bear in mind that not only is tech not a monocause, its influence is also highly context-dependent. So even though the notions of the public sphere and citizen journalism platforms represent improvements over their broader alternatives, their basic forms nevertheless decontextualize technology unduly. I should emphasize that I don’t see this as a fatal flaw—it’s quite simple to shift from “does the net do X?” to “under what circumstances does the net do or facilitate X?” Basic observations about technological properties are important, but they constitute only the beginning of a proper analysis of the role of any technology in any enterprise. In isolation, such observations can convey illusions of technology as an autonomous force, which I think most of us agree it’s not. Thus, cataloging the general social tendencies of the internet does not relieve us of the difficult task of empirically investigating how technologies are used to specific ends. Like politics, technology has a stubborn way of surprising even the most seasoned of experts. Armchair postulation is certainly fun and sometimes even profitable, but there’s been more than enough of that lately: let’s get to work.

As I mentioned earlier, please let me know where I’ve missed important considerations in comments. I will edit the post and give proper credit to the commenter in context.

UPDATE 3/03/11: A few lingering questions emerged in comments—

  • Jeff asks, if I may paraphrase: How do digital technologies affect the speed of revolutions? This has actually been one of the most common claims about online politics since the late 90s: that it primarily helps the already politically invested complete their tasks faster. Few or no qualitative changes in the practice of politics are expected in this perspective. Empirically testing the speed of revolutions in high vs. low-ICT nations shouldn’t be too difficult, and as cases proliferate the answer should become clearer. Whether ICTs are enabling transformative political phenomena, on the other hand, depends on one’s personal threshold for transformativity.
  • Jay Rosen helpfully distinguishes between headlines and substantive claims, noting that the former sometimes make promises authors never fulfill. I agree, which is part of why I based my claims in part 1 on pull quotes from the articles themselves rather than on headlines. But for this very reason I see breathless, utopian-sounding headlines as a problem. Of course anyone wishing to rebut an argument owes the target the courtesy of actually reading the full text of the argument itself, but I worry about the cumulative effect of lots of headlines that frame the issue in terms of Twitter Topples Dictators, whether pro or con. What impression do you think such a focus conveys to the casual reader who doesn’t follow this topic as closely as we do? Even if they manage to avoid the impression that Twitter Topples Dictators, they may come to believe that that’s what the debate is about, when it isn’t (or at least it shouldn’t be). So the onus is on us to find ways to frame analyses of digital-tech-and-democracy issues that don’t collapse the story into such oversimplified and misleading terms. In fact, I would suggest avoiding causal frames altogether, if for no other reason than that we simply don’t have enough evidence at present to say much about techno-political causality. One suggestion is simply to talk about how people seem to be using specific digital tools in revolutionary contexts, rather than portraying said tools as magical anti-dictator weapons.

Sorting through claims about the internet and revolutions, part 1


My last blog post argued that too many commentators on the recent events in Tunisia/Egypt/Yemen/etc. have become hamstrung by the “internet revolution” frame—advocates and opponents alike tend to orient their arguments with respect to it. But beneath the headlines, it turns out there’s a sprawling assortment of overlapping and conflicting viewpoints about the internet and revolutionary politics waiting to be teased apart. This blog post will begin this sorting process by proceeding through items 1 and 2 of the following analytical to-do list (I will address items 3 and 4 in a subsequent post):

  1. Develop a rough typology of recent claims about the internet and political revolutions.
  2. Distinguish which claim types require or imply one another, which are compatible with one another, and which conflict.
  3. Identify the kinds of empirical data that would be required to substantiate each.
  4. Use this information to judge how well each claim type is supported by the available evidence.

1: A Rough Typology

I begin task 1 with a grounded typology that incorporates some of the recent online claims about the internet’s role in revolutionary politics. In keeping with the internet’s inherent epistemological inclusiveness, I do not distinguish here between academics, industry observers, journalists, and pundits—topical relevance is the only criterion for consideration. The following table introduces the typology: the left column contains the snappy names I’ll refer to each type by, and the right column contains several representative quotes for each. This typology should not be considered exhaustive, nor should it imply that the quoted sources hold their imputed views to the exclusion of other opinions not listed. My argument here is primarily that the types described below are worthy of discussion because each represents the views of more than one prominent observer.

The four claim types I’ll discuss are the net as dissident’s advantage, the net as public sphere platform, the net as citizen journalism platform, and the net as politically irrelevant revolutionary nicety (sorry, forgot to change this the first time around).

Claim type Representative statements
The net as dissident’s advantage: Holds that (1) the internet confers disproportionate advantages upon previously disenfranchised political activists; and (2) that these advantages are substantial, if not decisive, in the success of some revolutionary movements.

Cory Doctorow:

[The internet] has provided a disproportionate benefit to dissidents and outsiders (who, by definition, have fewer resources to start with) than it has to the incumbent and powerful (who, by definition, have amassed enough power to squander some of it on coordination and still have enough left over to rule).

Nate Anderson:

Even yesterday, it would have been too much to say that bloggers, tweeters, Facebook users, Anonymous and Wikileaks had “brought down” the Tunisian government, but with today’s news that the country’s president Zine El Abidine Ben Ali has fled the country, it becomes a more plausible claim to make.

Stephen Balkam:

While the role of social networking sites such as Twitter, Facebook and others may well have been overstated by some, it is undeniable that the use of the web to organize and sustain many of the protests has been critical.

The net as public sphere platform: Holds that the existence of weak-tie publics distinct from the state is an important precondition for revolution, and that the net’s primary revolutionary impact is in facilitating their formation.

Clay Shirky:

[L]ittle political change happens without the dissemination and adoption of ideas and opinions in the public sphere. Access to information is far less important, politically, than access to conversation.

Zeynep Tufekci:

The capacities of the Internet that are most threatening to authoritarian regimes are not necessarily those pertaining to spreading of censored information but rather its ability to support the formation of a counter-public that is outside the control of the state.

David Parry:

[W]hen the government in Egypt chose to shut down the internet, they could shut down the trafficking of information along those channels, but they couldn’t shut down the public that was already created by having already communicated and interacted along those channels.

The net as citizen journalism platform: Holds that the net’s primary revolutionary influence lies in empowering citizens to report breaking news both to one another and to international audiences during political crises.

Jillian York:

I believe there’s a strong case to be made for Twitter reporting, not necessarily as standalone media but as a complement to the major news networks.

Mathew Ingram:

But the reality of modern media is that Twitter and Facebook and other social-media tools can be incredibly useful for spreading the news about revolutions — because it gives everyone a voice, as founder Ev Williams has pointed out — and that can help them expand and ultimately achieve some kind of effect.

Mike Giglio:

[T]he primary function of social media has been to get around the government’s iron grip on information flows.

The net as revolutionary nicety: Holds that the internet contributes little to the activist side of revolutionary politics.

Malcolm Gladwell:

People with a grievance will always find ways to communicate with each other. How they choose to do it is less interesting, in the end, than why they were driven to do it in the first place.

Evgeny Morozov:

Would this revolution have happened if there were no Facebook and Twitter? I think this is a key question to ask. If the answer is “yes,” then the contribution that the Internet has made was minor; there is no way around it.

Dan Murphy:

The fall of Soeharto, by the way, came long before the founding of WikiLeaks. Ditto for 1979’s stunning Islamic revolution in Iran.

Doyle McManus:

In the end, though, the most important steps in promoting democracy and securing human rights will continue to be low tech.

2: Necessity, Compatibility, and Conflict

The most obvious conflict in this typology is between the dissident’s advantage (DA) and revolutionary nicety (RN) stances. Both address roughly the same macro level of analysis, and both are fundamentally about collective action, yet they draw opposite conclusions about the role of the internet in political revolutions. According to DA, disempowered dissidents benefit from online tactics disproportionately compared to incumbent powers, which supports the general notion of the internet as a freedom-enhancing technology (on balance). DA adherents cite the proliferation of online trace data (tweets, Facebook posts, YouTube videos, DDOS attacks, etc.) from the front lines as evidence that the internet is supporting substantial change-making activity. This perspective is often stereotyped as glib technological determinism, a conclusion partially supported by breathless headlines that rhapsodize about “Twitter revolutions!” and similar. But good-faith readings of the claims themselves reveal that most of them are about the degree of importance of the net as a revolutionary tool.

In contrast, the RN position marshals the cold skepticism of the jaded techno-realist to argue that the internet does not substantially enhance the resistive repertoire. One unifying aspect of this view is the stringent criterion that in order for the net to matter at all, it must be an unqualified sine qua non for the revolution in question. As might be expected, every revolution in history fails this test. (Has any single communication technology ever been a sine qua non for any revolution?) For RN advocates the fact that revolutions have occurred throughout history is proof that communication technologies don’t matter—revolutions will use whatever channels are at hand to connect, organize, and take action. The tweets, Facebook groups, and DDOS attacks of today are close analogues of the posters, samizdat, and nonviolent resistance tactics of revolutions past. Revolutionary politics is at bottom concerned with ultimate causes such as economic- and corruption-related grievances—everything else is just details.

Both of these claims are quite broad—they draw conclusions about the revolutionary role of the internet as a whole without generally making fine distinctions between different applications. This makes empirical assessment difficult, a topic I’ll discuss at greater length in my next blog post.

The remaining two positions, public sphere platform (PS) and citizen journalism platform (CJ), differ from DA and RN in specifying mechanisms through which the net contributes to revolutionary politics. Both can be seen as narrower claims that are essentially consistent with DA, though many adherents might dismiss DA as overly optimistic and vague. The central assumption of PS is that a robust public sphere—that is, a network of conversing citizens free of state sponsorship or surveillance—vastly increases a society’s chances of implementing political change through collective action. More so than the internet at large, social media have become key platforms upon which these national public spheres produce themselves. In some versions of this thesis, the conversations don’t even have to be explicitly political, but they must be ongoing and far-reaching. When a potential proximate cause emerges, such as Mohamed Bouazizi’s self-immolation in Tunisia, preexisting public spheres can be quickly mobilized for effective action.

CJ neither requires nor conflicts with PS, but rather is a largely independent claim. It holds that once the wheels of revolution begin to turn in networked societies, citizen journalists play a significant role in distributing information among both the national polity and international audiences. This role is particularly important in authoritarian states which own or otherwise control most forms of mass media. Enterprising citizens of these states can use digital media to capture the attention of local and distant audiences with credible dispatches from the front lines. CJ’s key testable assertion is that this is the internet’s primary function for dissidents during times of revolution.

One point to note here is that RN is fundamentally inconsistent with the other three positions. More specifically, it may acknowledge that some public spheres or journalistic endeavors are partially internet-based, but it contends that this fact is inconsequential. PS and CJ have points of overlap—citizen journalism is often a crucial input into public-sphere discussion—but PS is mostly about the essential preconditions for revolution whereas CJ is more concerned with the dynamics of communication during times of political crisis. The main conceptual fault lines in this typology lie between:

  • RN and everything else (internet doesn’t matter vs. does)
  • RN & DA and PS & CJ (macro vs. meso levels of analysis)

As noted above, PS and CJ are not really separated by a fault line. However, further research is necessary to better sketch the relationship between the two and the circumstances under which they do and do not coexist.

Stay tuned for part 2 (now available!), in which I will attempt to address the empirical evidence for each claim…

ADDENDUM 2/8/11: The response to this post has exceeded my highest expectations, especially given that I posted it on a Saturday afternoon. It is now my most-read piece ever by at least an order of magnitude—if I’d known that in advance I would have polished it a bit more. In this addendum I will respond briefly to some of the comments I’ve received and point to additional pieces that will problematize my neat little scheme.

  • Clay Shirky generously responds to my framing of these claims in an extended comment, suggesting some conceptual extensions and providing key examples that will be relevant to my next post. In particular, he suggests adding a category, “Net as Path to Informed Citizenship,” that emphasizes net access as a critical input for revolutionary politics. My take is that this is, among other things, the consumptive flip side of CJ: the ability to access information freely vs. the ability to produce and distribute it. Unrestricted net access is a crucial assumption of citizen journalism as an enterprise: after all, what good is the latter if revolutionary audiences cannot witness it? Shirky also implicitly observes a key weakness of this typology, which is that while it functions well as a set of categories of what people have said recently about the net and revolutions, it’s somewhat less successful as a general MECE (mutually exclusive, collectively exhaustive) scheme of digital activism types. Sean Aday and colleagues attempt something like this in their recent article “Blogs and Bullets: Evaluating the Impact of New Media on Conflict,” though they frame their analysis too strongly in causal terms IMO.
  • Evgeny Morozov expressed in a private note that he may not fall entirely into the RN box, citing a recent column in which he notes that the net can serve as a weapon against social progress in the hands of illiberal regimes. Stan, responding in comments, makes a very similar point. This perspective, which might be called “the net as dissident’s disadvantage,” is definitely missing above—mostly because I couldn’t find many thought pieces embracing it in the context of recent events in the Arab world (although I’m sure they’re out there)—and it suggests that Morozov straddles a line between it and RN.
  • David Parry clarifies his position on the transformative influence of digital networks in a followup to the post linked above. While endorsing his position in the typology, he argues that we need to radically rethink past notions of “publics” because they are too closely bound up in a communicative culture structured by print. I agree, but I still think the basic logic of democracy requires something close to the very general definition of “publics” used above.
  • Also from comments, Sami Kallinen “would see the typology useful if it is not trying to describe the ontology of these social technologies and techniques but a sort of snapshot of the effect of them at this very moment.” The speed of technological development and widespread social diffusion of disruptive technologies make this an important point to bear in mind. As technologies are introduced, adopted, and discarded, their meanings and purposes change—and the rate at which this is happening today poses serious challenges to the task of generating useful concepts and theories about them. But nearly all social theories of technology must confront this problem, and I don’t think anyone claims to have developed a unified, final theory of political uses of digital networks. (And I really hope no one does—it’d put me out of business!)

We need a revolution in revolution-framing.

Political revolutions are complex things; this should go without saying. But many of the commentators on the recent events in Tunisia and Egypt seem to have ignored this fact in favor of social-media triumphalism, a recent variant of a more general strain of cyber-utopianism that dates back to the early days of the web. I take it as given that this notion is an almost entirely wrongheaded consequence of the need to make succinct statements (for tweet and headline purposes) about complicated social phenomena. But the prevalence of talk about Twitter/Facebook/Wikileaks/etc revolutions has exhibited an irritating secondary effect: it has prompted many charitable, intelligent, and learned individuals to react to it. That is, credible experts have spent far too much time responding to the patently ridiculous media frame that social media somehow “caused” these popular movements, rather than explaining the role of communication (and social media specifically) within revolutionary politics.

As evidence, I submit the following cavalcade of headlines:

That many have chosen to frame these stories in this way is not a novel observation, as the CJR and GovFresh pieces demonstrate. My point is that in setting the terms of the debate among journalists, specialists, and the public, this frame has served as a distraction from more interesting and relevant questions. Whether a commentator agrees with it or not, the conversation too often remains at the level of “is this a Twitter revolution or not?” rather than inquiring as to how Twitter and other social media fit into a broader ecosystem of culture, historical grievances, media tools, and political circumstances. To be sure, some of the articles above discuss these issues, but even then they must first devote significant amounts of space to dispensing with the myth of the technology-driven revolution. And by doing so, they pass up the opportunity to create new frames for how to understand 21st-century revolutions.

Egypt isn’t the last grassroots revolutionary movement we’ll see that will use social media, so I suggest we move beyond the myth to develop new frames that do justice to the many factors that contribute to citizen-driven politics. A first step here would be to consider the role of communication more generally in such movements, and move from there into exploring the technologies that facilitate it. A preliminary inquiry might include questions such as: What has to happen before thousands turn out simultaneously in the streets to confront repressive regimes? How do grassroots actions differ between nations with more- and less-developed communications infrastructures? Who is most likely to participate in these actions, who is left out, and how does access to technology (as well as literacy) mediate this divide? Under what circumstances are social media likely to be more and less relevant? And what happens to movements when internet access is completely cut à la Egypt?

Look, I get it—people want to be able to draw the sweeping conclusions, to develop the big, all-encompassing theories that draw lots of attention. And there probably isn’t much we can do to keep the media from misframing the next non-Twitter revolution—after all, the role of for-profit media is to attract eyeballs, not necessarily to shed light. But those of us in the light-shedding business ought to think a bit harder about crafting ledes and headlines that acknowledge the complexity of our subject matter. After all, that’s what the majority of folks who don’t click through will take away from our pieces. Best to give them something more helpful to think about than a simple yes or no response to the media meme du jour.

ReCal journal article now available

I’m pleased to announce the publication of “ReCal: Intercoder Reliability Calculation as a Web Service,” a paper describing and verifying the output of ReCal. If you need to cite ReCal, please cite this paper. Here’s the link:

ReCal: Intercoder Reliability Calculation as a Web Service

Interval/ratio reliability calculator

I recently discovered an online calculator for Lin’s Concordance, a statistic which is often used to represent reliability between sets of subjectively-coded interval and ratio data. It computes Lin’s Concordance for two coders only and appears to accept the same column-row data arrangement as ReCal (albeit not via file submission). I have not yet had occasion to verify its output, but you can try it yourself at the following link:

Online Lin’s Concordance Calculator

From the mailbag, 12/14/09

Another question came in today, and it’s one I think the ReCal user community might be interested in. Sonya from Pennsylvania writes:

Ok, I am stumped. How can I have a percent agreement of .97 and a Scott’s Pi of -.015? I have two coders coding either Yes (1) or No (0) for the presence of a variable. What am I doing wrong. I find when calculating by hand I get similar results (off by a decimal or so). When using RECAL or calculating Scotts Pi with more than two categories, I don’t get negative Scotts Pi when the percent agreement is high.

Thanks so much for sharing your program and answering my question if you have the time.

Excellent question, Sonya. As with the last question I answered, I’ll provide your raw data (with a new filename) so that others can follow along; hope you don’t mind.

Looking at the data, you’ll immediately notice an interesting characteristic: only the second coder uses the “1” code. That is, the two coders only ever agree on “0” codes and never once on a “1” code. Scott’s pi, Cohen’s kappa, and Krippendorff’s alpha punish this phenomenon severely, the rationale being that coders must show at least some covariation in their agreements to merit high coefficient values. Krippendorff himself addressed this very situation in a recent article:

In the calculation of reliability, large numbers of absences should not overwhelm the small number of occurrences that authors care enough about to report. Without a single concurrence and three mismatches [Krippendorff here is referring to a specific dataset, which just so happens to have the same number of mismatches as Sonya’s], the report of finding 2 out of 137 cases [3 out of 99 for Sonya’s data] is about as close to chance as one can get—and this is born out by the near zero values of all the chance-corrected agreement coefficients. (2004, p. 425)

Thus, when one coder only uses one of two coding categories, and the other uses both, chance-corrected reliability will always be near or well below zero (but percent agreement can still be near 100% as it is not chance-corrected). The only solutions here seem to be either better coder training or a revised coding scheme that allows coders more latitude to agree with one another on different categories.
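To make this concrete, here is a minimal Python sketch of the calculation. It is an illustration only, not ReCal's own code, and the data are a hypothetical reconstruction based on the figures quoted above: 99 cases, 3 mismatches, and a first coder who never uses the "1" code. The function name `scotts_pi` and the two lists are my own assumptions.

```python
from collections import Counter


def scotts_pi(coder_a, coder_b):
    """Return (percent_agreement, scotts_pi) for two equal-length lists of codes."""
    n = len(coder_a)
    # Observed agreement: share of cases where both coders chose the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Pool both coders' decisions to get each category's joint marginal proportion.
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    # Note: this divides by zero if every decision falls into a single category.
    return observed, (observed - expected) / (1 - expected)


# Hypothetical reconstruction: coder A never uses "1",
# coder B uses it three times, so the coders disagree on exactly 3 of 99 cases.
coder_a = [0] * 99
coder_b = [1] * 3 + [0] * 96

agreement, pi = scotts_pi(coder_a, coder_b)
print(f"percent agreement = {agreement:.3f}, Scott's pi = {pi:.3f}")
# prints: percent agreement = 0.970, Scott's pi = -0.015
```

Because nearly all 198 coding decisions fall into the "0" category, expected agreement (about 0.9702) slightly exceeds observed agreement (about 0.9697), which is what pushes the coefficient just below zero.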


Krippendorff, K. (2004). Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research, 30(3), 411-433.

From The Mailbag, 12/09/09

I don’t get much correspondence about ReCal, but I do try to respond to the few queries I receive. Today, Dianne from Australia asked:

Thank you so much for a great tool. But, I hope you can help me clear up a discrepancy I’ve noticed in my results for variables that have the same number of agreements/disagreements. For example, variable 1 has 26 agreements and 1 disagreement. So does variable 3. So does variable 5. Yet, the results for variable 1 are: 96.3% agreement and Scott’s pi of 0.924. The results for variable 3 are: 96.3% agreement and Scott’s pi of 0.914. The results for variable 5 are: 96.3% agreement and Scott’s pi of 0.886. Can you please tell me why the Scott’s pi is different for each variable when all the raw data for them is the same (ie same number of agreements and disagreements)? This scenario has occurred on three separate occasions when I’ve submitted my .csv files for analysis.

Great question, Dianne. The answer lies in how Scott’s pi (and Cohen’s kappa and Krippendorff’s alpha) compute reliability as compared to percent agreement. With the latter, equal numbers of agreements and disagreements will always give you the same result, as you noticed. This is not always the case with coefficients that correct for chance agreement, as do Scott’s, Cohen’s, and Krippendorff’s. Their formulae give additional “credit” to data sets with greater variation in agreeing values: in other words, the more different coding categories your coders agree upon, the higher your reliability coefficients will be (the logic being that it is harder to attain stronger levels of agreement on data that is highly distributed across many coding categories).

But I don’t expect you to just take my word for it, so I’ll actually work through your numbers and show the math here. I hope you don’t mind if I provide your raw data here so that other interested parties can follow along—I’ve changed the headers and the filename.

Recall that the formula for Scott’s pi is

(Po - Pe) / (1 - Pe)

where Po is observed agreement and Pe is expected agreement. Observed agreement is simply percent agreement as a fraction of one; for all three of your variables it is thus 0.963. Expected agreement is a bit more complex, but essentially what you have to do is:

  1. Note the number of possible coding categories your coders used (each of your variables used 2 categories, represented by the numbers 0 and 1)
  2. Start by adding the number of times coder A selected category 1 to the number of times coder B selected category 1 and then divide that sum by the total number of coding decisions (which in your case is 54, or 2x the number of cases). This value is known as the joint marginal proportion (JMP) for variable 1, category 1. It is equal to (12 + 11) / 54 = 0.426. We need to do this for every category of every variable, so in total we will calculate 6 JMPs, 2 per variable (the number of coding categories noted above). The JMP for var 1, category 2 is (15 + 16) / 54 = 0.574. The rest are as follows: var 3 cat 1 = 0.315; var 3 cat 2 = 0.685; var 5 cat 1 = 0.204; var 5 cat 2 = 0.796.
  3. Now that we have all our JMPs, we need to square them and then sum them within variables to get our expected agreements. So for var 1, we have 0.426² + 0.574² = 0.511. The expected agreement values for vars 3 and 5 are 0.569 and 0.676, respectively.

Now we have all the values we need to plug in to our main Scott’s pi formula above.

  • For var 1 Scott’s pi is (0.963 – 0.511) / (1 – 0.511) = 0.924;
  • for var 3 it is (0.963 – 0.569) / (1 – 0.569) = 0.914;
  • for var 5 it is (0.963 – 0.676) / (1 – 0.676) = 0.886.

By now you’ve probably figured out what’s going on. Looking at the data, variable 5 has the most uneven category distribution; you can tell at a glance that it is mostly zeros. This pushes its expected agreement value higher—it is easier to achieve high levels of agreement when a data set mostly falls into one category, so the Scott’s pi formula raises the bar accordingly. Vars 1 and 3 are more balanced in their category distributions, so their expected agreements are lower, making their coefficients higher despite the fact that all three vars have the same number of agreements and disagreements.
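For anyone who wants to check the arithmetic, the steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than ReCal's own code. The function name `scotts_pi_from_counts` is hypothetical, and the pooled category counts for vars 3 and 5 are inferred from the proportions given above (for example, 0.315 × 54 ≈ 17) rather than read from the raw file.

```python
def scotts_pi_from_counts(agreements, cases, pooled_counts):
    """Compute observed agreement, expected agreement, and Scott's pi.

    pooled_counts holds, for each category, the number of times coder A
    chose it plus the number of times coder B chose it.
    """
    observed = agreements / cases
    decisions = 2 * cases  # two coders, one decision each per case
    # Square each joint marginal proportion and sum within the variable.
    expected = sum((count / decisions) ** 2 for count in pooled_counts)
    return observed, expected, (observed - expected) / (1 - expected)


# Pooled counts per category (category 1, category 2) out of 54 decisions.
variables = {
    "var 1": (23, 31),  # 12 + 11 and 15 + 16, as stated in the text
    "var 3": (17, 37),  # inferred from JMPs 0.315 and 0.685
    "var 5": (11, 43),  # inferred from JMPs 0.204 and 0.796
}

# Every variable has 26 agreements out of 27 cases.
results = {
    name: scotts_pi_from_counts(26, 27, counts)
    for name, counts in variables.items()
}

for name, (po, pe, pi) in results.items():
    print(f"{name}: Po = {po:.3f}, Pe = {pe:.3f}, pi = {pi:.3f}")
# prints:
# var 1: Po = 0.963, Pe = 0.511, pi = 0.924
# var 3: Po = 0.963, Pe = 0.569, pi = 0.914
# var 5: Po = 0.963, Pe = 0.676, pi = 0.886
```

Observed agreement is identical across the three variables; only the expected agreement changes, and it rises as the category distribution becomes more lopsided.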

I hope this answers your question, Dianne. If not, let me know!