Intercoder Reliability Worked Examples/Formulae

EDIT 06/30/10: The document described below has been superseded by a new version available in the recently published journal article “ReCal: Intercoder Reliability Calculation as a Web Service.” No changes have been made to the substance of the equations themselves, but the examples have been clarified based on referees’ suggestions.

For those of you interested in calculating intercoder reliability by hand, I have created a PDF document that contains formulae and worked examples for all of the coefficients ReCal offers. They are: percent agreement, Scott’s pi, Cohen’s kappa, Krippendorff’s alpha, and Fleiss’ kappa. If you find any errors or simply don’t understand something, please don’t hesitate to let me know either in comments or in an email.
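The PDF is the reference for the exact formulae. If you want to check your hand calculations, here is a minimal Python sketch of the textbook two-coder, nominal-data versions of four of those coefficients. It is not ReCal’s own code. It assumes two equal-length lists of nominal codes with no missing values, and the function names are hypothetical. Fleiss’ kappa, which needs three or more coders, is left out.

```python
from collections import Counter


def percent_agreement(c1, c2):
    """Share of units on which the two coders assigned the same code."""
    agreements = sum(1 for a, b in zip(c1, c2) if a == b)
    return agreements / len(c1)


def scotts_pi(c1, c2):
    """Scott's pi: expected agreement comes from the pooled distribution of both coders."""
    n = len(c1)
    po = percent_agreement(c1, c2)
    pooled = Counter(c1) + Counter(c2)  # each category counted across all 2n judgments
    pe = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (po - pe) / (1 - pe)


def cohens_kappa(c1, c2):
    """Cohen's kappa: expected agreement comes from each coder's own marginals."""
    n = len(c1)
    po = percent_agreement(c1, c2)
    m1, m2 = Counter(c1), Counter(c2)
    pe = sum((m1[k] / n) * (m2[k] / n) for k in set(m1) | set(m2))
    return (po - pe) / (1 - pe)


def krippendorffs_alpha_nominal(c1, c2):
    """Krippendorff's alpha for nominal data, two coders, no missing values."""
    total = 2 * len(c1)  # number of pairable values
    disagreements = sum(1 for a, b in zip(c1, c2) if a != b)
    pooled = Counter(c1) + Counter(c2)
    # Off-diagonal coincidences: each disagreeing unit contributes (a, b) and (b, a).
    observed = 2 * disagreements
    # Sum of n_c * n_k over all ordered pairs of distinct categories.
    expected = total ** 2 - sum(count ** 2 for count in pooled.values())
    return 1 - (total - 1) * observed / expected


if __name__ == "__main__":
    coder1 = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
    coder2 = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2]
    print(percent_agreement(coder1, coder2))                      # 0.9
    print(round(scotts_pi(coder1, coder2), 3))                    # 0.78
    print(round(cohens_kappa(coder1, coder2), 3))                 # 0.783
    print(round(krippendorffs_alpha_nominal(coder1, coder2), 3))  # 0.791
```

The two kappa-style functions differ only in how expected agreement is estimated: Scott’s pi pools both coders’ codes into one distribution, while Cohen’s kappa multiplies each coder’s own marginal proportions. All three chance-corrected functions divide by zero if every code in the data falls into a single category.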

New ReCal feature: ‘Export Results to CSV’

I have just introduced a new feature to ReCal called “Export Results to CSV.” The purpose of this feature is to offer a new format for the program’s output—until today, the only output option offered was HTML. Users now have the option of saving their results to a formatted CSV file suitable for viewing in a spreadsheet application such as Excel.

Using this new function is easy. Simply submit your data file(s) to ReCal as usual, and near the bottom of the HTML results page you’ll see a button labeled “Export Results to CSV.” Click this button, and your output CSV file should pop up as a download. Give this file an appropriate name (this is important, as the default filename is “output.csv”) and save it to your hard drive. You should now be able to open your results file in any spreadsheet application. (Though it is technically possible, I strongly recommend against opening these output files in statistical applications such as SPSS or Stata, as they most likely will not display the data properly.) If you export after accumulating the results of multiple data files via the Save Results History option, the output file will contain all of those results.

If you experience any problems or have any questions about exporting your results, please leave a comment below.

New ReCal Feature: ‘Save Results History’

This post describes a new feature I just added to ReCal called “Save results history.” The purpose of this feature is to organize ReCal’s output in the style of SPSS, Stata, R, and other general-purpose statistical applications.

After you run your initial data file through ReCal, you’ll notice another file-upload form at the bottom of the results page. Below this form is a checkbox captioned “Save results history.” When this box is checked, the next time you run ReCal it carries the currently displayed output over to the new results page and appends the new results below it. This emulates the output windows found in SPSS, Stata, etc. by building a history of results, ordered from oldest to most recent, on a single page. Previously, ReCal created a separate results page for each file you analyzed; now you can collect all your related results on one output page for easy reference. And of course you can always leave the box unchecked to keep only one set of results per page.

Click here to see a sample of ReCal output with the “Save results history” option enabled. This sample contains results from two formatted data files.

If you encounter any difficulties or incorrect results using this feature, please let me know ASAP.

Yahoo to shut down Geocities; get PRAM while you can

UPDATE: Geocities is dead as of 10/27/09, and to my knowledge PRAM cannot be downloaded from anywhere else. PRAM is copyrighted software and no terms of distribution are offered in its readme file, which is why I do not make it available from this site.

Forbes reports today that Yahoo will be shutting down Geocities at some point this year. Geocities is one of the oldest free web hosting services and a mainstay of the pre-Web 2.0 online publishing world along with Tripod and AOL. I note this primarily because the demise of Geocities means that PRAM, ReCal’s most robust competitor in the reliability calculation business, will no longer be available for web download. PRAM is (as of a Google search conducted just minutes ago) hosted exclusively on Geocities, so get it while you can. If you’re reading this after PRAM’s Geocities site has gone offline and would like to use it, let me know and I’ll email you a copy. (see update above)

While I’m on the subject, PRAM was last updated in 2004. Can anyone confirm whether or not it works in Windows Vista? If you can, I’d appreciate hearing from you via email or a comment below.

ReCal FAQ and Troubleshooting Page

Have questions about ReCal, the online intercoder reliability calculator? Hopefully this page contains the answer you’re looking for. If not, feel free to submit new questions in comments.

ReCal isn’t working with my data. It keeps giving me the following error code:

What applications is ReCal compatible with?

How can I be sure ReCal’s results are accurate?

ReCal reports high percentage agreements but low Scott’s pi/ Cohen’s kappa/ Krippendorff’s alpha for my data. What is going on?

I would prefer to use an Excel spreadsheet to calculate intercoder reliability. Why should I use ReCal?

I need help understanding/interpreting/improving my results; what resources are available to me?

How do I create CSV files?

What are CSV files and why does ReCal require them?

Why doesn’t ReCal work with Excel/SPSS/Word/[insert other proprietary software package here] files?

My data cannot easily be reformatted to conform to ReCal’s specifications. What alternatives are available?

What happens to my data when I submit it to ReCal?

Is there a version of ReCal forthcoming that will perform similar analyses on ordinal and interval data and that will accept missing data?

What are the functional differences between ReCal2 and ReCal3?

Who are you?


 

Error 1—You should never see this error, at all. Ever. If you do, please let me know ASAP because it indicates a dire system error.

Error 2—This error occurs in two cases: first, when your file is larger than 100,000 bytes; and second, when your file is 0 bytes in size. This check helps prevent incorrect and corrupt files from being processed (CSV files are rarely that large, and a 0-byte file contains no data at all). Double-check your file and make sure it is indeed a non-corrupt CSV file.

Error 3—This error occurs when your data file contains characters other than numeric digits (with the exception of alphabetic letters on the first row). ReCal’s requirements in this regard are generally quite strict—the digit “1” by itself would pass muster, whereas “1.00” would not due to the decimal point. Similarly, negative numbers won’t work in ReCal due to the minus sign—you’ll need to convert them to positive numbers. The only exception is that header text for each column may be included on the first row (a la SPSS); in this case, the entire first row will be ignored and calculation will begin on the second row. Make sure to scour your entire file for any characters other than numeric digits except on the first row.

Some users might see this error even when they are absolutely certain that their file contains only numbers. The problem in these cases may be that the “CSV” file is delimited by a character other than commas or semicolons. (See What are CSV files and why does ReCal require them? if you don’t know what this means.) To determine whether this is the problem, open your file in a basic text editor (not MS Word) such as WordPad on Windows or TextEdit on a Mac. If you see a series of numbers separated by anything other than commas or semicolons (tabs, for example), you will need to run a Find/Replace command to convert whatever the separating character is into commas.
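If you would rather script the Find/Replace step, the sketch below uses Python’s standard csv module to re-emit tab-separated (or otherwise delimited) text with commas. It assumes you already know which character separates your values; the function names are hypothetical.

```python
import csv
import io


def convert_to_commas(text, delimiter="\t"):
    """Rewrite text delimited by `delimiter` (tab by default) as comma-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


def convert_file(src_path, dst_path, delimiter="\t"):
    """Read src_path, convert its delimiter to commas, and write the result to dst_path."""
    with open(src_path, newline="") as src:
        converted = convert_to_commas(src.read(), delimiter)
    with open(dst_path, "w", newline="") as dst:
        dst.write(converted)
```

For example, `convert_file("coding_data.txt", "coding_data.csv")` would write a comma-separated copy of a tab-separated file (both file names are hypothetical); pass `delimiter="|"` for a pipe-separated file.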

Error 4—This error is caused by missing data in your file, commonly seen in spreadsheet software as blank cells surrounded by data. ReCal’s calculations assume that every coder has coded every unit, so files with missing data are not accepted. Fill in the missing data on the line indicated or delete the line entirely and try again.

Error 5—This error occurs in ReCal2 only and indicates an odd number of columns; ReCal2 requires an even number. Recall that ReCal2 assumes that each pair of columns constitutes two coders’ judgments on a single variable. If the number of columns in the data file is odd, the final column has no partner column with which reliability can be calculated. Double-check the number of columns in your file and try again.

Error 7—ReCal requires that each file submitted to it feature a “.csv” extension at the end of the filename. It is critical to understand that a file cannot be converted to CSV format simply by changing its extension! See How do I create CSV files? for more details on this point.

Error 8—This error occurs when the rows in your file do not all contain the same number of codes. For example, if rows 1-10 of a hypothetical 20-row file contain three columns of data and rows 11-20 contain four, the file will trigger error 8, because ReCal would return incorrect results if it attempted to analyze it. There are two ways to solve this problem. The first is to delete the 4th column from rows 11-20, leaving 20 rows with 3 columns each; this solution only works for ReCal3, since ReCal2 requires an even number of columns (see Error 5). The second is to add the missing data to rows 1-10, creating 20 rows of 4 columns each, which could be analyzed in ReCal2 or ReCal3 depending on the nature of the data.
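To make these rules concrete, here is a rough Python pre-flight check you could run on a file before uploading it. It only mirrors the rules behind Errors 2, 3, 4, 5, 7, and 8 as described on this page and is not ReCal’s actual code. The function name, the header detection (skipping a first row that contains letters), and the semicolon detection are assumptions. Error 1 is an internal error and has no counterpart here.

```python
import csv
import io
import re

MAX_BYTES = 100_000  # size ceiling given under Error 2
DIGITS = re.compile(r"[0-9]+")  # ASCII digits only: rejects "1.00" and "-1"


def check_recal_file(filename, data, recal2=False):
    """Return a sorted list of the error numbers that `data` (bytes) would trigger."""
    errors = set()
    if not filename.lower().endswith(".csv"):
        errors.add(7)  # wrong extension
    if len(data) == 0 or len(data) > MAX_BYTES:
        errors.add(2)  # empty or oversized file
        return sorted(errors)  # no point parsing it further
    text = data.decode("utf-8", errors="replace")
    # Assumption: treat the file as semicolon-delimited only if it contains no commas.
    delimiter = ";" if ";" in text and "," not in text else ","
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    # A first row containing letters is treated as column headers and skipped.
    if rows and any(ch.isalpha() for cell in rows[0] for ch in cell):
        rows = rows[1:]
    for row in rows:
        for cell in row:
            if cell.strip() == "":
                errors.add(4)  # blank cell means missing data
            elif not DIGITS.fullmatch(cell.strip()):
                errors.add(3)  # anything other than digits
    if len({len(row) for row in rows}) > 1:
        errors.add(8)  # rows differ in length
    elif recal2 and rows and len(rows[0]) % 2 == 1:
        errors.add(5)  # ReCal2 needs columns in pairs
    return sorted(errors)
```

A tab-separated line such as `1<TAB>2` shows up here as Error 3, because the whole line is read as one non-numeric cell, which matches the situation described under Error 3.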

What applications is ReCal compatible with? ReCal can read data from any software application that has the ability to “Save As” or “Export” files in CSV format. (More on CSV and why ReCal uses it here.) This includes SPSS, Stata, S-PLUS, SAS, Excel, Google Docs, MS Access, OpenOffice/NeoOffice, Minitab (as of version 15), and more.

How can I be sure ReCal’s results are accurate? Unfortunately, there is no way I can show a priori that ReCal will furnish accurate results for all possible datasets. 100% certainty is only possible in certain branches of pure mathematics; in the real world all sorts of things can go wrong. With ReCal, for example, people occasionally format their data incorrectly yet still manage to get results, which will of course be incorrect. Beyond double-checking your data format, I would encourage you to test ReCal against other reliability calculators, especially if its results appear flawed. A list of alternative reliability calculators can be found here.

I would prefer to use an Excel spreadsheet to calculate intercoder reliability. Why should I use ReCal? Here are several reasons why ReCal is superior to spreadsheets for reliability calculation:

  1. From a programming standpoint, Excel’s basic function language is not very sophisticated in its handling of arrays, which are essential for calculating reliability. This means that VBA (Visual Basic for Applications, a Microsoft proprietary programming language that works inside Excel spreadsheets) would need to be used. Unfortunately, Microsoft has removed VBA from the current Mac version of Office, meaning that a reliability macro written in VBA would be useless for non-Windows users.
  2. Even if VBA hadn’t been removed from Office for Mac, a VBA macro would still restrict ReCal usage to Excel users. ReCal doesn’t require anything other than a web browser and an internet connection.
  3. PHP generally runs faster than VBA, which you’ll notice if you compare ReCal to PRAM (although technically PRAM is written in VB rather than VBA, the languages are very closely related as their names indicate).
  4. I don’t know of any publicly available spreadsheets that calculate Scott’s pi, Cohen’s kappa, or Krippendorff’s alpha. (In fact, this was the main reason I created ReCal in the first place.) If you know of any, please let me know and I’ll link to them.

I need help understanding/interpreting/improving my results; what resources are available to me? Probably the best intercoder reliability resource on the web is Matthew Lombard’s site, which presents the basics of how to calculate, use, and interpret reliability statistics. Beyond that, you may be interested in the extended discussions found in Content Analysis: An Introduction to Its Methodology by Klaus Krippendorff and/or The Content Analysis Guidebook by Kim Neuendorf. Finally, if you have a question of general interest that isn’t already answered on either this site or Lombard’s, you can ask me and I’ll answer it publicly if I can.

How do I create CSV files? The specific instructions on how to do this differ depending on which application you are using, but in Excel and SPSS I believe you use either the “Export” or “Save As” command and select “CSV” or “Comma-Separated Values” as your file format. It is important to remember that merely changing a file’s extension manually to “.csv” does not convert the file format itself to CSV; you must use your application’s Export or Save As function.

What are CSV files and why does ReCal require them? CSV stands for “Comma-Separated Values” and is a non-proprietary method of representing tabular (spreadsheet) data that can be read and exported by a wide range of applications (Wikipedia entry here). ReCal requires CSV files because doing so maximizes compatibility across software applications and operating systems.

Why doesn’t ReCal work with Excel/SPSS/Word/[insert other proprietary software package here] files? See the answer to the question above.

My data cannot easily be reformatted to conform to ReCal’s specifications. What alternatives are available? Click here to view a list of alternative reliability calculators.

What happens to my data when I submit it to ReCal? Your data file is uploaded to a private folder on my web hosting account for troubleshooting purposes. In lieu of actual user feedback, reviewing user data directly is the only way I can identify and fix bugs. See the fine print for more info.

Is there a version of ReCal forthcoming that will perform similar analyses on ordinal and interval data and that will accept missing data? ReCal can now accept ordinal, interval, and ratio data via ReCal OIR. However it is still unable to accept datasets with missing data. If your data is incomplete, one strategy is to perform casewise deletion—that is, to delete all cases which were not evaluated by all coders. Of course, you would have to do this manually before submitting your file to ReCal. Casewise deletion is probably best used when the number of incomplete cases is small, but then, content analysis data sets with large amounts of missing reliability data are problematic from a broader validity standpoint.
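Casewise deletion is easy to script before you upload. The sketch below is a minimal Python example, assuming one row per case. It drops every row that contains a blank cell or is shorter than the widest row. The function names are hypothetical, and a header row is kept as long as it is complete.

```python
import csv


def casewise_delete(rows):
    """Keep only rows (cases) as wide as the widest row and with no blank cells."""
    if not rows:
        return []
    width = max(len(row) for row in rows)
    return [row for row in rows if len(row) == width and all(cell.strip() for cell in row)]


def casewise_delete_file(src_path, dst_path):
    """Copy a CSV file, dropping incomplete cases; returns how many rows were removed."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))
    kept = casewise_delete(rows)
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(kept)
    return len(rows) - len(kept)
```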

Generally, if you’re interested in certain new features, please let me know via comment or email—knowing that users actively want a particular function makes me more likely to develop it.

What are the functional differences between ReCal2 and ReCal3? There are two main differences between the two:

  • ReCal for 2 Coders (ReCal2) can calculate reliabilities for multiple variables at once, whereas ReCal for 3+ Coders (ReCal3) can only calculate reliability for one variable at a time. If you have several variables all coded by two coders, the former edition might save you some time.
  • Although the two utilities share a formally identical data format, they make very different assumptions about what that data represents. ReCal2 assumes that data columns come in pairs, i.e. that columns 1 and 2 represent two coders’ codes for a single variable, cols 3 and 4 represent two coders’ codes for a different variable, etc. By contrast, ReCal3 assumes that each column in the input file represents a different coder’s work on a single variable. Therefore, the same 6-column CSV file would represent 3 different variables coded by 2 coders each to ReCal2, while ReCal3 would interpret it as one variable coded by 6 different coders. For this reason, the only files for which both ReCal2 and ReCal3 will give accurate results are those containing only 2 columns/coders. Submitting data intended for one edition to the other will generate incorrect results!
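The difference in interpretation is easy to see in code. This illustrative Python sketch, which is not ReCal’s implementation, splits the same 6-column data both ways: into three two-coder variables (the ReCal2 reading) and into six coders on one variable (the ReCal3 reading). It then computes Fleiss’ kappa, one of the coefficients ReCal offers for three or more coders, on the ReCal3 reading using the standard formula for a fixed number of coders per unit. Function names and sample data are hypothetical.

```python
from collections import Counter


def as_recal2_variables(rows):
    """ReCal2 reading: columns (1,2), (3,4), ... are two coders on one variable each."""
    ncols = len(rows[0])
    if ncols % 2:
        raise ValueError("odd number of columns: they cannot be paired up")
    return [
        ([row[j] for row in rows], [row[j + 1] for row in rows])
        for j in range(0, ncols, 2)
    ]


def as_recal3_coders(rows):
    """ReCal3 reading: every column is one coder's codes for a single variable."""
    return [list(col) for col in zip(*rows)]


def fleiss_kappa(rows):
    """Fleiss' kappa with each row = one unit and each column = one coder."""
    n_units, n_raters = len(rows), len(rows[0])
    per_unit = [Counter(row) for row in rows]
    # Mean observed agreement: share of agreeing coder pairs within each unit.
    p_bar = sum(
        (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
        for counts in per_unit
    ) / n_units
    # Expected agreement from the overall category proportions.
    totals = sum(per_unit, Counter())
    pe = sum((t / (n_units * n_raters)) ** 2 for t in totals.values())
    return (p_bar - pe) / (1 - pe)


if __name__ == "__main__":
    rows = [
        [1, 1, 2, 2, 1, 1],
        [2, 2, 1, 1, 2, 2],
        [1, 2, 2, 2, 1, 1],
        [2, 2, 1, 1, 2, 1],
    ]
    print(len(as_recal2_variables(rows)))  # 3 variables, 2 coders each
    print(len(as_recal3_coders(rows)))     # 6 coders, 1 variable
    print(round(fleiss_kappa(rows), 3))    # -0.133
```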

International ReCal Users Take Note:

I recently fixed a bug that had been preventing certain users whose computers use non-Latin character sets from running ReCal. So if your computer uses a non-Latin character set and ReCal has failed you over the past month or so, it might work for you now. As always, please don’t hesitate to let me know about any technical problems you may encounter.

ReCal: The Fine Print

As of Tuesday, November 11, 2008, I have begun collecting all data files uploaded through ReCal in order to improve the application. (Prior to today, all data was discarded as soon as ReCal ran its calculations.) By using ReCal you agree to license your data files to me for this purpose and no other. Because your data files consist entirely of numbers that are meaningful only to you, they are completely anonymous; this is one of the reasons ReCal does not require text data headers. Google, Yahoo, Microsoft, and other providers of online services use your personal data in similar ways, but I don’t sell your files (not that there’s a market for CSV files full of unlabeled numbers anyway). If you have any questions about how I use ReCal user data, please leave them in comments below.
