**UPDATE 5/22/17:** By popular demand, ReCal OIR now allows missing data! Click the link for details.

ReCal (“Reliability Calculator”) is an online utility that computes intercoder/interrater reliability coefficients for nominal, ordinal, interval, or ratio-level data. It is compatible with Excel, SPSS, STATA, OpenOffice, Google Docs, and any other database, spreadsheet, or statistical application that can export comma-separated (CSV), tab-separated (TSV), or semicolon-delimited data files.

ReCal consists of three independent modules, each specialized for a different type of data. The following table will help you select the module that best fits your data. (If you do not know whether your data are considered nominal, ordinal, interval, or ratio, please consult this Wikipedia article to find out more about these levels of measurement.)

| Level of measurement | N of coders | Missing data allowed? | Use |
| --- | --- | --- | --- |
| Nominal | 2 coders only | No | ReCal2 (includes percent agreement, Scott’s pi, Cohen’s kappa, and nominal Krippendorff’s alpha) |
| Nominal | 2 or more coders | No | ReCal3 (includes pairwise percent agreement, Fleiss’ kappa, pairwise Cohen’s kappa, and nominal Krippendorff’s alpha) |
| Nominal, ordinal, interval, or ratio | Any N of coders | Yes | ReCal OIR (includes nominal, ordinal, interval, and ratio Krippendorff’s alpha with support for missing data) |

Please visit the ReCal FAQ/troubleshooting page if you have questions or are having difficulty getting ReCal to work with your data. If you still have questions, please contact me directly rather than leaving a comment.

**Want to support ReCal? The best way is with a citation to one or both of the following articles in your final manuscript**.

ReCal’s source code (which is open-source) was last updated on **05/22/2017**. To date, ReCal (2, 3, and OIR combined) has been successfully executed a total of times by persons other than the developer.^{1}

^{1}This counter was reset to zero sometime in late 2014 under unknown circumstances. On 2/18/15 I manually reset it to the combined cumulative Google Analytics hit count for ReCal2, ReCal3, and ReCal OIR.

Dear Mr. Freelon,

Would it be possible for you to send us your opinion on our problem?

Since we calculate intercoder reliability for different sub-studies of our project with your programme, we easily get the reliability results, including the number of coder differences.

We learnt that our coding system improved over time, and we started to use your programme to help us sharpen up our category system by adding examples and by reformulating the rules. Hildegard did a complete analysis of the mistakes (disagreements) found by ReCal2, and up to now 5 mistakes remain. In her thesis she wants to present the first calculation with about 60 disagreements, then a table with all disagreements commented, and then a new reliability analysis, in which of course nearly all categories show an agreement of 100%.

Can we do it like this? Or do you propose another way? We find it very necessary to sharpen up our system through the process we explained above.

As an expert on methods, do you have any arguments against this procedure?

Thanking you in anticipation of your prompt reply, we remain with best wishes,

Gaby and Hildegard

Has anyone had any issues from journal editors and/or reviewers when using this service to calculate Cohen’s kappa?

Do you have a preference for how you want your work cited?

For file names like AB_test.csv, ReCal3 does something to the filename in its report: it becomes _test.csv. Pretty inconvenient if the part before the underscore is identifying different versions.

While I can rename my files to avoid this, it would probably be good if ReCal3 would respect any filenames it is fed (if not too outlandish).

Ok, this has been fixed.

Do you have plans for a version that calculates Krippendorff’s alpha with missing data? K’s method apparently allows for this. Gwet’s Agreestat program supposedly handles missing data, but when I downloaded the trial version, the security routines where I work thought it was unsafe to run and refused to allow it.

I do in fact have plans to add support for missing data to ReCal OIR (to which I will also add KA for nominal data). In fact I’ve already done most of the work, but I still need to test the algorithm to eliminate potential bugs. The bad news is I probably won’t be able to release the update until this summer–projects that count for tenure come first!

As a check, I’ve entered the data from two of Krippendorff’s examples (the 3×15 matrix in Wikipedia and the 4×12 matrix in Krippendorff’s 2011.1.25 paper referenced on this web page). In both cases I’m getting different results from the web page and “reference” documents. I am not sure if I’m doing something wrong or if there is a problem with the algorithm on this web page.

Wikipedia alpha = 0.811, ReCal3 = 0.235

Example C alpha = 0.743, ReCal3 = 0.577

Since this site saves examples, the uploaded data files are Alpha_Wikipedia.csv and Alpha_XamplC.csv.

I would greatly appreciate guidance or suggestions on why the alpha values differ.

Thanks, – Andy –

Hi. One problem you may be experiencing is the fact that ReCal does not currently accept files with missing data (and states as much in the instructions, though I plan on adding support for missing data this summer). So you can’t get accurate results for the Wikipedia example, and I’m not sure which Krippendorff 2011.1.25 paper you’re referring to–it’s not referenced on my site–but the same would be the case if data are missing from that example.

If your files contain missing data I suggest you use either Andrew Hayes’ macro for SPSS/SAS or the R package “irr,” both of which are linked from the Wikipedia page. Alternatively you could use ReCal if you first perform listwise deletion of missing data, as I suggest on the FAQ page. But check back in a few months–I’ve actually already written the code to add missing data support, but I need to test it before I roll it out.
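The listwise deletion suggested above can be sketched in Python as follows. This is only an illustration: the function name `listwise_delete` and the set of markers treated as missing (an empty cell, `NA`, `#`) are assumptions, not part of ReCal, and the sample data are made up.

```python
import csv
import io

def listwise_delete(rows, missing=("", "NA", "#")):
    """Keep only the rows (units) in which every coder supplied a value.

    `missing` lists the cell contents treated as missing; these markers
    are an assumption and should be adapted to your own export.
    """
    kept = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        # Skip blank lines and any unit with at least one missing value.
        if cells and all(cell not in missing for cell in cells):
            kept.append(cells)
    return kept

# Made-up example: 4 units (rows) coded by 3 coders (columns).
raw = "1,2,1\n2,,2\n3,3,3\n1,NA,1\n"
rows = list(csv.reader(io.StringIO(raw)))
clean = listwise_delete(rows)
print(clean)
```

Only the first and third units survive here, because the other two each lack one coder's value. Dropping whole units discards the remaining coders' work on them, which is the cost of this workaround.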

I am working on my first piece of research, so I am completely new to testing. I am unsure how to enter my data, as the example says it uses 6 coders’ info for 1 variable. I have 10 variables/statements, 40 participants, and ordinal data as a response to each statement (a number between 1 and 5). Can you explain how I should set out the data in Excel to then input it here to run Krippendorff’s alpha? Thank you so much.

Have you considered open sourcing the PHP you’re using to do the calculations? Your site is incredibly useful but I’d like to be able to automate some calculations in a way that’s a little more elegant and reliable than screen-scraping your site. If you’re open to sharing in any way, please email me to discuss.

Hi Deen,

A few replies above this you mention that you’ll be rolling out support for absent data shortly – any idea when this will be?

Thanks

I have two coders and 200 articles that they have each coded. Do I have to run a reliability test for every pair of articles? If so, that means I will have 100 reliability coefficients – I’m lost – any help would be awesome.

Hi,

Thanks for this wonderful software. However, I have some concerns. I found high percent agreement for some of my variables, but a somewhat low Scott’s pi. For instance, 2 categories showed 96% agreement, with Scott’s pi of .79 and .78 respectively. Another showed the same 96% agreement and a Scott’s pi of 0.94. Yet another showed 97% agreement and a Scott’s pi of 0.71. I need to know how the software calculates Scott’s pi, and why the results differ like this. Please help ASAP. I need to know for my defense.
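For readers with the same question, here is a minimal sketch of how identical percent agreement can produce different Scott's pi values. It uses the standard textbook definition of Scott's pi (expected agreement from the pooled category proportions of both coders); it is not taken from ReCal's source code, and the data are invented for illustration.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of units on which the two coders chose the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def scotts_pi(a, b):
    """Scott's pi for two coders: (Po - Pe) / (1 - Pe)."""
    n = len(a)
    po = percent_agreement(a, b)
    # Expected agreement: squared pooled proportion of each category.
    pooled = Counter(a) + Counter(b)
    pe = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (po - pe) / (1 - pe)

# Balanced categories: both codes are used about equally often.
a_bal = [0] * 50 + [1] * 50
b_bal = [0] * 48 + [1] * 2 + [1] * 48 + [0] * 2

# Skewed categories: one code dominates.
a_skew = [0] * 94 + [1] * 6
b_skew = [0] * 92 + [1] * 2 + [1] * 4 + [0] * 2

for name, a, b in [("balanced", a_bal, b_bal), ("skewed", a_skew, b_skew)]:
    print(name, percent_agreement(a, b), round(scotts_pi(a, b), 3))
```

Both pairs agree on 96% of units, but pi is about 0.92 for the balanced data and about 0.645 for the skewed data. When one category dominates, much of the observed agreement is expected by chance, so the chance-corrected coefficient drops.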

I have a dataset with nominal data (2 raters using 5 categories to rate 25 forms). Would I convert each ‘match’ between raters to “1” and “1” and each ‘non-match’ to “1” and “0” for the csv file? Does that make sense?
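As a hedged illustration of the layout question above: the sketch below assumes a file in which each row is one rated unit and each column is one rater, with the original category codes kept rather than recoded into match/non-match flags. The variable names and ratings are hypothetical; check the ReCal instructions for the exact format each module expects.

```python
import csv
import io

# Hypothetical ratings: the category (1-5) each rater gave to 5 forms.
coder_a = [1, 3, 2, 5, 4]
coder_b = [1, 3, 4, 5, 4]

# One row per form, one column per rater, no header row.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for a, b in zip(coder_a, coder_b):
    writer.writerow([a, b])

csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the original codes matters because chance-corrected coefficients depend on how often each category was used, and that information is lost once ratings are collapsed into match and non-match.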

My study involves analysis of seven organisations’ annual and sustainability reports using the GRI guidelines. My constructed GRI template has 91 indicators in total, and it requires a rigorous assessment of organisations’ reports. The sample involves 3 organisations, and we have 2 independent coders to analyse these reports. At the end, we hope to see if their results match up with each other, so the real analysis can begin.

1st company – 1st coder (24 yes and 67 no) and 2nd coder (26 yes and 65 no)

2nd company – 1st coder (13 yes and 78 no) and 2nd coder (19 yes and 72 no)

3rd company – 1st coder (33 yes and 58 no) and 2nd coder (29 yes and 52 no)

My questions are: How should I input these results into a .CSV file? Can I integrate these results into the same .CSV and calculate K’s alpha as a whole, or do I have to calculate K’s alpha for each company?

If we are using Krippendorff’s alpha to calculate the IRR between two coders, is there a place to enter the range of our scale? Mine is 0-6. Does this even matter?

Can the tool calculate the confidence interval?

Thanks very much for this tool. I have a question about the ordinal and interval tests. For example: I have categories ordered 1, 2, 3 and 4; one rater assigns a case to category 2, another rater assigns the same case to category 3, and a third rater assigns the same case to category 4. Would alpha be higher for the first and second ratings (2 and 3) than for the first and third (2 and 4)?

… That is, with ordinal data (in contrast with nominal data), I assume one can talk about “closer agreement” and “less close agreement”.
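The idea of "closer" and "less close" agreement can be sketched with Krippendorff's alpha under two difference functions: nominal (any mismatch counts as 1) and interval (squared distance between values). The code below is a simplified illustration for complete data only, written from the published definition of alpha rather than from ReCal's code. It uses two coders and invented data; the ordinal metric, which is rank-based and depends on category frequencies, is left out for brevity.

```python
from collections import Counter
from itertools import permutations

def nominal(c, k):
    """Nominal difference: every mismatch counts the same."""
    return 0.0 if c == k else 1.0

def interval(c, k):
    """Interval difference: squared distance between the two values."""
    return float((c - k) ** 2)

def krippendorff_alpha(units, delta):
    """Alpha for complete data.

    `units` is a list of equal-length tuples, one tuple of coder values
    per unit; `delta` is the difference function to apply.
    """
    m = len(units[0])   # coders per unit
    n = len(units) * m  # total number of values
    # Observed disagreement: all ordered pairs of values within a unit.
    d_o = sum(delta(c, k) / (m - 1)
              for unit in units
              for c, k in permutations(unit, 2)) / n
    # Expected disagreement: pairs drawn from the overall value counts.
    counts = Counter(v for unit in units for v in unit)
    d_e = sum(counts[c] * counts[k] * delta(c, k)
              for c in counts for k in counts) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Eight units with perfect agreement, plus one disagreement.
base = [(1, 1), (2, 2), (3, 3), (4, 4)] * 2
units_23 = base + [(2, 3)]  # near miss
units_24 = base + [(2, 4)]  # larger miss

for name, delta in [("nominal", nominal), ("interval", interval)]:
    print(name,
          round(krippendorff_alpha(units_23, delta), 3),
          round(krippendorff_alpha(units_24, delta), 3))
```

With the nominal metric the two data sets yield the same alpha, because a 2-versus-3 disagreement and a 2-versus-4 disagreement are treated alike. With the interval metric the near miss yields the higher alpha, which matches the intuition in the question.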

