EDIT 06/30/10: The document described below has been superseded by a new version available in the recently published journal article “ReCal: Intercoder Reliability Calculation as a Web Service.” No changes have been made to the substance of the equations themselves, but the examples have been clarified based on referees’ suggestions.
For those of you interested in calculating intercoder reliability by hand, I have created a PDF document that contains formulae and worked examples for all of the coefficients ReCal offers: percent agreement, Scott’s pi, Cohen’s kappa, Krippendorff’s alpha, and Fleiss’ kappa. If you find any errors or simply don’t understand something, please don’t hesitate to let me know in the comments or by email.
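For a quick sense of how the two-coder coefficients relate, here is a minimal Python sketch (my own illustration, not ReCal’s actual code) of percent agreement, Scott’s pi, and Cohen’s kappa for nominal data. The two hardcoded coder vectors are made-up example data; the sketch assumes every unit was coded by both coders and that chance agreement is below 1.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of units on which the two coders agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def scotts_pi(a, b):
    """Scott's pi: expected agreement from the pooled value distribution."""
    p_o = percent_agreement(a, b)
    pooled = Counter(a + b)
    total = len(a) + len(b)
    p_e = sum((count / total) ** 2 for count in pooled.values())
    return (p_o - p_e) / (1 - p_e)

def cohens_kappa(a, b):
    """Cohen's kappa: expected agreement from each coder's own marginals."""
    p_o = percent_agreement(a, b)
    marg_a, marg_b = Counter(a), Counter(b)
    n = len(a)
    p_e = sum((marg_a[v] / n) * (marg_b[v] / n) for v in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical judgments from two coders on ten units.
coder1 = [1, 1, 1, 2, 2, 1, 1, 2, 1, 1]
coder2 = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1]
print(percent_agreement(coder1, coder2))  # 0.8
print(scotts_pi(coder1, coder2))
print(cohens_kappa(coder1, coder2))
```

Note that pi and kappa coincide here because the two coders happen to have identical marginal distributions; they diverge as soon as the coders favor the categories at different rates.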
I’m a qualitative researcher for a large non-profit. We’re coding some video from kids’ video diaries.
Everything in ReCal seems to be working well EXCEPT when we’re in near-complete agreement and there is only one divergent rating. For example, with three coders we might have 10 different cases for the same variable; out of those 30 judgments, 29 can be 1s and one a 2, and the Krippendorff output is a fat 0.
I must admit that I am not well-versed in stats, so this is a little out of my realm. None of the other statisticians we have on staff has ever heard of Krippendorff’s alpha, so they can’t really explain why this happens.
Can you help?
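For what it’s worth, this turns out to be expected behavior rather than a bug: alpha corrects for chance agreement, and when one category accounts for 29 of 30 judgments, the expected disagreement D_e equals the observed disagreement D_o (both work out to 1/15 here), so alpha = 1 − D_o/D_e is exactly 0. A minimal sketch of nominal-level Krippendorff’s alpha (my own illustration, not ReCal’s implementation) reproduces the result on the data described in the question:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Nominal-level Krippendorff's alpha. `units` is a list of units,
    each unit being the list of values the coders assigned to it."""
    # Coincidence matrix: every ordered pair of values within a unit
    # counts 1/(m_u - 1), where m_u is the number of values in the unit.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # units coded by fewer than two coders are dropped
        for a, b in permutations(range(m), 2):
            coincidences[(values[a], values[b])] += 1 / (m - 1)
    n = sum(coincidences.values())  # total pairable values
    totals = Counter()              # marginal frequency of each value
    for (c, _), count in coincidences.items():
        totals[c] += count
    # Observed disagreement: off-diagonal mass of the coincidence matrix.
    d_o = sum(count for (c, k), count in coincidences.items() if c != k) / n
    # Expected disagreement under chance, from the marginals.
    d_e = sum(totals[c] * totals[k]
              for c in totals for k in totals if c != k) / (n * (n - 1))
    return 1 - d_o / d_e

# The situation described above: 3 coders, 10 cases,
# 29 judgments of "1" and a single "2".
units = [[1, 1, 1]] * 9 + [[1, 1, 2]]
print(krippendorff_alpha_nominal(units))  # 0.0
```

Intuitively, when a variable is almost constant, nearly all of the observed agreement is attributable to chance, so a chance-corrected coefficient has almost nothing left to credit the coders with; one stray rating is then enough to pull alpha all the way down to 0.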