***UPDATED 5/12/09; see the bottom of this post.***
The purpose of this post is to explain the following ReCal error:
*Scott’s pi/Cohen’s kappa/Fleiss’ kappa could not be calculated for this variable due to invariant values.
You should only see this error when two conditions apply simultaneously to your data: 1) all of your coders have attained 100% agreement and 2) they have all selected the same variable value for every unit of analysis. (If you see it under any other circumstances, please let me know, as it means the code is flawed and needs to be fixed.) For example, assume a reliability sample of five units of analysis for a binary variable with possible values 1 and 0. If both coders decide that all five units should be rated 0, or that all five units should be rated 1, the “invariant values” scenario, or IVS (I’m sure someone’s come up with a better name for it), occurs. Scott’s pi, Cohen’s kappa, and Fleiss’ kappa are all undefined when this happens (Fleiss’ kappa is slightly more robust in that the more coders in the reliability pool, the less likely they all are to choose the same value for every unit).
The reason for this is that when the two IVS conditions obtain, the mathematical definition of expected agreement for these coefficients is 1. Let’s take a look at the example specified in the previous paragraph, with both coders rating all five units 1.
The IVS is in effect because all values for this variable are equal to 1. Percent agreement for this variable is obviously 100%; observed agreement is 1. The number of 1s for coders 1 and 2 is 5 for both, for a total of 10 decisions. The first, and only, joint marginal proportion for Scott’s pi is equal to (5 + 5) / 10 = 1. Expected agreement then becomes 1² = 1. The Scott’s pi equation would thus be:
(observed - expected) / (1 - expected) = (1 - 1) / (1 - 1)
But this is 0 / 0, a division by zero, which basic arithmetic tells us is undefined. Thus, Scott’s pi and Cohen’s kappa (which behaves similarly) are undefined under the IVS. Fleiss’ kappa is likewise undefined when all coders assign the same value to all units.
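To make the arithmetic concrete, here is a minimal Python sketch of Scott’s pi for two coders. It is not ReCal’s actual code: the function name `scotts_pi` and the choice to return `None` for the undefined case are assumptions made purely for illustration.

```python
from collections import Counter


def scotts_pi(coder1, coder2):
    """Scott's pi for two coders; returns None when it is undefined.

    Illustrative sketch only (hypothetical helper, not ReCal's code).
    """
    n_units = len(coder1)
    # Observed agreement: share of units on which the coders match.
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n_units
    # Joint marginal proportions: pool both coders' decisions per value.
    counts = Counter(coder1) + Counter(coder2)
    total = 2 * n_units
    expected = sum((c / total) ** 2 for c in counts.values())
    denominator = 1 - expected
    if denominator == 0:
        # Invariant values: the formula would be 0 / 0.
        return None
    return (observed - expected) / denominator


# The five-unit example from the post: both coders rate every unit 1.
print(scotts_pi([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))  # None (undefined)

# Perfect agreement with some variation in the values is well defined.
print(scotts_pi([1, 1, 0, 0, 1], [1, 1, 0, 0, 1]))  # 1.0
```

Returning `None` rather than raising an exception is just one way to signal the undefined case; a real tool might report a message instead, as ReCal does.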
Krippendorff’s alpha, on the other hand, is immune to this problem. Recall its basic form:
a = 1 - Do/De
When observed disagreement (Do) is 0, Do/De simplifies to 0, and a equals 1. This is one instance in which Krippendorff’s alpha improves upon its predecessors.
Update 5/12/09: A colleague recently pointed out to me that the above reasoning regarding the nature of Krippendorff’s alpha under invariant values is incorrect. It will in fact be undefined just like Scott’s pi and Cohen’s kappa due to the fact that De = 0 when all coders select the same value for all units, thus invalidating the entire expression. ReCal’s code has been amended to address this flaw.
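The correction can be checked with a small Python sketch of Krippendorff’s alpha. It assumes nominal data, exactly two coders, and no missing values; the function name and the `None` return value are illustrative assumptions, not ReCal’s implementation.

```python
from collections import Counter


def krippendorff_alpha_nominal(coder1, coder2):
    """Nominal Krippendorff's alpha for two coders with no missing data.

    Illustrative sketch only; returns None when alpha is undefined.
    """
    n_units = len(coder1)
    n = 2 * n_units  # total number of pairable values
    # Observed disagreement: share of units on which the coders differ.
    d_o = sum(a != b for a, b in zip(coder1, coder2)) / n_units
    # Expected disagreement from the pooled value frequencies:
    # sum over c != k of n_c * n_k, divided by n * (n - 1).
    counts = Counter(coder1) + Counter(coder2)
    d_e = (n * n - sum(c * c for c in counts.values())) / (n * (n - 1))
    if d_e == 0:
        # Only one value was ever used, so Do/De would be 0 / 0.
        return None
    return 1 - d_o / d_e


# Invariant values: De is 0, so alpha is undefined as well.
print(krippendorff_alpha_nominal([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))  # None

# Perfect agreement with both values in use: Do = 0 and De > 0.
print(krippendorff_alpha_nominal([1, 1, 0, 0, 1], [1, 1, 0, 0, 1]))  # 1.0
```

The second call shows the situation the original reasoning had in mind: Do = 0 yields alpha = 1 only when De is nonzero, which requires at least two different values somewhere in the data.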
Dear Deen, I’m getting this error when I’m running ReCal on my coding results, but I’m not sure if the conditions are exactly as you pointed out above:
1) all of your coders have attained 100% agreement and
2) they have all selected the same variable value for every case.
In my results, “undefined” is displayed where the coders have indeed attained 100% agreement, but they did not select the same variable value for “every case”. It only shows undefined for the cases where the coders have 100% agreement and selected the same variable value.
To clarify, I have coding results for 2 coders who evaluated 16 cases for the presence of 54 variables (0 = absent, 1 = present). The undefined error is shown for just those cases where there is 100% agreement, and not only when all the cases have 100% agreement.
Maybe this is exactly what you meant, but your example and the corresponding text seemed to indicate that the error should only occur when there’s 100% agreement in all the cases.
Thank you for your comment. First let me assure you that ReCal did function as intended, reporting invariant values exceptions only for those variables for which the two conditions you mention obtain. Because the invariant values exception is calculated on a per-variable basis, it is not intended to refer to the situation in which all the cases in a multi-variable file are set to the same variable value.
I believe the confusion here stems from differing uses of the term “case.” You seem to be using it as a synonym for “variable” (i.e. column pair), whereas I (prior to editing for clarity) used it in two conflicting senses: yours, and also as a synonym for “unit of analysis” (i.e. corresponding value pair in the same row). ReCal’s output also uses this latter sense; the “N cases” is always the total number of rows in the file, since all variables in a multi-variable file must contain the same number of cases.
I can see how this would be confusing, so I have amended this post to refer to “units of analysis” and “variables,” removing all uses of the term “case.” I hope this clarifies my meaning.
This time Eva Wiegemann is calculating the reliabilities of her data (study EXPI_8). Since the reliabilities show differences, we will analyse the mistakes and produce a mistake protocol.
Thanks again for letting us use your programme.
Really appreciate the fine work on ReCal. I have one result, though, that I can’t figure out. I have a dichotomous variable with a 97.3 percent agreement but the kappa is only .737. I have other variables with the same percent agreement but much higher kappas. Any suggestions as to what might be going on would be appreciated.
I faced the same issue with my data and was wondering if I was using ReCal correctly. So I did some searching around. I’m not sure if this helps, but this conference paper explains that “Expected chance agreement varies with the number and the relative proportions of categories used by the experts [coders]. This means that two given pairs of experts [coders] might reach the same percent agreement on a given task, but not have the same expected chance agreement, if they assigned verbs to classes in different proportions.”
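A small hypothetical illustration of the quoted point, sketched in Python. The two data sets below are invented for the example (they are not anyone’s real coding results), and `cohens_kappa` is an illustrative helper rather than ReCal’s code. Both data sets have 96% agreement, but the one in which nearly every unit is coded 0 has a much higher expected agreement and therefore a much lower kappa.

```python
from collections import Counter


def percent_agreement(pairs):
    """Share of units on which the two coders chose the same value."""
    return sum(a == b for a, b in pairs) / len(pairs)


def cohens_kappa(pairs):
    """Cohen's kappa for (coder1, coder2) value pairs; None if undefined."""
    n = len(pairs)
    observed = percent_agreement(pairs)
    c1 = Counter(a for a, _ in pairs)  # coder 1's marginal counts
    c2 = Counter(b for _, b in pairs)  # coder 2's marginal counts
    # Expected agreement: product of the coders' marginal proportions,
    # summed over every value either coder used.
    expected = sum((c1[v] / n) * (c2[v] / n) for v in set(c1) | set(c2))
    if expected == 1:
        return None  # invariant values
    return (observed - expected) / (1 - expected)


# Hypothetical data: 100 units each, 4 disagreements in both sets.
balanced = [(1, 1)] * 48 + [(0, 0)] * 48 + [(1, 0)] * 2 + [(0, 1)] * 2
skewed = [(1, 1)] * 2 + [(0, 0)] * 94 + [(1, 0)] * 2 + [(0, 1)] * 2

for name, data in (("balanced", balanced), ("skewed", skewed)):
    print(name, percent_agreement(data), round(cohens_kappa(data), 3))
# balanced: agreement 0.96, kappa about 0.92
# skewed:   agreement 0.96, kappa about 0.479
```

In the skewed set, expected agreement is 0.9232, so the 96% observed agreement is only a modest improvement over chance, which is why kappa drops.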
Hi Kent Grayson,
My research data has basically the same problem. I need the conference paper to help me out, but the link you gave cannot be opened.
I am having trouble comprehending what I should do with the “undefined” results. Should I try to improve the decision rules for these categories to get a defined result, or do I have 100% agreement in these categories? I would like to be able to present the table with clear data, and the term undefined implies that something needs to be fixed. Please advise.
Hi. I have also encountered the same issue. In this case, when both coders agree completely on the pilot dataset, does it mean that we should code more units so that the intercoder reliability can be computed? I would appreciate your advice.
Is it OK if your reliability results come out as 100%, 75%, 62%, and 50%, but not less than 50%?
Hi, I have encountered the same problem, but only in the case where both coders have coded “0” as absent for all cases. I’d appreciate your advice. Thanks and kind regards,