Friday, May 27, 2011

Calculation

Let's start with some basic mathematics:

The simplest way of assessing inter-observer agreement is to see how many exact agreements were observed. (For example, if we consider the two short case stations of our 130 students this year, we have 260 pairs of observations; amongst them, 84 had identical scoring.)
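For what it is worth, the raw figure works out to roughly 32%. Here is a minimal sketch of the arithmetic in Python, using only the numbers quoted above (not the actual score sheets):

    # Raw (unadjusted) agreement from the figures quoted in the text.
    exact_matches = 84
    total_pairs = 260

    raw_agreement = exact_matches / total_pairs
    print(f"Observed raw agreement: {raw_agreement:.1%}")  # about 32.3%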

The first problem is that a certain degree of agreement could exist by chance alone. The real question, therefore, is how much better the observed agreement is than what would happen just by coincidence.

I shall leave the details of computation aside. The name of this measurement is, yes, as you know, kappa. The value ranges from zero (no agreement beyond what is expected by chance) to one (perfect agreement).
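For readers who want to see the mechanics anyway, here is a minimal sketch of the unweighted kappa calculation in Python. The examiner scores are made up purely for illustration; they are not our examination data.

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Unweighted Cohen's kappa for two equal-length lists of ratings."""
        n = len(rater_a)

        # Observed agreement: proportion of pairs with identical scores.
        p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

        # Expected agreement by chance, from each rater's marginal frequencies.
        freq_a = Counter(rater_a)
        freq_b = Counter(rater_b)
        p_expected = sum(
            (freq_a[score] / n) * (freq_b[score] / n)
            for score in set(rater_a) | set(rater_b)
        )

        return (p_observed - p_expected) / (1 - p_expected)

    # Hypothetical scores, just to show the calculation:
    examiner_1 = [4, 5, 3, 4, 5, 2, 4, 3]
    examiner_2 = [4, 4, 3, 5, 5, 2, 3, 3]
    print(round(cohens_kappa(examiner_1, examiner_2), 3))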

But the major weakness of the kappa statistic is that it takes no account of the magnitude of disagreement (all disagreements are treated equally). The idea is simple: if the scores of two examiners differ by one point, it is perfectly OK; a discrepancy of three or more, on the other hand, does not seem satisfactory.

For simplicity, I shall assume the following:
  • 100% agreement if the scores given by the two observers are the same, or differ by one point
  • 50% agreement if the scores differ by two points
  • 0% agreement if they differ by three points or more
With this "weighted" kappa statistic, how well do you think our examiners agree with each other? (In other words, what do you think the kappa value is?)
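To make the definition concrete, here is a sketch of the weighted kappa under the three rules above. The scores are again hypothetical, so the printed value is not the answer to my question.

    from collections import Counter

    def agreement_weight(score_a, score_b):
        """Agreement weight: 1 if the scores differ by at most one point,
        0.5 if they differ by two points, 0 otherwise."""
        diff = abs(score_a - score_b)
        if diff <= 1:
            return 1.0
        if diff == 2:
            return 0.5
        return 0.0

    def weighted_kappa(rater_a, rater_b):
        """Weighted kappa using the custom agreement weights defined above."""
        n = len(rater_a)

        # Observed weighted agreement.
        p_observed = sum(
            agreement_weight(a, b) for a, b in zip(rater_a, rater_b)
        ) / n

        # Expected weighted agreement by chance, from the marginal distributions.
        freq_a = Counter(rater_a)
        freq_b = Counter(rater_b)
        p_expected = sum(
            agreement_weight(i, j) * (freq_a[i] / n) * (freq_b[j] / n)
            for i in freq_a
            for j in freq_b
        )

        return (p_observed - p_expected) / (1 - p_expected)

    # Hypothetical scores, purely for illustration:
    examiner_1 = [4, 5, 3, 4, 5, 2, 4, 3]
    examiner_2 = [4, 4, 3, 5, 5, 2, 1, 3]
    print(round(weighted_kappa(examiner_1, examiner_2), 3))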

Let me tell you tomorrow.
