On Agreement Tables with Constant Kappa Values

Abstract

Kappa coefficients are standard tools for summarizing the information in cross-classifications of two categorical variables with identical categories, here called agreement tables. When two categories are combined, the kappa value usually either increases or decreases. There is a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It is shown that for this class of tables all special cases of symmetric kappa coincide and that the value of symmetric kappa is not affected by any partitioning of the categories.

1. Introduction

In behavioral and biomedical science researchers are often interested in measuring the intensity of a behavior or a disease. Examples are psychologists who assess how anxious a speech-anxious subject appears while giving a talk, pathologists who rate the severity of lesions from scans, or competing diagnostic devices that classify the extent of a disease in patients into categories. These phenomena are typically classified using a categorical rating system, for example, with categories (A) slight, (B) moderate, and (C) extreme. Because ratings usually entail a certain degree of subjective judgment, researchers frequently want to assess the reliability of the categorical rating system that is used. One way to do this is to assign two observers to independently rate the same set of subjects. The reliability of the rating system can then be assessed by analyzing the agreement between the observers. High agreement between the ratings can be seen as a good indication of consensus in the diagnosis and of interchangeability of the ratings of the observers.

Various statistical methodologies have been developed for analyzing agreement of a categorical rating system [1, 2]. For instance, loglinear models can be used for studying the patterns of agreement and sources of disagreement [3, 4]. However, in practice researchers often want to express the agreement between the raters in a single number. In this context, the standard tools for summarizing agreement between observers are Cohen’s kappa in the case of nominal categories [5–7] and weighted kappa in the case of ordinal categories [8–11]. With ordinal categories one may expect more disagreement or confusion on adjacent categories than on categories that are further apart. Weighted kappa allows the user to specify weights to describe the closeness between categories [12]. Both Cohen’s kappa and weighted kappa are corrected for agreement due to chance. The coefficients were originally proposed in the context of agreement studies, but nowadays they are used for summarizing all kinds of cross-classifications of two variables with the same categories [11, 12].

In many practical applications the number of categories in a rating system ranges from the minimum of two up to five. It is sometimes desirable to combine some of the categories [7]. For example, when two categories are easily confused, combining them usually improves the reliability of the rating system [13]. Collapsing categories reduces the number of categories of the rating system. If there is a lot of disagreement between two categories, we expect the kappa value to increase when the categories are combined, and this is usually the case. However, Schouten [13] showed that there is a class of agreement tables for which the value of Cohen’s kappa remains constant when categories are merged. This is not what one expects from an agreement coefficient like Cohen’s kappa. The question then arises: do other (weighted) kappa coefficients exhibit the same property for these tables? If the answer is negative, it would make sense to replace Cohen’s kappa by a weighted kappa with more favorable properties with regard to these agreement tables.

In this paper we present several properties of kappa coefficients with symmetric weighting schemes with respect to this particular class of agreement tables. The paper is organized as follows. In the next section we introduce notation, define weighted kappa, and discuss some of its special cases, including Cohen’s kappa. The results are presented in Section 3. Section 4 contains a conclusion.

2. Kappa Coefficients

In this section we introduce notation and define the kappa coefficients. For notational convenience weighted kappa is here defined in terms of dissimilarity scaling [8]. If the weights are dissimilarities, pairs of categories that are further apart are assigned higher weights.

Suppose two fixed observers independently rate the same set of n subjects using the same set of c ≥ 2 categories that are defined in advance. For a population of subjects, let π_ij denote the proportion classified in category i by the first observer and in category j by the second observer, where 1 ≤ i, j ≤ c. The quantities

\[ \pi_{i+} = \sum_{j=1}^{c} \pi_{ij}, \qquad \pi_{+i} = \sum_{j=1}^{c} \pi_{ji} \tag{1} \]

are the marginal probabilities. They reflect how often the observers used the categories. The cell probabilities of the square table {π_ij} are not directly observed. Let {n_ij} denote the contingency table of observed frequencies. Assuming a multinomial sampling model with the total number of subjects n fixed, the maximum likelihood estimate of π_ij is given by

\[ \hat{\pi}_{ij} = \frac{n_{ij}}{n} \]

[14, 15]. Since the rows and columns of {n_ij} have the same labels, the contingency table is usually called an agreement table. Table 1 presents two hypothetical agreement tables with three categories A, B, and C.

Table 1. Two hypothetical 3 × 3 agreement tables.

	Second observer (first table)				Second observer (second table)
First observer	A	B	C	Total	A	B	C	Total
A	22	2	0	24	16	4	0	20
B	4	10	0	14	0	2	1	3
C	4	2	6	12	4	0	2	6
Total	30	14	6	50	20	6	3	29

Let w_ij ≥ 0 for 1 ≤ i, j ≤ c be nonnegative real numbers with w_ii = 0. The weighted kappa coefficient can be defined as [8, 12]

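\[ \kappa_w = 1 - \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\pi_{ij}}{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\pi_{i+}\pi_{+j}} \tag{2} \]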

The numerator of the fraction in (2) is the weighted observed disagreement, while the denominator of the fraction is the weighted chance-expected disagreement. The value of (2) is 1 when there is perfect agreement between the two observers, zero when the weighted observed disagreement is equal to the weighted chance-expected disagreement, and negative when the weighted observed disagreement is larger than the weighted chance-expected disagreement.

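Under a multinomial sampling model with n fixed, the maximum likelihood estimate of (2) is

\[ \hat{\kappa}_w = 1 - \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\hat{\pi}_{ij}}{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\hat{\pi}_{i+}\hat{\pi}_{+j}} \tag{3} \]

Estimate (3) is obtained by substituting

\[ \hat{\pi}_{ij} = \frac{n_{ij}}{n} \]

for the cell probabilities π_ij in (2), so that the marginal probabilities are replaced by the corresponding marginal proportions. A large sample standard error of (3) can be found in [16].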
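The estimate of weighted kappa described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `weighted_kappa` and the weight matrix `example_weights` are assumptions made here, and the weights are deliberately asymmetric to show that the general definition does not require symmetry. Exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction


def weighted_kappa(table, weights):
    """Sample weighted kappa: 1 - weighted observed / weighted expected disagreement."""
    c = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Weighted observed disagreement, using the proportions n_ij / n.
    observed = sum(
        Fraction(weights[i][j] * table[i][j], n)
        for i in range(c) for j in range(c)
    )
    # Weighted chance-expected disagreement, using products of marginal proportions.
    expected = sum(
        Fraction(weights[i][j] * row_totals[i] * col_totals[j], n * n)
        for i in range(c) for j in range(c)
    )
    return 1 - observed / expected


# First agreement table of Table 1 (rows: first observer, columns: second observer).
table_1a = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]

# Hypothetical, deliberately asymmetric dissimilarity weights with zero diagonal.
example_weights = [[0, 1, 3], [2, 0, 1], [4, 2, 0]]

kappa_hat = weighted_kappa(table_1a, example_weights)
print(kappa_hat, float(kappa_hat))
```

The function assumes that the weighted chance-expected disagreement is positive; a table in which only one category is ever used would need separate handling.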

In this paper we are interested in the following special case of (2). We may require that weighted kappa has a symmetric weighting scheme; that is, w_ij = w_ji for 1 ≤ i, j ≤ c. Since w_ii = 0 for 1 ≤ i ≤ c, this symmetric kappa is given by

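\[ \kappa_s = 1 - \frac{\sum_{i<j} w_{ij}\,(\pi_{ij} + \pi_{ji})}{\sum_{i<j} w_{ij}\,(\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i})} \tag{4} \]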

Special cases of coefficient (4) that are used in practice are Cohen’s kappa [5, 7, 12] for nominal categories and linear kappa [10, 17] and quadratic kappa [9, 11, 18] for ordinal categories. Cohen’s kappa and quadratic kappa each have been used in thousands of applications [6, 11, 19]. The two coefficients are briefly discussed below.

The identity weights are defined as

\[ w_{ij} = \begin{cases} 0 & \text{if } i = j, \\ 1 & \text{if } i \neq j. \end{cases} \tag{5} \]

An example of weighting scheme (5) is presented in the left panel of Table 2. If we use weighting scheme (5) in (2), we obtain Cohen’s unweighted kappa [5]

\[ \kappa = 1 - \frac{\sum_{i \neq j} \pi_{ij}}{\sum_{i \neq j} \pi_{i+}\pi_{+j}} \tag{6} \]

Perhaps a more familiar definition of Cohen’s kappa is

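\[ \kappa = \frac{\sum_{i=1}^{c} \pi_{ii} - \sum_{i=1}^{c} \pi_{i+}\pi_{+i}}{1 - \sum_{i=1}^{c} \pi_{i+}\pi_{+i}} \tag{7} \]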

Formulas (6) and (7) are equivalent; definition (6) will be used in Section 3 below. Coefficient (6) has value 1 when the observers agree completely, value zero when agreement is equal to that expected under independence, and negative value when agreement is less than expected by chance.
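The equivalence of the two definitions of Cohen’s kappa can be checked numerically on the tables of Table 1. The sketch below is an illustration under stated assumptions: the function names are introduced here, the disagreement form is read as one minus the ratio of observed to chance-expected off-diagonal mass, and the agreement form is the familiar (observed agreement minus chance agreement) divided by (one minus chance agreement).

```python
from fractions import Fraction


def _margins(table):
    """Return n, the row totals, and the column totals of a square table."""
    n = sum(sum(row) for row in table)
    return n, [sum(row) for row in table], [sum(col) for col in zip(*table)]


def cohen_kappa_disagreement(table):
    """Disagreement form: 1 - observed / chance-expected off-diagonal proportion."""
    n, rows, cols = _margins(table)
    c = len(table)
    off = [(i, j) for i in range(c) for j in range(c) if i != j]
    observed = sum(Fraction(table[i][j], n) for i, j in off)
    expected = sum(Fraction(rows[i] * cols[j], n * n) for i, j in off)
    return 1 - observed / expected


def cohen_kappa_agreement(table):
    """Agreement form: (p_o - p_e) / (1 - p_e)."""
    n, rows, cols = _margins(table)
    c = len(table)
    p_o = sum(Fraction(table[i][i], n) for i in range(c))
    p_e = sum(Fraction(rows[i] * cols[i], n * n) for i in range(c))
    return (p_o - p_e) / (1 - p_e)


# The two hypothetical agreement tables of Table 1.
table_1a = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]
table_1b = [[16, 4, 0], [0, 2, 1], [4, 0, 2]]

for table in (table_1a, table_1b):
    k6 = cohen_kappa_disagreement(table)
    k7 = cohen_kappa_agreement(table)
    print(k6, k7, k6 == k7)
```

The two forms agree because the off-diagonal observed and expected proportions are one minus their diagonal counterparts.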

Table 2. Two weighting schemes for four categories A, B, C, and D.

	Identity				Quadratic
	A	B	C	D	A	B	C	D
A	0	1	1	1	0	1	4	9
B	1	0	1	1	1	0	1	4
C	1	1	0	1	4	1	0	1
D	1	1	1	0	9	4	1	0

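The quadratic weights are defined as w_ij = (i − j)² for 1 ≤ i, j ≤ c. An example of the weights is presented in the right panel of Table 2. If we use the quadratic weights in (2), we obtain the quadratic kappa [9, 18]

\[ \kappa_2 = 1 - \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} (i-j)^2\,\pi_{ij}}{\sum_{i=1}^{c}\sum_{j=1}^{c} (i-j)^2\,\pi_{i+}\pi_{+j}} \tag{8} \]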

Coefficient (8) is the most popular version of weighted kappa in the case that the categories of the rating system are ordinal [2, 11, 19]. The quadratic kappa can be interpreted as an intraclass correlation, which is a proportion of variance [9, 18]. However, the quadratic kappa is not always sensitive to differences in exact agreement [11], and high values of the quadratic kappa can be found even when the level of exact agreement is low [19].
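The two weighting schemes of Table 2 are easy to generate for any number of categories. The following sketch uses helper names that are assumptions of this illustration; it builds the identity and quadratic weights for four categories and checks that both are symmetric with a zero diagonal, which is what makes them special cases of symmetric kappa.

```python
def identity_weights(c):
    """Identity weights: 0 on the diagonal, 1 for every pair of distinct categories."""
    return [[0 if i == j else 1 for j in range(c)] for i in range(c)]


def quadratic_weights(c):
    """Quadratic weights: squared distance between category positions."""
    return [[(i - j) ** 2 for j in range(c)] for i in range(c)]


def is_symmetric_scheme(weights):
    """True if w_ii = 0 and w_ij = w_ji for all categories."""
    c = len(weights)
    return all(
        weights[i][i] == 0 and all(weights[i][j] == weights[j][i] for j in range(c))
        for i in range(c)
    )


for name, scheme in (("identity", identity_weights(4)), ("quadratic", quadratic_weights(4))):
    print(name, "symmetric:", is_symmetric_scheme(scheme))
    for row in scheme:
        print(row)
```

Indexing the categories from 0 rather than 1 does not change the quadratic weights, because they depend only on the difference i − j.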

3. A Class of Agreement Tables

It is sometimes desirable to combine some of the categories [7]. For example, when two categories are frequently confused, combining the categories may improve the reliability of the rating system. Suppose we combine two categories i and j, and let d ≥ 0 be a nonnegative real number. In this paper we focus on the class of agreement tables that satisfy the condition

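\[ \frac{\pi_{ij} + \pi_{ji}}{\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i}} = d \quad \text{for all } i \neq j. \tag{9} \]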

Condition (9) holds, for example, if there is perfect agreement between the raters. In this case d = 0 and we have

\[ \sum_{i=1}^{c} \pi_{ii} = 1 \]

and π_ij = 0 for i ≠ j and 1 ≤ i, j ≤ c. It turns out that there are many nonperfect agreement tables that also satisfy (9). Examples are the agreement tables in Table 1. For the two tables, the value of d is .397 and .644, respectively. The examples in Table 1 show that agreement tables that satisfy (9) are not necessarily symmetric. Furthermore, since the examples appear to be ordinary agreement tables that can be encountered in practice, it appears that the class of agreement tables satisfying (9) is not trivial.
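The values of d reported for Table 1 can be reproduced directly. The sketch below reads condition (9) as requiring that, for every pair of distinct categories, the observed disagreement π_ij + π_ji divided by the chance-expected disagreement π_i+ π_+j + π_j+ π_+i equals the same constant d; the function name `pairwise_ratios` is an assumption of this illustration, and sample proportions stand in for the population probabilities.

```python
from fractions import Fraction
from itertools import combinations


def pairwise_ratios(table):
    """Ratio of observed to chance-expected disagreement for each pair i < j."""
    c = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    ratios = {}
    for i, j in combinations(range(c), 2):
        observed = Fraction(table[i][j] + table[j][i], n)
        expected = Fraction(rows[i] * cols[j] + rows[j] * cols[i], n * n)
        ratios[(i, j)] = observed / expected  # assumes nonzero expected disagreement
    return ratios


# The two hypothetical agreement tables of Table 1.
table_1a = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]
table_1b = [[16, 4, 0], [0, 2, 1], [4, 0, 2]]

for table in (table_1a, table_1b):
    ratios = pairwise_ratios(table)
    constant = len(set(ratios.values())) == 1
    print({pair: round(float(r), 3) for pair, r in ratios.items()}, "constant:", constant)
```

For both tables the three pairwise ratios coincide, even though neither table is symmetric.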

For Cohen’s kappa in (6) Schouten [13] showed that if (9) holds, then the kappa value cannot be increased or decreased by combining categories. In this section we present various additional results for other special cases of symmetric kappa in (4). Theorem 1 shows that all special cases of symmetric kappa coincide if (9) holds.

Theorem 1. If (9) holds, then κ_s = 1 − d.

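Proof. If (9) holds, we have the particular case

\[ \frac{\pi_{12} + \pi_{21}}{\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1}} = d. \tag{10} \]

Furthermore, for two arbitrary categories i and j with i ≠ j we have

\[ \pi_{ij} + \pi_{ji} = a_{ij}\,(\pi_{12} + \pi_{21}), \qquad \pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i} = a_{ij}\,(\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1}) \tag{11} \]

for certain nonnegative real numbers a_ij ≥ 0. Hence, using these a_ij and identity (10) we can write κ_s as

\[ \kappa_s = 1 - \frac{(\pi_{12} + \pi_{21}) \sum_{i<j} w_{ij}\,a_{ij}}{(\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1}) \sum_{i<j} w_{ij}\,a_{ij}} = 1 - d. \tag{12} \]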
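Theorem 1 can be illustrated numerically on Table 1: whatever symmetric weights are chosen, the resulting kappa should equal 1 − d. The sketch below is a spot check rather than a proof. The helper `random_symmetric_weights` and the choice of positive integer weights between 1 and 9 are assumptions of this illustration.

```python
import random
from fractions import Fraction
from itertools import combinations


def symmetric_kappa(table, weights):
    """Symmetric kappa: sums run over unordered pairs i < j of categories."""
    c = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    observed = expected = Fraction(0)
    for i, j in combinations(range(c), 2):
        observed += weights[i][j] * Fraction(table[i][j] + table[j][i], n)
        expected += weights[i][j] * Fraction(rows[i] * cols[j] + rows[j] * cols[i], n * n)
    return 1 - observed / expected


def random_symmetric_weights(c, rng):
    """Hypothetical helper: positive integer weights with w_ij = w_ji and w_ii = 0."""
    weights = [[0] * c for _ in range(c)]
    for i, j in combinations(range(c), 2):
        weights[i][j] = weights[j][i] = rng.randint(1, 9)
    return weights


# The two hypothetical agreement tables of Table 1.
table_1a = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]
table_1b = [[16, 4, 0], [0, 2, 1], [4, 0, 2]]

rng = random.Random(0)
values_1a = {symmetric_kappa(table_1a, random_symmetric_weights(3, rng)) for _ in range(20)}
values_1b = {symmetric_kappa(table_1b, random_symmetric_weights(3, rng)) for _ in range(20)}
print(values_1a, values_1b)
```

Each set collapses to a single value, so twenty different weighting schemes give the same coefficient for each table.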

A converse version of Theorem 1 also holds. Lemma 2 is used in the proof of Theorem 3.

Lemma 2. Let a, b ≥ 0 and c, d > 0 be real numbers. One has

\[ \frac{a}{c} = \frac{b}{d} \iff \frac{a}{c} = \frac{a+b}{c+d}. \tag{13} \]

Proof. Since c and d are positive numbers, the identity a/c = b/d is equivalent to ad = bc. Adding ac to both sides we obtain a(c + d) = c(a + b), which is equivalent to a/c = (a + b)/(c + d).
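As a worked instance of Lemma 2, take the first agreement table of Table 1 and let a and c be the observed and chance-expected disagreement for the pair (A, B), and b and d those for the pair (A, C). This pairing is chosen here only for illustration, and d is used in the sense of the lemma, not as the constant of condition (9).

```latex
\[
\frac{a}{c} = \frac{0.12}{0.3024} = \frac{25}{63}, \qquad
\frac{b}{d} = \frac{0.08}{0.2016} = \frac{25}{63}, \qquad
\frac{a+b}{c+d} = \frac{0.20}{0.5040} = \frac{25}{63} \approx 0.397 .
\]
```

Adding numerators and denominators of two equal ratios leaves the ratio unchanged, which is the step that carries the later results on merged categories.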

Theorem 3. If all special cases of symmetric kappa are equal, then (9) holds.

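Proof. Let r, r′ ∈ {1, 2, …, c} with r ≠ r′ be arbitrary categories. Let κ_{rr′} denote the value of the special case of symmetric kappa with w_{rr′} = w_{r′r} = 2 and all other off-diagonal weights equal to 1. Since all special cases of symmetric kappa are equal, we have in particular κ_{rr′} = 1 − d for some real number d ≥ 0. Using (6), the identity κ_{rr′} = 1 − d is equivalent to

\[ \frac{\sum_{i \neq j} \pi_{ij} + (\pi_{rr'} + \pi_{r'r})}{\sum_{i \neq j} \pi_{i+}\pi_{+j} + (\pi_{r+}\pi_{+r'} + \pi_{r'+}\pi_{+r})} = d. \tag{14} \]

Since κ = 1 − d, it follows from application of Lemma 2 to identity (14) and the use of identity (6) that

\[ \frac{\pi_{rr'} + \pi_{r'r}}{\pi_{r+}\pi_{+r'} + \pi_{r'+}\pi_{+r}} = d. \tag{15} \]

Since r and r′ were arbitrary, condition (9) holds.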

Note that in the proof of Theorem 3 certain special cases of coefficient (4) are used. Condition (9) will not necessarily hold if two arbitrary special cases of symmetric kappa are equal. We have the following consequences of Theorems 1 and 3.

Corollary 4. It holds that κ_s = 1 ⇔ π_ij = 0 for i ≠ j and 1 ≤ i, j ≤ c.

Corollary 5. It holds that

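\[ \kappa_s = 0 \iff \pi_{ij} + \pi_{ji} = \pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i} \quad \text{for } i \neq j \text{ and } 1 \le i, j \le c. \tag{16} \]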

Theorem 6 shows that if (9) holds, then the value of coefficient (4) remains constant when we combine two categories.

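Theorem 6. Let κ_s denote the value of symmetric kappa of an agreement table with c ≥ 3 categories and κ_s^* the value of the table that is obtained by combining categories r′ and r″. If condition (9) holds, then one has κ_s^* = κ_s.

Proof. Since (9) holds, it follows from Theorem 1 that κ_s = 1 − d for some d ≥ 0. Let r denote the category that is obtained by merging r′ and r″. Let i with 1 ≤ i ≤ c and i ≠ r′, r″ be an arbitrary category. We have the four relations

\begin{align}
\pi_{ir} &= \pi_{ir'} + \pi_{ir''}, \tag{17a} \\
\pi_{ri} &= \pi_{r'i} + \pi_{r''i}, \tag{17b} \\
\pi_{r+} &= \pi_{r'+} + \pi_{r''+}, \tag{17c} \\
\pi_{+r} &= \pi_{+r'} + \pi_{+r''}. \tag{17d}
\end{align}

Furthermore, since (9) holds, we have the identities

\begin{align}
\frac{\pi_{ir'} + \pi_{r'i}}{\pi_{i+}\pi_{+r'} + \pi_{r'+}\pi_{+i}} &= d, \tag{18a} \\
\frac{\pi_{ir''} + \pi_{r''i}}{\pi_{i+}\pi_{+r''} + \pi_{r''+}\pi_{+i}} &= d. \tag{18b}
\end{align}

Applying Lemma 2 to the identities in (18a) and (18b) we obtain

\[ \frac{\pi_{ir'} + \pi_{r'i} + \pi_{ir''} + \pi_{r''i}}{\pi_{i+}\pi_{+r'} + \pi_{r'+}\pi_{+i} + \pi_{i+}\pi_{+r''} + \pi_{r''+}\pi_{+i}} = d. \tag{19} \]

Moreover, using (17a), (17b), (17c), (17d), and (19), we have

\[ \frac{\pi_{ir} + \pi_{ri}}{\pi_{i+}\pi_{+r} + \pi_{r+}\pi_{+i}} = d. \tag{20} \]

It follows from identity (20) that condition (9) also holds for the collapsed (c − 1) × (c − 1) table. Application of Theorem 1 then yields that κ_s^* = 1 − d, from which we may conclude that κ_s^* = κ_s.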

Theorem 6 shows that if the value of Cohen’s kappa in (6) remains constant when categories are combined, then the value of symmetric kappa in (4) also remains constant when categories are combined. By repeatedly applying Theorem 6 we obtain the following consequence.

Corollary 7. Let κ_s denote the value of symmetric kappa of an agreement table with c ≥ 3 categories and κ_s^* the value of the collapsed table corresponding to any partitioning of the categories. If (9) holds, then one has κ_s^* = κ_s.
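The invariance under merging can be checked on the first table of Table 1 by collapsing it in each of the three possible ways and recomputing symmetric kappa. The sketch below is an illustration only: the helper names `collapse` and `quadratic_weights` are assumptions made here, and quadratic weights are used simply as one convenient symmetric scheme.

```python
from fractions import Fraction
from itertools import combinations


def symmetric_kappa(table, weights):
    """Symmetric kappa: sums run over unordered pairs i < j of categories."""
    c = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    observed = expected = Fraction(0)
    for i, j in combinations(range(c), 2):
        observed += weights[i][j] * Fraction(table[i][j] + table[j][i], n)
        expected += weights[i][j] * Fraction(rows[i] * cols[j] + rows[j] * cols[i], n * n)
    return 1 - observed / expected


def collapse(table, partition):
    """Merge categories; `partition` lists groups of original category indices."""
    return [
        [sum(table[i][j] for i in g for j in h) for h in partition]
        for g in partition
    ]


def quadratic_weights(c):
    """Quadratic weights: squared distance between category positions."""
    return [[(i - j) ** 2 for j in range(c)] for i in range(c)]


# First agreement table of Table 1, with categories A, B, C indexed 0, 1, 2.
table_1a = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]
partitions = {
    "A+B | C": [[0, 1], [2]],
    "A+C | B": [[0, 2], [1]],
    "A | B+C": [[0], [1, 2]],
}

full = symmetric_kappa(table_1a, quadratic_weights(3))
collapsed = {
    label: symmetric_kappa(collapse(table_1a, groups), quadratic_weights(2))
    for label, groups in partitions.items()
}
print(float(full), {label: float(value) for label, value in collapsed.items()})
```

All three collapsed 2 × 2 tables return the same value as the full 3 × 3 table.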

4. Conclusion

Kappa coefficients are standard tools for summarizing agreement between two observers on a categorical rating scale. The coefficients are nowadays used for summarizing the information in all types of cross-classifications of two variables with the same categories. In the case of nominal categories Cohen’s kappa is a standard tool. In this paper we considered a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It was shown that for this class of agreement tables all special cases of symmetric kappa, that is, all kappa coefficients with a symmetric weighting scheme, coincide (Theorem 1). Furthermore, for this class of agreement tables the value of symmetric kappa remains constant when categories are merged (Theorem 6 and Corollary 7).

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author thanks an anonymous reviewer for several helpful comments and valuable suggestions on a previous version of the paper. The comments have improved the presentation of the paper. This research is part of Veni project 451-11-026 funded by the Netherlands Organisation for Scientific Research.

References

1 Jakobsson U. and Westergren A., Statistical methods for assessing agreement for ordinal data, Scandinavian Journal of Caring Sciences. (2005) 19, no. 4, 427–431, https://doi.org/10.1111/j.1471-6712.2005.00368.x, 2-s2.0-33644877944.
2 Maclure M. and Willett W. C., Misinterpretation and misuse of the Kappa statistic, The American Journal of Epidemiology. (1987) 126, no. 2, 161–169, 2-s2.0-0023250550.
3 Agresti A., Modelling patterns of agreement and disagreement, Statistical Methods in Medical Research. (1992) 1, no. 2, 201–218, https://doi.org/10.1177/096228029200100205, 2-s2.0-0026959051.
4 Agresti A., Categorical Data Analysis, 2002, Wiley, Hoboken, NJ, USA, https://doi.org/10.1002/0471249688.
5 Cohen J., A coefficient of agreement for nominal scales, Educational and Psychological Measurement. (1960) 20, 37–46.
6 Hsu L. M. and Field R., Interrater agreement measures: comments on Kappa_n, Cohen’s Kappa, Scott’s π, and Aickin’s α, Understanding Statistics. (2003) 2, no. 3, 205–219, https://doi.org/10.1207/S15328031US0203_03.
7 Warrens M. J., Cohen’s kappa can always be increased and decreased by combining categories, Statistical Methodology. (2010) 7, no. 6, 673–677, https://doi.org/10.1016/j.stamet.2010.05.003, MR2728420, ZBL1232.62161, 2-s2.0-77956601127.
8 Cohen J., Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit, Psychological Bulletin. (1968) 70, no. 4, 213–220, https://doi.org/10.1037/h0026256, 2-s2.0-58149412516.
9 Fleiss J. L. and Cohen J., The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability, Educational and Psychological Measurement. (1973) 33, 613–619.
10 Vanbelle S. and Albert A., A note on the linearly weighted kappa coefficient for ordinal scales, Statistical Methodology. (2009) 6, no. 2, 157–163, https://doi.org/10.1016/j.stamet.2008.06.001, MR2649614, ZBL1220.62172, 2-s2.0-61749098649.
11 Warrens M. J., Some paradoxical results for the quadratically weighted kappa, Psychometrika. (2012) 77, no. 2, 315–323, https://doi.org/10.1007/s11336-012-9258-4, MR2909432, ZBL1284.62764, 2-s2.0-84858451115.
12 Warrens M. J., Conditional inequalities between Cohen’s kappa and weighted kappas, Statistical Methodology. (2013) 10, 14–22, https://doi.org/10.1016/j.stamet.2012.05.004, MR2974806, 2-s2.0-84863309440.
13 Schouten H. J. A., Nominal scale agreement among observers, Psychometrika. (1986) 51, no. 3, 453–466, https://doi.org/10.1007/BF02294066, MR903415, 2-s2.0-0001028067.
14 Agresti A., Categorical Data Analysis, 1990, Wiley, New York, NY, USA, MR1044993.
15 Bishop Y. M. M., Fienberg S. E., and Holland P. W., Discrete Multivariate Analysis: Theory and Practice, 1975, The MIT Press, Cambridge, Mass, USA, MR0381130.
16 Fleiss J. L., Cohen J., and Everitt B. S., Large sample standard errors of kappa and weighted kappa, Psychological Bulletin. (1969) 72, no. 5, 323–327, https://doi.org/10.1037/h0028106, 2-s2.0-33645066726.
17 Cicchetti D. and Allison T., A new procedure for assessing reliability of scoring EEG sleep recordings, The American Journal of EEG Technology. (1971) 11, 101–109.
18 Schuster C., A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales, Educational and Psychological Measurement. (2004) 64, no. 2, 243–253, https://doi.org/10.1177/0013164403260197, MR2019827, 2-s2.0-1842431905.
19 Graham P. and Jackson R., The analysis of ordinal agreement data: beyond weighted kappa, Journal of Clinical Epidemiology. (1993) 46, no. 9, 1055–1062, https://doi.org/10.1016/0895-4356(93)90173-X, 2-s2.0-0027305418.
