Measuring agreement in medical informatics reliability studies

George Hripcsak; Daniel F. Heitjan

doi:10.1016/S1532-0464(02)00500-2

Research output: Contribution to journal › Article › peer-review

184 Scopus citations

Abstract

Agreement measures are used frequently in reliability studies that involve categorical data. Simple measures like observed agreement and specific agreement can reveal a good deal about the sample. Chance-corrected agreement in the form of the kappa statistic is used frequently based on its correspondence to an intraclass correlation coefficient and the ease of calculating it, but its magnitude depends on the tasks and categories in the experiment. It is helpful to separate the components of disagreement when the goal is to improve the reliability of an instrument or of the raters. Approaches based on modeling the decision-making process can be helpful here, including tetrachoric correlation, polychoric correlation, latent trait models, and latent class models. Decision-making models can also be used to better understand the behavior of different agreement metrics. For example, if the observed prevalence of responses in one of two available categories is low, then there is insufficient information in the sample to judge raters' ability to discriminate cases; kappa may then underestimate the true agreement, while observed agreement may overestimate it.
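
The prevalence effect described in the abstract can be illustrated with a small sketch. It uses the standard textbook definitions of observed agreement, positive and negative specific agreement, and Cohen's kappa for two raters and a binary rating; these are not necessarily the exact formulations used in the article. The function name `agreement_measures` and both 2x2 tables are hypothetical, chosen only for illustration, and are not data from the study.

```python
def agreement_measures(a, b, c, d):
    """Agreement statistics for two raters and a binary rating.

    a: both raters say positive
    b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive
    d: both raters say negative
    Assumes every denominator below is non-zero.
    """
    n = a + b + c + d
    observed = (a + d) / n

    # Specific agreement: agreement conditional on one category.
    positive = 2 * a / (2 * a + b + c)
    negative = 2 * d / (2 * d + b + c)

    # Agreement expected by chance, from each rater's marginal rates.
    p1 = (a + b) / n  # rater 1 proportion positive
    p2 = (a + c) / n  # rater 2 proportion positive
    expected = p1 * p2 + (1 - p1) * (1 - p2)

    kappa = (observed - expected) / (1 - expected)
    return {
        "observed": observed,
        "positive": positive,
        "negative": negative,
        "kappa": kappa,
    }


# Hypothetical tables with identical observed agreement (0.92).
low_prevalence = agreement_measures(a=2, b=4, c=4, d=90)
balanced = agreement_measures(a=46, b=4, c=4, d=46)

for name, result in [("low prevalence", low_prevalence), ("balanced", balanced)]:
    print(name, {key: round(value, 2) for key, value in result.items()})
# low prevalence: observed 0.92, positive 0.33, negative 0.96, kappa 0.29
# balanced:       observed 0.92, positive 0.92, negative 0.92, kappa 0.84
```

Both tables have the same observed agreement, yet kappa and positive specific agreement drop sharply when positive ratings are rare, which is the situation the abstract warns about.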

Original language	English (US)
Pages (from-to)	99-110
Number of pages	12
Journal	Journal of Biomedical Informatics
Volume	35
Issue number	2
DOIs	https://doi.org/10.1016/S1532-0464(02)00500-2
State	Published - 2002

Keywords

Agreement
Kappa
Latent structure analysis
Prevalence
Reliability
Tetrachoric correlation

ASJC Scopus subject areas

Computer Science Applications
Health Informatics

Cite this

@article{aca4501e1892402d97efa3299cca5654,

title = "Measuring agreement in medical informatics reliability studies",

abstract = "Agreement measures are used frequently in reliability studies that involve categorical data. Simple measures like observed agreement and specific agreement can reveal a good deal about the sample. Chance-corrected agreement in the form of the kappa statistic is used frequently based on its correspondence to an intraclass correlation coefficient and the ease of calculating it, but its magnitude depends on the tasks and categories in the experiment. It is helpful to separate the components of disagreement when the goal is to improve the reliability of an instrument or of the raters. Approaches based on modeling the decision making process can be helpful here, including tetrachoric correlation, polychoric correlation, latent trait models, and latent class models. Decision making models can also be used to better understand the behavior of different agreement metrics. For example, if the observed prevalence of responses in one of two available categories is low, then there is insufficient information in the sample to judge raters' ability to discriminate cases, and kappa may underestimate the true agreement and observed agreement may overestimate it.",

keywords = "Agreement, Kappa, Latent structure analysis, Prevalence, Reliability, Tetrachoric correlation",

author = "Hripcsak, George and Heitjan, {Daniel F.}",

note = "Funding Information: Supported by National Library of Medicine grants R01 LM06910 “Discovering and applying knowledge in clinical databases” and R01 LM06274 “Unlocking data from medical records with text processing,” and Pfizer grant NY01-002153889 “Using information systems to advance clinical research and clinical care.”",

year = "2002",

doi = "10.1016/S1532-0464(02)00500-2",

language = "English (US)",

volume = "35",

pages = "99--110",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

number = "2",

}

TY - JOUR

T1 - Measuring agreement in medical informatics reliability studies

AU - Hripcsak, George

AU - Heitjan, Daniel F.

N1 - Funding Information: Supported by National Library of Medicine grants R01 LM06910 “Discovering and applying knowledge in clinical databases” and R01 LM06274 “Unlocking data from medical records with text processing,” and Pfizer grant NY01-002153889 “Using information systems to advance clinical research and clinical care.”

PY - 2002

Y1 - 2002

N2 - Agreement measures are used frequently in reliability studies that involve categorical data. Simple measures like observed agreement and specific agreement can reveal a good deal about the sample. Chance-corrected agreement in the form of the kappa statistic is used frequently based on its correspondence to an intraclass correlation coefficient and the ease of calculating it, but its magnitude depends on the tasks and categories in the experiment. It is helpful to separate the components of disagreement when the goal is to improve the reliability of an instrument or of the raters. Approaches based on modeling the decision making process can be helpful here, including tetrachoric correlation, polychoric correlation, latent trait models, and latent class models. Decision making models can also be used to better understand the behavior of different agreement metrics. For example, if the observed prevalence of responses in one of two available categories is low, then there is insufficient information in the sample to judge raters' ability to discriminate cases, and kappa may underestimate the true agreement and observed agreement may overestimate it.

AB - Agreement measures are used frequently in reliability studies that involve categorical data. Simple measures like observed agreement and specific agreement can reveal a good deal about the sample. Chance-corrected agreement in the form of the kappa statistic is used frequently based on its correspondence to an intraclass correlation coefficient and the ease of calculating it, but its magnitude depends on the tasks and categories in the experiment. It is helpful to separate the components of disagreement when the goal is to improve the reliability of an instrument or of the raters. Approaches based on modeling the decision making process can be helpful here, including tetrachoric correlation, polychoric correlation, latent trait models, and latent class models. Decision making models can also be used to better understand the behavior of different agreement metrics. For example, if the observed prevalence of responses in one of two available categories is low, then there is insufficient information in the sample to judge raters' ability to discriminate cases, and kappa may underestimate the true agreement and observed agreement may overestimate it.

KW - Agreement

KW - Kappa

KW - Latent structure analysis

KW - Prevalence

KW - Reliability

KW - Tetrachoric correlation

UR - http://www.scopus.com/inward/record.url?scp=0036433101&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036433101&partnerID=8YFLogxK

U2 - 10.1016/S1532-0464(02)00500-2

DO - 10.1016/S1532-0464(02)00500-2

M3 - Article

C2 - 12474424

AN - SCOPUS:0036433101

SN - 1532-0464

VL - 35

SP - 99

EP - 110

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

IS - 2

ER -
