TY - JOUR
T1 - Assessing race and ethnicity data quality across cancer registries and EMRs in two hospitals
AU - Lee, Simon J Craddock
AU - Grobe, James E.
AU - Tiro, Jasmin A.
N1 - Funding Information:
This work was supported by a Special Interest Award for Cancer Disparities Research to Dr. Lee (ACS IRG 02-196), the University of Texas Southwestern Center for Translational Medicine (NIH UL1TR001105), and the University of Texas Southwestern Center for Patient-Centered Outcomes Research (AHRQ R24 HS022418).
Publisher Copyright:
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
PY - 2016/5
Y1 - 2016/5
N2 - Background: Measurement of patient race/ethnicity in electronic health records is mandated and important for tracking health disparities. Objective: To characterize the quality of race/ethnicity data collection efforts. Methods: For all cancer patients diagnosed (2007-2010) at two hospitals, we extracted demographic data from five sources: 1) a university hospital cancer registry, 2) a university electronic medical record (EMR), 3) a community hospital cancer registry, 4) a community EMR, and 5) a joint clinical research registry. The patients whose data we examined (N = 17 834) contributed 41 025 entries (range: 2-5 per patient across sources), and the source comparisons generated 1-10 unique pairs per patient. We used generalized estimating equations, chi-square tests, and kappa estimates to assess data availability and agreement. Results: Compared to sex and insurance status, race/ethnicity information was significantly less likely to be available (χ² > 8043, P < .001), with variation across sources (χ² > 10 589, P < .001). The university EMR had a high prevalence of "Unknown" values. The aggregate kappa estimate across the sources was 0.45 (95% confidence interval, 0.45-0.45; N = 31 276 unique pairs), but it improved in sensitivity analyses that excluded the university EMR source (κ = 0.89). Race/ethnicity data were in complete agreement for only 6988 patients (39.2%). Pairs with a "Black" data value in one of the sources had the highest agreement (95.3%), whereas pairs with an "Other" value exhibited the lowest agreement across sources (11.1%). Discussion: Our findings suggest that high-quality race/ethnicity data are attainable. Many of the "errors" in race/ethnicity data are caused by missing or "Unknown" data values. Conclusions: To facilitate transparent reporting of healthcare delivery outcomes by race/ethnicity, healthcare systems need to monitor and enforce race/ethnicity data collection standards.
KW - Cancer registry
KW - Data quality
KW - Electronic medical record
KW - Race and ethnicity
UR - http://www.scopus.com/inward/record.url?scp=84979075943&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979075943&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocv156
DO - 10.1093/jamia/ocv156
M3 - Article
C2 - 26661718
AN - SCOPUS:84979075943
SN - 1067-5027
VL - 23
SP - 627
EP - 634
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 3
ER -