Proximity and gravity: modeling heaped self-reports

Chelsea Mc Carty Allen, Sandra D. Griffith, Saul Shiffman, Daniel F. Heitjan

Research output: Contribution to journal › Article

Abstract

Self-reported daily cigarette counts typically exhibit a preponderance of round numbers, a phenomenon known as heaping or digit preference. Heaping can be a substantial nuisance, as scientific interest lies in the distribution of the underlying true values rather than that of the heaped data. In principle, we can estimate parameters of the underlying distribution from heaped data if we know the conditional distribution of the heaped count given the true count, denoted the heaping mechanism (analogous to the missingness mechanism for missing data). In general, it is not possible to estimate the heaping mechanism robustly from heaped data alone. A doubly-coded smoking cessation trial data set that includes daily cigarette count as both a conventional heaped retrospective recall measurement and a precise instantaneous measurement offers the rare opportunity to estimate the heaping mechanism directly. We propose a novel model that describes the conditional probability of the self-reported count as a function of its proximity to the truth and its intrinsic attractiveness, denoted its gravity. We apply variations of the model to the cigarette count data, illuminating the cognitive processes that influence self-reporting behaviors. The principal application of the model will be to enable the correct analysis of heaped-only data sets.
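
As a rough illustration of the idea in the abstract, the sketch below builds a toy heaping mechanism in which the probability of reporting a count falls with its distance from the true count (proximity) and rises with how round the number is (gravity). The abstract does not give the functional form of the authors' model, so everything here is an assumption made for illustration: the softmax form, the parameter names alpha, g5, g10 and g20, their values, and the treatment of zero as a non-round number.

```python
import math


def gravity(report, g5=1.0, g10=1.0, g20=0.5):
    """Hypothetical attractiveness of a reported count.

    Bonuses accumulate for multiples of 5, 10 and 20. Zero is treated
    as non-round here, which is an assumption of this sketch.
    """
    if report == 0:
        return 0.0
    score = 0.0
    if report % 5 == 0:
        score += g5
    if report % 10 == 0:
        score += g10
    if report % 20 == 0:
        score += g20
    return score


def heaping_probs(true_count, max_report=60, alpha=0.8):
    """P(report = r | true count) for r = 0..max_report.

    Assumed form: weight(r) = exp(-alpha * |r - true_count| + gravity(r)),
    normalised to sum to one. alpha controls how fast probability decays
    with distance from the truth.
    """
    weights = [
        math.exp(-alpha * abs(r - true_count) + gravity(r))
        for r in range(max_report + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Example: a smoker whose true count is 18 cigarettes.
probs = heaping_probs(18)
most_likely_report = probs.index(max(probs))
```

With these illustrative parameter values, the nearby round number 20 receives more probability than the exact value 18, which is the qualitative pattern heaping refers to. In the setting the abstract describes, such parameters would be estimated from the doubly-coded data rather than fixed by hand.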

Original language: English (US)
Pages (from-to): 3200-3215
Number of pages: 16
Journal: Statistics in Medicine
Volume: 36
Issue number: 20
DOI: https://doi.org/10.1002/sim.7327
State: Published - Sep 10 2017

Keywords

  • conditional distribution
  • ecological momentary assessment
  • rounded data
  • smoking cessation
  • timeline followback

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Allen, CMC, Griffith, SD, Shiffman, S & Heitjan, DF 2017, 'Proximity and gravity: modeling heaped self-reports', Statistics in Medicine, vol. 36, no. 20, pp. 3200-3215. https://doi.org/10.1002/sim.7327
@article{e6b782e042804cc393eb27fa519f84b5,
title = "Proximity and gravity: modeling heaped self-reports",
keywords = "conditional distribution, ecological momentary assessment, rounded data, smoking cessation, timeline followback",
author = "Allen, {Chelsea Mc Carty} and Griffith, {Sandra D.} and Saul Shiffman and Heitjan, {Daniel F.}",
year = "2017",
month = "9",
day = "10",
doi = "10.1002/sim.7327",
language = "English (US)",
volume = "36",
pages = "3200--3215",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "20",

}

TY - JOUR

T1 - Proximity and gravity

T2 - modeling heaped self-reports

AU - Allen, Chelsea Mc Carty

AU - Griffith, Sandra D.

AU - Shiffman, Saul

AU - Heitjan, Daniel F.

PY - 2017/9/10

Y1 - 2017/9/10

KW - conditional distribution

KW - ecological momentary assessment

KW - rounded data

KW - smoking cessation

KW - timeline followback

UR - http://www.scopus.com/inward/record.url?scp=85026653475&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85026653475&partnerID=8YFLogxK

U2 - 10.1002/sim.7327

DO - 10.1002/sim.7327

M3 - Article

C2 - 28497551

AN - SCOPUS:85026653475

VL - 36

SP - 3200

EP - 3215

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 20

ER -