Prediction Models for Integer and Count Data

September 25th, 12:00 pm - 1:00 pm in DCH 3092
Speaker: Daniel Kowal (STAT)

Please indicate interest, especially if you want lunch, here.
Abstract:

A challenging scenario for prediction and inference occurs when the outcome variables are integer-valued, such as counts, (test) scores, or rounded data. Integer-valued data are discrete data and exhibit a variety of complex distributional features including zero-inflation, skewness, over- or under-dispersion, and in some cases may be bounded or censored. To meet these challenges, we propose a simple yet powerful framework for modeling integer-valued data. The data-generating process is defined by Simultaneously Transforming and Rounding (STAR) a continuous-valued process, which produces a flexible family of integer-valued distributions. The transformation is modeled as unknown for greater distributional flexibility, while the rounding operation ensures a coherent integer-valued data-generating process. Despite their simplicity, STAR processes possess key distributional properties and are capable of modeling the complex features inherent to integer-valued data. By design, STAR directly builds upon and incorporates the models and algorithms for continuous-valued data, such as Gaussian linear models, additive models, and Bayesian Additive Regression Trees. Estimation and inference are available for both Bayesian and frequentist models. Empirical comparisons are presented for several datasets, including a large healthcare utilization dataset, animal abundance data, and synthetic data. STAR demonstrates impressive predictive distribution accuracy with greater flexibility and scalability than existing integer-valued models.
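
To make the transform-and-round idea concrete, here is a minimal simulation sketch, not the speaker's implementation. It assumes a Gaussian linear model for the continuous latent process, a fixed log transformation (the abstract treats the transformation as unknown and learns it), and a floor operation for the rounding step. The function name `simulate_star` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def simulate_star(X, beta, sigma, rng):
    """Draw integer outcomes by transforming and rounding a Gaussian linear model.

    Assumptions for illustration only: the transformation is a fixed log
    (so its inverse is exp), and rounding is floor.
    """
    # Continuous latent process: a Gaussian linear model on the transformed scale.
    z_star = X @ beta + sigma * rng.standard_normal(X.shape[0])
    # Invert the (assumed) log transformation to get a positive continuous value.
    y_star = np.exp(z_star)
    # Rounding step: floor maps every value below 1 to the integer 0.
    return np.floor(y_star).astype(np.int64)

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one covariate
beta = np.array([0.5, 1.0])                                 # hypothetical coefficients
y = simulate_star(X, beta, sigma=1.0, rng=rng)

print("share of zeros:", np.mean(y == 0))
print("mean:", y.mean(), "variance:", y.var())
```

Even this simple version produces nonnegative integers with a sizeable mass at zero and a variance well above the mean, two of the features the abstract lists. Swapping the linear predictor for an additive or tree-based model would change only the line that defines `z_star`.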
