Using the normal-multinomial distribution to model multivariate count data

Marc Comas-Cufì (Universitat de Girona)

December 14, 2017, 11:00–12:15


Room MS001

MAD-Stat. Seminar


Multivariate count data constrained to add up to a fixed total are commonly modelled using the multinomial distribution. To account for overdispersion, the Dirichlet distribution has been proposed as a distribution for the multinomial probability parameter, which results in the compound Dirichlet-multinomial (DM) distribution. Although it has convenient mathematical properties, the DM distribution implies a fairly rigid covariance structure in practice. Alternatively, the normal-multinomial (NM) distribution is the compound probability distribution obtained by taking the multivariate logistic-normal as the distribution of the probability parameter vector of the multinomial distribution. This distribution is suitable for modelling multivariate count data when only the relative relationships between the multinomial components are of interest. It can also be used to deal with zero counts. In this talk we first introduce the NM distribution and review its advantages and disadvantages. Then, we compare different approaches to estimating the parameters of an NM distribution. Finally, we discuss how the NM distribution can be incorporated into a generalized linear modelling (GLM) context to adjust for covariates.
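As an illustration of the compounding described above, the following is a minimal sketch of drawing counts from an NM distribution, not the speaker's implementation. It assumes the logistic-normal is parameterised through additive log-ratio coordinates with the last category as reference. The function name `sample_normal_multinomial`, the argument `chol` (a lower-triangular factor of the latent covariance matrix), and the numerical values are hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def sample_normal_multinomial(n_trials, mu, chol, rng):
    """Draw one count vector of length len(mu) + 1 (illustrative sketch).

    mu   : mean of the latent normal, in log-ratio coordinates (assumed)
    chol : lower-triangular factor L with covariance = L L^T (assumed)
    """
    d = len(mu)
    # Latent normal draw: z = mu + L @ eps, with eps standard normal.
    eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [mu[i] + sum(chol[i][j] * eps[j] for j in range(i + 1)) for i in range(d)]
    # Inverse additive log-ratio transform: the reference category
    # has log-ratio 0; subtract the maximum for numerical stability.
    z.append(0.0)
    m = max(z)
    weights = [math.exp(v - m) for v in z]
    total = sum(weights)
    p = [w / total for w in weights]
    # Multinomial draw: n_trials categorical draws, tallied per category.
    tally = Counter(rng.choices(range(d + 1), weights=p, k=n_trials))
    return [tally.get(k, 0) for k in range(d + 1)]


rng = random.Random(0)
mu = [0.5, -0.5]
chol = [[1.0, 0.0], [0.6, 0.8]]  # corresponds to covariance [[1, 0.6], [0.6, 1]]
counts = [sample_normal_multinomial(50, mu, chol, rng) for _ in range(1000)]
```

Each element of `counts` is a vector of three counts summing to 50. Because the latent covariance is a free parameter, the dependence between components can be varied in ways the DM distribution does not allow, and zero counts can arise in the sampled vectors even though every latent probability is strictly positive.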