Thi Huong An NGUYEN will defend her thesis in Mathematics, «Contribution to the Statistical Analysis of Compositional Data with an Application to Political Economy», on October 14 at 9:00 am in Room MF 323 (Manufacture des Tabacs).
- Supervisor: Anne Ruiz Gazen, Researcher, Toulouse School of Economics - UT1.
- Co-supervisor: Christine Thomas-Agnan, Researcher, Toulouse School of Economics - UT1.
- Raja CHAKIR, INRA
- Peter FILZMOSER, Vienna University of Technology
- Josep-Antoni Martin FERNANDEZ, Université de Girona
- Denis ALLARD, INRA
- Michel Le BRETON, TSE
The mathematical developments of this thesis are motivated by a question in political economy concerning the relationship between vote shares and socio-economic characteristics in a multiparty election system. As an illustrative example, we explore the outcome of the 2015 French departmental election. The outcome of this election is aggregated, as done by the French Ministry of the Interior, into three broad groups: Left wing, Right wing and Extreme Right. Mathematically, the outcome for each department or each canton is therefore a vector of percentages or proportions of votes per party. The sum of its components is constrained to be constant: 1 for proportions, 100 for percentages. Data of this type are called compositional data. In addition, they are observed at some spatial scale, such as the departmental level or the canton level. Analyzing such data with classical statistical methods is inappropriate because of the constraint on the sum of the components. Adapted methods have to be developed, and this is the objective of compositional data analysis (CODA). However, two aspects of our problem make this compositional data analysis more complex: firstly, it is documented in the literature that electoral data may exhibit heavy-tailed behavior, and secondly, because the data relate to geographical areas, spatial heterogeneity and spatial autocorrelation must be taken into account. At the beginning of my research project, I recall the principles of compositional data analysis. I then build a regression model that can be used to explain the outcome of an election and to clarify its relationship with socio-economic factors. To deal with the heavy-tailed behavior, a proposal found in the literature is to replace the Gaussian distribution with the Student distribution.
However, since there is no unique way of using the multivariate Student distribution in a multivariate regression model, we first need to study the properties of two competing models: the uncorrelated Student (UT) and the independent Student (IT) models. Finally, to take into account possible spatial autocorrelation, we need to consider spatial autoregressive models for multivariate outcomes. We show how to combine the spatial and the compositional perspectives by using a simultaneous system of spatially interrelated cross-sectional equations in the coordinate space, and we propose a formulation of this model in the simplex space.
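
As an illustration of the passage between the simplex space and the coordinate space mentioned above, here is a minimal Python sketch. It is not taken from the thesis: the function names, the choice of an isometric log-ratio (ilr) transformation with one particular orthonormal basis, and the vote counts are all assumptions made for the example.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def ilr3(shares):
    """Map a 3-part composition to 2 unconstrained log-ratio coordinates.

    Assumed basis (other orthonormal bases are equally valid):
      z1 = sqrt(1/2) * ln(x1 / x2)
      z2 = sqrt(2/3) * ln(sqrt(x1 * x2) / x3)
    """
    x1, x2, x3 = shares
    z1 = math.sqrt(1 / 2) * math.log(x1 / x2)
    z2 = math.sqrt(2 / 3) * math.log(math.sqrt(x1 * x2) / x3)
    return [z1, z2]


def ilr3_inverse(coords):
    """Map 2 coordinates back to a 3-part composition in the simplex."""
    z1, z2 = coords
    # Centered log-ratio values obtained from the same basis as ilr3.
    clr = [
        z1 / math.sqrt(2) + z2 / math.sqrt(6),
        -z1 / math.sqrt(2) + z2 / math.sqrt(6),
        -2 * z2 / math.sqrt(6),
    ]
    return closure([math.exp(c) for c in clr])


# Made-up vote counts for (Left, Right, Extreme Right) in one area.
shares = closure([35.0, 40.0, 25.0])
coords = ilr3(shares)          # two real numbers, no sum constraint
back = ilr3_inverse(coords)    # recovers the original shares
print(shares, coords, back)
```

In such a setup, a regression would be fitted on the coordinates, where the sum constraint no longer applies, and the fitted values mapped back to vote shares with the inverse transformation.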