Margins of error and Big Data

January 30, 2019 Digital

Anne Ruiz-Gazen works on statistical methodology, in particular on understanding and improving margins of error. “My mathematical work can be applied to numerous fields, such as socio-economic surveys, or in industry, for detecting anomalies,” she explains.

Margins of error

Her work helps major national bodies such as INSEE (the French national institute for statistics and economic studies), INED (national institute for demographic studies) and INSERM (national institute of health and medical research) when they carry out national surveys. Anne Ruiz-Gazen works on understanding and improving the reliability of these surveys. “We recently collaborated with INED on a major survey that follows individuals from birth to the age of twenty. We assessed the reliability of its results on the basis of its sampling method.” Her research showed that the chosen protocol was not optimal and had increased the survey's margin of error. “The survey uses the same birth dates for all the deliveries sampled, which increases uncertainty, because this choice potentially reduces variability within the sample.”
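The effect described in this quote can be illustrated with Kish's classic approximation for cluster sampling, in which the variance of an estimate is multiplied by a design effect deff = 1 + (m − 1)ρ, where m is the number of units per cluster and ρ the intra-cluster correlation. The minimal Python sketch below assumes, for illustration only, that sampling all births on a few chosen dates behaves like cluster sampling with each date as a cluster; the sample size, number of dates and ρ are hypothetical values, not figures from the INED survey, and this is not presented as the method used in the study.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish's approximation for cluster sampling: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def margin_of_error(p: float, n: int, deff: float = 1.0, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion p.

    The variance p(1 - p)/n of a simple random sample is multiplied by deff.
    """
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Hypothetical illustration: 10,000 births sampled on 20 dates (a cluster per date)
n = 10_000
births_per_date = n / 20   # m = 500 births per sampled date
rho = 0.01                 # assumed similarity of births on the same date

deff = design_effect(births_per_date, rho)
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n / deff:.0f}")
print(f"margin of error, simple random sample: {margin_of_error(0.5, n):.4f}")
print(f"margin of error, clustered by date:    {margin_of_error(0.5, n, deff):.4f}")
```

Even a small correlation between births on the same date inflates the margin of error noticeably when each date contributes many births, which is the kind of loss of precision the quote refers to.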

She also works on electoral polling, particularly on the spatial dimension of the data. “For the French departmental elections, we work with Christine Thomas (TSE - UTC), Thibault Laurent (TSE - UTC) and An Huong Nguyen (doctoral student – TSE – UTC) on prediction models that take into account the specific nature of this type of data, known as compositional data; we also look at geographic location, so as to anticipate the effects of an economic or demographic change on the results.”
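Election results are compositional because the vote shares in each department are positive and sum to one, so ordinary regression on the raw shares can predict impossible values. One standard way to handle this, shown here as an assumption rather than as the team's actual model, is to work on log-ratios such as the centred log-ratio (clr) transform and map predictions back to shares afterwards. The vote shares below are hypothetical.

```python
import math
from typing import Sequence


def clr(shares: Sequence[float]) -> list[float]:
    """Centred log-ratio transform: log of each share minus the mean log share."""
    if any(s <= 0 for s in shares):
        raise ValueError("log-ratios require strictly positive shares")
    logs = [math.log(s) for s in shares]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def clr_inverse(coords: Sequence[float]) -> list[float]:
    """Map clr coordinates back to shares that are positive and sum to 1."""
    exps = [math.exp(c) for c in coords]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vote shares for four lists in one department (they sum to 1)
shares = [0.40, 0.30, 0.20, 0.10]
coords = clr(shares)
print("clr coordinates:", [round(c, 3) for c in coords])

# A model can work unconstrained in clr space; here a hypothetical shift
# (summing to zero) stands in for a predicted change between elections.
shift = [0.1, -0.1, 0.0, 0.0]
predicted = clr_inverse([c + d for c, d in zip(coords, shift)])
print("predicted shares:", [round(s, 3) for s in predicted])
```

Whatever the model predicts in clr coordinates, the inverse transform always returns valid shares, which is why log-ratio approaches are a common starting point for compositional data.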

"Clean" data

With the aim of improving data quality, Anne Ruiz-Gazen works with Dr Aurore Archimbaud of TSE on methods for detecting anomalies, with applications in industry. “With the exponential increase in the number of measurements taken using electronic components, searching for anomalies raises problems of scale. The results of this work have since been used by several companies to reduce manufacturing flaws.”

Most economists and statisticians agree that the arrival of Big Data (the exponential growth in the quantity of data available for processing) represents a major development for society in the years to come. “There is no doubt that Big Data is the future, but one of the often-forgotten challenges of this revolution is the reliability of the data published. Using huge volumes of data to improve the accuracy of estimates based on survey data is a difficult topic with plenty to consider.”

UseR! 2019

This annual conference dedicated to the free software R was first held in 2004 and, since 2006, has alternated between European and US cities. After Rennes in 2009, Toulouse will be the second French city to host the event, which brings together over 1,000 researchers and economic decision-makers to discuss the latest developments in the software. “We are proud to organise UseR! at TSE, in partnership with Paul Sabatier University and INRA (the French National Institute for Agricultural Research); it is excellent news for all the companies and scientists who use this tool, a touchstone in the field.”