Seminar

Sampling graphs efficiently: model assisted designs and application to Twitter data

Antoine Rebecq (INSEE)

March 23, 2017, 11:00–12:15

Toulouse

Room MF 323

MAD-Stat. Seminar

Abstract

The rise of extremely popular social networks (such as Facebook, Twitter, etc.) has generated huge interest in industry and academia in datasets whose supports are graphs. When the networks are particularly big, even computing simple descriptive statistics (for example clustering or centrality) can become very costly. Most of the statistical literature on networks focuses on modeling and model-based inference. Modeling can give powerful insights into some aspects, but cannot be used for quick computation of estimates of statistics of interest. Networks have also been a popular topic in computer science in the last few years. Many of the algorithms implemented in the reference libraries for statistical network analysis instead rely on simple sampling methods. In this presentation we will try to build efficient design-based estimates of statistics of interest on graphs using sampling. Efficiency is obtained by adjusting the design depending on both the graph model and the quantity measured. An application to Twitter data will be presented.