Seminar

Sampling graphs efficiently: model assisted designs and application to Twitter data

Antoine Rebecq (INSEE)

March 23, 2017, 11:00–12:15

Toulouse

Room MF 323

MAD-Stat. Seminar

Abstract

The rise of extremely popular social networks (such as Facebook and Twitter) has generated huge interest, in both industry and academia, in datasets whose supports are graphs. When the networks are particularly big, even computing simple descriptive statistics (for example clustering or centrality) can become very costly. Most of the statistical literature on networks focuses on modeling and model-based inference. Modeling can give powerful insights into some aspects, but cannot be used for quick computation of estimates of statistics of interest. Networks have also been a popular topic in computer science in the last few years. Many of the algorithms implemented in the reference libraries for statistical network analysis instead rely on simple sampling methods. In this presentation we will try to build efficient design-based estimates of statistics of interest on graphs using sampling. Efficiency is obtained by adjusting the design depending on both the graph model and the quantity measured. An application to Twitter data will be presented.