Reference

Antoine Rebecq (INSEE), Sampling graphs efficiently: model assisted designs and application to Twitter data, MAD-Stat. Seminar, Toulouse: TSE, March 23, 2017, 11:00–12:15, room MF 323.

Abstract

The rise of extremely popular social networks (such as Facebook and Twitter) has generated huge interest in industry and academia in datasets whose supports are graphs. When the networks are particularly big, even computing simple descriptive statistics (for example clustering or centrality) can become very costly. Most of the statistical literature on networks focuses on modeling and model-based inference. Modeling can give powerful insights on some aspects, but cannot be used for quick computation of estimates of statistics of interest. Networks have also been a popular topic in computer science in the last few years. Many of the algorithms implemented in the reference libraries for statistical network analysis instead rely on simple sampling methods. In this presentation we will try to build efficient design-based estimates of statistics of interest on graphs using sampling. Efficiency is obtained by adjusting the design depending on both the graph model and the quantity measured. An application to Twitter data will be presented.