Seminar

Approximation bounds for conditional expectations and nonparametric regressions: theory and inference

Jean-Marie Dufour (McGill University)

June 14, 2022, 12:30–13:50

Room: Auditorium 6

Econometrics and Empirical Economics Seminar

Abstract

This paper proposes a bound approach to nonparametric regression. The object of interest, the conditional expectation $E\left[Y|X\right]$, is in general unknown and difficult to identify. In the spirit of the parsimony principle, we employ a simple parametric model to approximate the true model and then bound the approximation error using concentration inequalities to build confidence sets for $E\left[Y|X\right]$. Our approach is valid under less stringent regularity assumptions than conventional nonparametric methods, such as kernel regression and the sieve method. In particular, our framework allows for incomplete identification of the regression function, and inference takes the form of sets in a partial identification framework. We show that approximation bounds can be built using only moments of observables and discuss how shape restrictions (e.g. smoothness) can be exploited to improve such bounds. We study the optimality and uniformity of the proposed bounds using sharpness and honesty criteria. Inference only requires estimation of a simple parametric model and moments of observables, along with results from the theory of M-estimation. Thus, it is easy to implement and enjoys favourable finite-sample properties. Our Monte Carlo simulation studies compare our method with alternative methods (Nadaraya-Watson, local linear, the sieve method, random forests, LASSO, and neural networks) in terms of the average widths and coverage probabilities of the associated confidence sets and the mean squared error of point estimates. The results show that the proposed method delivers valid confidence sets in cases where the other methods fail or cannot provide confidence sets at all. As an empirical application, we apply our method to inference on automobile fuel efficiency (miles per gallon) based on car attributes, using a dataset available from the UCI Machine Learning Repository. Our method yields the narrowest confidence sets while maintaining size and generates the best out-of-sample predictions based on point estimates. These findings support our theoretical results on finite-sample properties. In another application, we demonstrate how our bound approach provides economically significant information regarding the shape of regression curves, using household consumption data.