Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models

Abstract

We provide some simple theoretical results that justify incorporating machine learning in a standard linear instrumental variable setting, prevalent in empirical research in economics. Machine learning techniques, combined with sample-splitting, extract nonlinear variation in the instrument that may dramatically improve estimation precision and robustness by boosting instrument strength. The analysis is straightforward in the absence of covariates. The presence of linearly included exogenous covariates complicates identification, as the researcher would like to prevent nonlinearities in the covariates from providing the identifying variation. Our procedure can be effectively adapted to account for this complication, based on an argument by Chamberlain (1992). Our method preserves standard intuitions and interpretations of linear instrumental variable methods and provides a simple, user-friendly upgrade to the applied economics toolbox. We illustrate our method with an example in law and criminal justice, examining the causal effect of appellate court reversals on district court sentencing decisions.
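As a rough illustration of the covariate-free case, the Python sketch below fits a flexible first stage on one part of the sample, predicts the treatment on the held-out part, and uses those out-of-sample predictions as the instrument in an otherwise standard linear IV estimate. It is a minimal sketch under stated assumptions, not the paper's implementation: polynomial least squares stands in for a machine learning method, and the function names `fit_poly_first_stage` and `split_sample_iv`, the number of folds, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_poly_first_stage(w_train, d_train, degree=3):
    # Assumption: polynomial least squares is a simple stand-in for any
    # machine learning predictor of the treatment d given the instrument w.
    coefs = np.polyfit(w_train, d_train, degree)
    return lambda w_new: np.polyval(coefs, w_new)


def split_sample_iv(y, d, w, n_folds=2, degree=3, seed=0):
    """Cross-fitted IV estimate of the coefficient on d (no covariates)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds  # random, roughly equal-sized folds

    # Build the instrument from out-of-fold predictions only, so that no
    # observation's own outcome noise leaks into its instrument value.
    z_hat = np.empty(n)
    for k in range(n_folds):
        held_out = folds == k
        predict = fit_poly_first_stage(w[~held_out], d[~held_out], degree)
        z_hat[held_out] = predict(w[held_out])

    # Just-identified linear IV with an intercept, written in demeaned form.
    z_c = z_hat - z_hat.mean()
    return (z_c @ (y - y.mean())) / (z_c @ (d - d.mean()))


# Hypothetical simulated data: the first stage is purely nonlinear in w,
# and d is endogenous because u enters both d and y.
rng = np.random.default_rng(42)
n = 5000
w = rng.normal(size=n)
u = rng.normal(size=n)
d = w**2 + u
y = 2.0 * d + 0.5 * u + rng.normal(size=n)  # true coefficient on d is 2.0

beta_hat = split_sample_iv(y, d, w)
print(f"split-sample IV estimate: {beta_hat:.3f}")
```

In this simulated design a first stage that is linear in `w` would be nearly uninformative, since `w` and `w**2` are uncorrelated when `w` is symmetric around zero, so all of the instrument strength comes from the nonlinear fit. The adjustment for linearly included covariates that the abstract attributes to Chamberlain (1992) is not attempted here.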