To build inputs for end-to-end machine learning estimates of the causal impacts of law, we consider the problem of automatically classifying cases by their policy impact. We propose and implement a semi-supervised multi-class learning model, trained on a hand-coded dataset of thousands of cases covering more than 20 politically salient policy topics. Using features of the opinion text as predictors, the model assigns labeled cases to the correct topic 91% of the time. We then apply the model to the broader set of unlabeled cases and show that it can identify new groups of cases by shared policy impact.
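As a rough illustration of the workflow summarized above (fit a multi-class topic classifier on labeled opinion text, check its accuracy, then apply it to unlabeled cases), the following minimal sketch may help. It is not the model used in the paper: it assumes a plain bag-of-words multinomial Naive Bayes classifier, it is purely supervised rather than semi-supervised, and the topic names, example texts, and function names (`train_naive_bayes`, `predict`, `accuracy`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization stands in for real text features.
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents per topic and words per topic from (text, topic) pairs."""
    topic_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, topic in labeled_docs:
        topic_counts[topic] += 1
        for tok in tokenize(text):
            word_counts[topic][tok] += 1
            vocab.add(tok)
    return topic_counts, word_counts, vocab


def predict(model, text):
    """Return the topic with the highest log-posterior under Laplace smoothing."""
    topic_counts, word_counts, vocab = model
    n_docs = sum(topic_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in sorted(topic_counts):
        total = sum(word_counts[topic].values())
        score = math.log(topic_counts[topic] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[topic][tok] + 1) / (total + len(vocab))
                )
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


def accuracy(model, labeled_docs):
    """Share of labeled documents whose predicted topic matches the label."""
    hits = sum(predict(model, text) == topic for text, topic in labeled_docs)
    return hits / len(labeled_docs)


# Hypothetical toy data; real inputs would be full opinion texts.
LABELED = [
    ("the search violated the fourth amendment warrant requirement", "criminal_procedure"),
    ("police seized evidence without a warrant during the arrest", "criminal_procedure"),
    ("the employer discriminated on the basis of sex in hiring", "discrimination"),
    ("plaintiff alleges race discrimination in promotion and pay", "discrimination"),
]
UNLABELED = [
    "officers conducted a search and seized evidence",
    "the employee alleges sex discrimination by the employer",
]

model = train_naive_bayes(LABELED)
predictions = [predict(model, doc) for doc in UNLABELED]
print(predictions)
print(accuracy(model, LABELED))
```

In this sketch accuracy is computed on the training documents only for brevity; a figure comparable to the reported 91% would require evaluation on held-out labeled cases, and a semi-supervised variant would additionally use the unlabeled texts during fitting.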
TSE Working Paper, No. 18-977, August 2018