Mapping the Geometry of Law using Document Embeddings

Résumé

Judicial documents and judgments are a rich source of information about legal cases, litigants, and judicial decision-makers. Natural language processing (NLP) based approaches have recently received much attention for their ability to decipher implicit information from text. NLP researchers have successfully developed data-driven representations of text using dense vectors that encode the relations between those objects. In this study, we explore the application of the Doc2Vec model to legal language to understand judicial reasoning and identify implicit patterns in judgments and judges. In an application to federal appellate courts, we show that these vectors encode information that distinguishes courts in time and legal topics. We use Doc2Vec document embeddings to study the patterns and train a classifier model to predict cases with a high chance of being appealed at the Supreme Court of the United States (SCOTUS). There are no existing benchmarks, and we present the first results at this task at scale. Furthermore, we analyze generic writing/judgment patterns of prominent judges using deep learning-based autoencoder models. Overall, we observe that Doc2Vec document embeddings capture important legal information and are helpful in downstream tasks.