Explainable machine learning multi-label classification of Spanish legal judgements

Read original: arXiv:2405.17610 - Published 5/29/2024 by Francisco de Arriba-P'erez, Silvia Garc'ia-M'endez, Francisco J. Gonz'alez-Casta~no, Jaime Gonz'alez-Gonz'alez

Explainable machine learning multi-label classification of Spanish legal judgements

Overview

This paper presents an explainable machine learning approach for multi-label classification of Spanish legal judgments.
The researchers develop a model that can automatically classify legal cases into multiple relevant categories, while also providing explanations for the classifications.
The model is trained on a large dataset of Spanish court rulings and leverages natural language processing techniques to extract relevant features and make predictions.

Plain English Explanation

The researchers have developed a machine learning system that can analyze the text of Spanish legal rulings and automatically categorize them into multiple relevant topics or legal areas. This is a challenging task, as court cases can often cover a range of legal issues and concepts.

To tackle this, the researchers trained their model on a large dataset of past court rulings. The model uses natural language processing techniques to understand the key information and themes present in the text of each ruling. Based on this, it can then predict which legal categories or topics the ruling is most relevant to.

Importantly, the researchers also designed their model to be "explainable", meaning it can provide clear reasons and explanations for the categorizations it makes. This is crucial, as users need to be able to understand and trust the model's decisions, especially in a high-stakes domain like the legal system.

The researchers evaluated their model on a held-out test set of court rulings and found that it was able to accurately predict the multiple relevant legal categories for each case. This suggests their approach could be a valuable tool for legal practitioners, researchers, and others working with large volumes of legal texts.

Technical Explanation

The core of the researchers' approach is a multi-label classification model that takes the text of a Spanish legal judgment as input and outputs a set of relevant legal categories or topics.

To build this model, the researchers first preprocessed their dataset of court rulings, cleaning the text and extracting relevant features using natural language processing techniques. They experimented with different feature representations, including bag-of-words, TF-IDF, and contextualized embeddings from language models like BERT.

The researchers then trained a series of machine learning models on this processed data, including logistic regression, support vector machines, and deep neural networks. The goal was to find the model architecture and hyperparameters that could best predict the multiple legal categories associated with each case.

Importantly, the researchers also incorporated explainability mechanisms into their final model. This allows the system to not only make predictions, but also provide interpretable explanations for its classifications. Specifically, they used techniques like SHAP values to highlight the most influential textual features driving the model's decisions.

In their experiments, the researchers found that the deep neural network model with BERT-based features and SHAP-based explanations achieved the best performance on the multi-label legal classification task. This suggests their approach is a promising direction for building transparent and accountable AI systems in the legal domain.

Critical Analysis

A key strength of this research is the focus on explainability and interpretability of the model's predictions. By incorporating SHAP values, the researchers enable their system to provide clear justifications for its classifications, which is crucial for building trust and adoption in high-stakes applications like the legal system.

However, the dataset used in this study is limited to Spanish legal judgments, so further research would be needed to assess the generalizability of the approach to other legal jurisdictions or document types. Additionally, the model's performance could potentially be improved by incorporating domain-specific knowledge or leveraging larger language models fine-tuned on legal text.

Another limitation is that the paper does not deeply explore the types of legal concepts or themes that the model is able to accurately identify and explain. Providing more nuanced analysis and examples of the model's capabilities in this regard could strengthen the contribution.

Overall, this work represents a promising step towards building more transparent and accountable AI systems for legal text analysis and classification. Further research in this direction, as seen in related work like topic modeling of case law, outcome prediction with explainability, and Bayesian language modeling for uncertainty, could yield valuable insights and tools for the legal domain.

Conclusion

This paper presents an explainable machine learning approach for multi-label classification of Spanish legal judgments. By incorporating interpretability mechanisms like SHAP values, the researchers have developed a model that can not only accurately predict the relevant legal categories for a given court ruling, but also provide clear explanations for its decisions.

The ability to automatically classify legal texts while also explaining the reasoning behind the classifications is a valuable capability, with potential applications in areas like legal research, case management, and decision support. This work demonstrates the promise of applying explainable AI techniques to high-stakes domains like the legal system, and suggests fruitful directions for future research in this area.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Explainable machine learning multi-label classification of Spanish legal judgements

Francisco de Arriba-P'erez, Silvia Garc'ia-M'endez, Francisco J. Gonz'alez-Casta~no, Jaime Gonz'alez-Gonz'alez

Artificial Intelligence techniques such as Machine Learning (ML) have not been exploited to their maximum potential in the legal domain. This has been partially due to the insufficient explanations they provided about their decisions. Automatic expert systems with explanatory capabilities can be specially useful when legal practitioners search jurisprudence to gather contextual knowledge for their cases. Therefore, we propose a hybrid system that applies ML for multi-label classification of judgements (sentences) and visual and natural language descriptions for explanation purposes, boosted by Natural Language Processing techniques and deep legal reasoning to identify the entities, such as the parties, involved. We are not aware of any prior work on automatic multi-label classification of legal judgements also providing natural language explanations to the end-users with comparable overall quality. Our solution achieves over 85 % micro precision on a labelled data set annotated by legal experts. This endorses its interest to relieve human experts from monotonous labour-intensive legal classification tasks.

5/29/2024

🧠

A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents

Nishchal Prasad, Mohand Boughanem, Taoufik Dkaki

Automatic legal judgment prediction and its explanation suffer from the problem of long case documents exceeding tens of thousands of words, in general, and having a non-uniform structure. Predicting judgments from such documents and extracting their explanation becomes a challenging task, more so on documents with no structural annotation. We define this problem as scarce annotated legal documents and explore their lack of structural information and their long lengths with a deep-learning-based classification framework which we call MESc; Multi-stage Encoder-based Supervised with-clustering; for judgment prediction. We explore the adaptability of LLMs with multi-billion parameters (GPT-Neo, and GPT-J) to legal texts and their intra-domain(legal) transfer learning capacity. Alongside this, we compare their performance and adaptability with MESc and the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation extraction algorithm named ORSE; Occlusion sensitivity-based Relevant Sentence Extractor; based on the input-occlusion sensitivity of the model, to explain the predictions with the most relevant sentences from the document. We explore these methods and test their effectiveness with extensive experiments and ablation studies on legal documents from India, the European Union, and the United States with the ILDC dataset and a subset of the LexGLUE dataset. MESc achieves a minimum total performance gain of approximately 2 points over previous state-of-the-art proposed methods, while ORSE applied on MESc achieves a total average gain of 50% over the baseline explainability scores.

7/1/2024

A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets

Mariana Yukari Noguti, Edduardo Vellasques, Luiz Eduardo Soares Oliveira

Recent advances in language modelling has significantly decreased the need of labelled data in text classification tasks. Transformer-based models, pre-trained on unlabeled data, can outmatch the performance of models trained from scratch for each task. However, the amount of labelled data need to fine-tune such type of model is still considerably high for domains requiring expert-level annotators, like the legal domain. This paper investigates the best strategies for optimizing the use of a small labeled dataset and large amounts of unlabeled data and perform a classification task in the legal area with 50 predefined topics. More specifically, we use the records of demands to a Brazilian Public Prosecutor's Office aiming to assign the descriptions in one of the subjects, which currently demands deep legal knowledge for manual filling. The task of optimizing the performance of classifiers in this scenario is especially challenging, given the low amount of resources available regarding the Portuguese language, especially in the legal domain. Our results demonstrate that classic supervised models such as logistic regression and SVM and the ensembles random forest and gradient boosting achieve better performance along with embeddings extracted with word2vec when compared to BERT language model. The latter demonstrates superior performance in association with the architecture of the model itself as a classifier, having surpassed all previous models in that regard. The best result was obtained with Unsupervised Data Augmentation (UDA), which jointly uses BERT, data augmentation, and strategies of semi-supervised learning, with an accuracy of 80.7% in the aforementioned task.

9/11/2024

💬

Large Language Models for Judicial Entity Extraction: A Comparative Study

Atin Sakkeer Hussain, Anu Thomas

Domain-specific Entity Recognition holds significant importance in legal contexts, serving as a fundamental task that supports various applications such as question-answering systems, text summarization, machine translation, sentiment analysis, and information retrieval specifically within case law documents. Recent advancements have highlighted the efficacy of Large Language Models in natural language processing tasks, demonstrating their capability to accurately detect and classify domain-specific facts (entities) from specialized texts like clinical and financial documents. This research investigates the application of Large Language Models in identifying domain-specific entities (e.g., courts, petitioner, judge, lawyer, respondents, FIR nos.) within case law documents, with a specific focus on their aptitude for handling domain-specific language complexity and contextual variations. The study evaluates the performance of state-of-the-art Large Language Model architectures, including Large Language Model Meta AI 3, Mistral, and Gemma, in the context of extracting judicial facts tailored to Indian judicial texts. Mistral and Gemma emerged as the top-performing models, showcasing balanced precision and recall crucial for accurate entity identification. These findings confirm the value of Large Language Models in judicial documents and demonstrate how they can facilitate and quicken scientific research by producing precise, organised data outputs that are appropriate for in-depth examination.

7/9/2024