Explainable hybrid architectures with linear models and LLMs for classifying procedural documents: a case study at the Supremo Tribunal Federal
Publisher
Universidade Federal de São Carlos
Abstract
The growing volume of cases in Brazil’s higher courts has encouraged the use of machine learning to support the triage and organization of legal documents. This work investigates the automatic classification of Brazilian Supreme Court (STF) documents into six canonical types (Extraordinary Appeal, Interlocutory Appeal in Extraordinary Appeal, Judgment, Appellate Decision, Admissibility Order, and Others) using the Victor dataset, together with the generation of natural-language explanations for model decisions. We compare classical baselines based on TF–IDF and linear classifiers (SVM and Naive Bayes) with two Gemma 3 language models fine-tuned for sequence classification (a 270M-parameter model with full fine-tuning and a 4B-parameter model trained via QLoRA). Results show that Naive Bayes, despite high accuracy, collapses on minority classes and yields very low macro-F1, whereas a class-weighted SVM achieves robust performance (test macro-F1 around 0.82) and is already competitive for this task. Gemma 3 270M and Gemma 3 4B QLoRA deliver only modest macro-F1 gains, indicating that simple lexical cues capture most of the structure of document type classification. We then propose hybrid architectures in which a decision engine (SVM or Gemma 3 270M) is combined with a Gemma 3 4B chat LLM that generates explanations in legal Portuguese, conditioned on relevant n-grams extracted by the linear classifier. Explanation quality is assessed with a simple lexical fidelity metric (feature coverage): average coverage is around 0.79 in the SVM + LLM scenario, and values are very similar (approximately 0.76 vs. 0.77) when SVM and Gemma 3 270M are compared as decision engines. Overall, the results suggest that TF–IDF + SVM provides a lightweight and effective baseline for document type classification on Victor, while hybrid architectures with an LLM explainer offer a pragmatic trade-off between performance, computational cost, and a minimal degree of transparency in automated decisions.
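Two quantities in the abstract are easier to follow with a small worked example: the gap between accuracy and macro-F1 when a classifier collapses on minority classes, and the feature coverage metric. The sketch below is only illustrative and is not taken from the thesis. The label counts, the n-grams, and the sample explanation are made up, and `feature_coverage` assumes that coverage means the fraction of supplied n-grams that appear verbatim (case-insensitively) in the generated explanation; the thesis may define it differently.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def feature_coverage(ngrams, explanation):
    """Assumed definition: share of n-grams found verbatim in the explanation."""
    if not ngrams:
        return 0.0
    text = " ".join(explanation.lower().split())
    hits = sum(1 for g in ngrams if " ".join(g.lower().split()) in text)
    return hits / len(ngrams)


# Hypothetical imbalanced test set: the model always predicts the majority class.
y_true = ["others"] * 90 + ["judgment"] * 5 + ["admissibility_order"] * 5
y_pred = ["others"] * 100

acc = accuracy(y_true, y_pred)   # high, because "others" dominates
mf1 = macro_f1(y_true, y_pred)   # low, because two classes score F1 = 0

# Hypothetical n-grams passed to the explainer and a made-up explanation.
ngrams = ["recurso extraordinário", "agravo", "repercussão geral", "acórdão recorrido"]
explanation = (
    "O documento foi classificado como Recurso Extraordinário porque "
    "menciona repercussão geral e o acórdão recorrido."
)
cov = feature_coverage(ngrams, explanation)  # 3 of the 4 n-grams are mentioned

print(f"accuracy={acc:.2f} macro_f1={mf1:.2f} coverage={cov:.2f}")
```

In this toy setup accuracy is 0.90 while macro-F1 is about 0.32, which is the pattern the abstract reports for Naive Bayes. Verbatim matching keeps the coverage metric cheap to compute, but it says nothing about whether the explanation uses the n-grams correctly.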
Citation
BERTHO, Gabriel Andreazi. Arquiteturas híbridas explicáveis com modelos lineares e LLMs para classificação de peças processuais: estudo de caso no Supremo Tribunal Federal. 2025. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2025. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/23307.
Creative Commons License
Except where otherwise noted, this item's license is described as Attribution 3.0 Brazil
