Explainable hybrid architectures with linear models and LLMs for classifying procedural documents: a case study at the Supremo Tribunal Federal

Bertho, Gabriel Andreazi

Files

2025_2_TCC_2_Gabriel_Bertho.pdf (29.66 MB)

Date

2025-12-17

Authors

Bertho, Gabriel Andreazi

Publisher

Universidade Federal de São Carlos

Abstract

The growing volume of cases in Brazil's higher courts has encouraged the use of machine learning to support the triage and organization of legal documents. This work investigates the automatic classification of Brazilian Supreme Court (STF) documents from the Victor dataset into six canonical types (Extraordinary Appeal, Interlocutory Appeal in Extraordinary Appeal, Judgment, Appellate Decision, Admissibility Order, and Others), as well as the generation of natural-language explanations for model decisions. We compare classical baselines that pair TF-IDF features with linear classifiers (SVM and Naive Bayes) against two Gemma 3 language models fine-tuned for sequence classification: a 270M-parameter model with full fine-tuning and a 4B-parameter model trained via QLoRA. Results show that Naive Bayes, despite high accuracy, collapses on minority classes and yields very low macro-F1, whereas a class-weighted SVM achieves robust performance (test macro-F1 around 0.82) and is already competitive for this task. Gemma 3 270M and Gemma 3 4B with QLoRA deliver only modest macro-F1 gains, indicating that simple lexical cues capture most of the structure of document type classification. We then propose hybrid architectures in which a decision engine (SVM or Gemma 3 270M) is combined with a Gemma 3 4B chat LLM that generates explanations in legal Portuguese, conditioned on relevant n-grams extracted by the linear classifier. Explanation quality is assessed with a simple lexical fidelity metric (feature coverage): average coverage is around 0.79 in the SVM + LLM scenario, and the values are very similar (approximately 0.76 vs. 0.77) when SVM and Gemma 3 270M are compared as decision engines. Overall, the results suggest that TF-IDF + SVM provides a lightweight and effective baseline for document type classification on Victor, while hybrid architectures with an LLM explainer offer a pragmatic trade-off between performance, computational cost, and a minimal level of transparency for automated decisions.
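The abstract names a lexical fidelity metric (feature coverage) without defining it. The sketch below illustrates one plausible reading, assumed here rather than taken from the thesis: coverage is the fraction of the classifier's relevant n-grams that literally appear in the generated explanation, after lowercasing and accent stripping. The function names, the normalization steps, and the sample n-grams and explanation are all hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and reduce the text to space-separated word tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"\w+", no_accents))


def feature_coverage(explanation: str, ngrams: list[str]) -> float:
    """Fraction of n-grams found in the explanation (assumed definition)."""
    targets = [t for t in (normalize(ng) for ng in ngrams) if t]
    if not targets:
        return 0.0
    # Pad with spaces so that matches respect token boundaries.
    padded = f" {normalize(explanation)} "
    hits = sum(1 for t in targets if f" {t} " in padded)
    return hits / len(targets)


# Hypothetical n-grams, standing in for the features a linear classifier
# would rank as most relevant for one document.
top_ngrams = [
    "recurso extraordinário",
    "repercussão geral",
    "agravo",
    "acórdão recorrido",
]

# Hypothetical explanation text, standing in for the LLM output.
explanation = (
    "O documento foi classificado como Recurso Extraordinário porque menciona "
    "a repercussão geral e impugna o acórdão recorrido."
)

coverage = feature_coverage(explanation, top_ngrams)
print(f"feature coverage: {coverage:.2f}")
```

In this toy case three of the four n-grams occur in the explanation, so the score is 0.75. A metric of this kind only checks that the explanation mentions the features that drove the decision; it says nothing about whether the surrounding reasoning is legally sound.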

Keywords

Legal document classification, Large Language Models (LLMs), Explainable Artificial Intelligence, Victor dataset

Citation

BERTHO, Gabriel Andreazi. Arquiteturas híbridas explicáveis com modelos lineares e LLMs para classificação de peças processuais: estudo de caso no Supremo Tribunal Federal. 2025. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2025. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/23307.

URI

https://hdl.handle.net/20.500.14289/23307

Collections

TCC

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution 3.0 Brazil
