Automatic identification of bias in Large Language Models

Assi, Fernanda Malheiros


Files

dissertacao_mestrado_fernanda_malheiros_assi.pdf (3.18 MB)

Date

2026-05-21

Authors

Assi, Fernanda Malheiros

Publisher

Universidade Federal de São Carlos

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, from legal reasoning to clinical decision support. As these models become increasingly integrated into real-world applications, concerns about their reliability, fairness, and ethical implications have emerged. Studies have shown that LLMs can produce biased outputs, reinforcing harmful stereotypes and discriminating against marginalized groups. This work proposes a systematic and scalable framework for evaluating and ranking LLMs based on stereotype generation in Brazilian Portuguese. The framework combines template-based sentence generation, human annotation, and supervised classification into a unified pipeline. A set of 164 sentence templates, covering gender, race, and their intersections, was used to elicit completions from 37 LLMs from multiple providers. Human annotators labeled the resulting sentences along two dimensions: alignment with social stereotypes and potential harm. The stereotype alignment labels were used to train a BERTimbau-based classifier, selected via nested cross-validation, which achieved a macro-averaged F1 of 0.665. Classifier predictions were then used to construct pairwise match tables, which fed an Elo rating system that produced two complementary rankings: a model ranking and a social marker ranking. The results reveal that smaller open-source models tend to generate less stereotyped content than larger commercial ones, and that social markers combining race and gender consistently elicit the most stereotyped outputs across all models. The framework is made available as an interactive interface that supports the incremental addition of new models.
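As an illustration of the ranking step, the following is a minimal sketch of a standard Elo update applied to a list of pairwise matches. It is not the dissertation's implementation: the initial rating of 1000, the K-factor of 32, the 400-point scale, the model names, and the way a match outcome is encoded (1.0 when the first model's completion is judged less stereotyped, 0.5 for a tie, 0.0 otherwise) are all assumptions made for the example.

```python
from collections import defaultdict


def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def update_elo(ratings, a, b, score_a, k=32.0):
    """Update both ratings in place after one match.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses.
    """
    delta = k * (score_a - expected_score(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: B loses exactly what A gains


def rank(matches, initial=1000.0, k=32.0):
    """Process matches in order and return (name, rating) pairs, best first."""
    ratings = defaultdict(lambda: initial)
    for a, b, score_a in matches:
        update_elo(ratings, a, b, score_a, k)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical match table: (model A, model B, outcome for A).
# In this sketch, a "win" would mean the classifier flagged the
# opponent's completion as stereotyped and this model's as not (assumption).
matches = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_x", "model_z", 1.0),
]
ranking = rank(matches)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

Because each update only needs the two current ratings, a newly added model can be rated by playing its matches against the existing pool, which is consistent with the incremental use described in the abstract. The same routine could rank social markers instead of models by treating each marker as a player.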

Keywords

Bias in Language Models, Stereotype, Elo System, Brazilian Portuguese, Viés em Modelos de Linguagem, Estereótipo, Sistema Elo, Português do Brasil

Citation

ASSI, Fernanda Malheiros. Automatic identification of bias in Large Language Models. 2026. Dissertation (Master's in Computer Science) – Universidade Federal de São Carlos, Campus São Carlos, 2026. Available at: https://repositorio.ufscar.br/handle/20.500.14289/24224.

URI

https://hdl.handle.net/20.500.14289/24224

Collections

Theses and Dissertations

Creative Commons License

Except where otherwise indicated, this item's license is described as Attribution 3.0 Brazil
