Auto-treinamento com ruído utilizando data augmentations para tarefas de detecção de comentários ofensivos e discurso de ódio

Leite, João Augusto

Files

noisy_student_dissertacao_final.pdf (366.19 KB)

Date

2024-07-16

Authors

Leite, João Augusto

Publisher

Universidade Federal de São Carlos

Abstract

Online social media is rife with offensive and hateful comments, necessitating the development of automated detection systems to manage the vast volume of posts generated every second. Creating high-quality human-labeled datasets for this task is challenging and costly, primarily because non-offensive posts significantly outnumber offensive ones. In contrast, unlabeled data is abundant, more accessible, and cheaper to obtain. This thesis explores the application of self-training methods, which leverage weakly-labeled examples to augment training datasets, in the context of offensive and hate speech detection. The core of this thesis is the paper "Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks", which investigates the efficacy of noisy self-training approaches incorporating data augmentation techniques to enhance prediction consistency and robustness against noisy data and adversarial attacks. Experiments are conducted with both default and noisy self-training using three different textual data augmentation techniques across five distinct pre-trained BERT architectures of varying sizes. The results indicate that noisy self-training with textual data augmentations, despite its success in similar settings, decreases performance in offensive and hate speech domains compared to the default method. This finding reveals limitations of noisy self-training methods with data augmentations for domains such as offensive speech detection, where certain specific keywords cannot be modified without introducing semantic variations.
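The difference between default and noisy self-training described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: a tiny Naive Bayes model stands in for the pre-trained BERT classifiers, `random_word_dropout` is a hypothetical augmentation (the abstract does not name the three techniques used), and the confidence threshold, number of rounds, and toy sentences are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def train_nb(texts, labels):
    """Fit a tiny multinomial Naive Bayes model (assumed stand-in for BERT)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return P(offensive | text) using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total + 2))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab) + 1))
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


def random_word_dropout(text, rng, p=0.3):
    """Hypothetical augmentation: drop each word with probability p."""
    kept = [w for w in text.split() if rng.random() >= p]
    return " ".join(kept) if kept else text


def self_train(labeled, unlabeled, threshold=0.6, augment=None, rounds=2, seed=0):
    """Default self-training when augment is None, noisy self-training otherwise."""
    rng = random.Random(seed)
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    teacher = train_nb(texts, labels)
    for _ in range(rounds):
        # The teacher pseudo-labels clean unlabeled text; keep confident predictions only.
        pseudo = []
        for text in unlabeled:
            p = predict_proba(teacher, text)
            if p >= threshold or p <= 1 - threshold:
                pseudo.append((text, int(p >= 0.5)))
        # The student trains on gold labels plus pseudo-labels; in the noisy
        # variant the pseudo-labeled inputs are augmented before training.
        student_texts, student_labels = list(texts), list(labels)
        for text, y in pseudo:
            student_texts.append(augment(text, rng) if augment else text)
            student_labels.append(y)
        teacher = train_nb(student_texts, student_labels)  # student becomes the next teacher
    return teacher


# Toy data, invented for illustration (1 = offensive, 0 = not offensive).
labeled = [("you are an idiot", 1), ("what a stupid idiot", 1),
           ("have a nice day", 0), ("thanks for the nice help", 0)]
unlabeled = ["such a stupid comment", "nice work thanks", "idiot", "have a great day"]

default_model = self_train(labeled, unlabeled)
noisy_model = self_train(labeled, unlabeled, augment=random_word_dropout)
```

Word dropout also hints at the limitation the abstract reports: if the dropped word is the offensive keyword itself, the pseudo-label no longer matches the augmented text.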

Keywords

Self-supervision, Offensive and hateful speech detection, Data augmentation

Citation

LEITE, João Augusto. Auto-treinamento com ruído utilizando data augmentations para tarefas de detecção de comentários ofensivos e discurso de ódio. 2024. Dissertation (Master's in Computer Science) – Universidade Federal de São Carlos, São Carlos, 2024. Available at: https://repositorio.ufscar.br/handle/20.500.14289/20264.

URI

https://repositorio.ufscar.br/handle/20.500.14289/20264

Collections

Theses and Dissertations

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 Brazil
