Annotation and characterization of lexical-orthographic phenomena in financial-market tweets

Publisher

Universidade Federal de São Carlos

Abstract

With the growth of digital media and social networks, user-generated content (UGC) has become considerably more important, creating demand for Natural Language Processing (NLP) tools and applications that can handle the predominantly non-canonical language found across UGC genres. Annotated corpora are essential resources for this purpose, as are descriptions and analyses of their linguistic characteristics. Corpora of tweets/posts stand out among these resources, given the relevance of the Twitter/X platform to various segments of society. This undergraduate thesis further advances the manual annotation of lexical-orthographic phenomena in DANTEStocks, a corpus of financial-market tweets. The annotation effort, which covered approximately 75% of the corpus's tweets, enabled a preliminary characterization of the corpus based on a hierarchical typology of classes, types, and subtypes designed to capture creative phenomena and variation with respect to the standard norm.

Citation

OLIVEIRA, Gabriela Pinheiro de. Anotação e caracterização de fenômenos léxico-ortográficos em tweets do mercado financeiro. 2025. Undergraduate thesis (Bachelor's degree in Linguistics) – Universidade Federal de São Carlos, São Carlos, 2025. Available at: https://repositorio.ufscar.br/handle/20.500.14289/23309.

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil