Analysis of linguistic rules for improving automatic part-of-speech annotations

Publisher

Universidade Federal de São Carlos

Abstract

Automatic part-of-speech (PoS) tagging is an essential task because it is one of the first processing steps a text undergoes in Natural Language Processing (NLP) applications and methods. The task consists of classifying each word in a text according to its grammatical class. Much of the research on PoS tagging targets corpora of more formal genres, such as journalistic and academic texts. Universal Dependencies (UD) is the most widely adopted annotation framework in current research on automatic tagging, owing to its universal guidelines for morphosyntactic labels. For Portuguese, few works are based on this formalism, especially for user-generated content (UGC). The objective of this study was therefore to analyze, refine, and evaluate a set of post-editing tagging rules proposed by Ceregatto (2022), which were derived from errors made by the UDPipe 2.1 model when annotating DANTEStocks, a corpus of financial market tweets. These rules aim to enrich tagging methods (statistical and/or probabilistic, such as UDPipe 2.1) with linguistic knowledge for Portuguese UGC texts. The main results are a reduced rule set, a formalized description of the rules, and an evaluation of the refined rules for the ADJ tag.
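To make the idea of post-editing tagging rules concrete, the following is a minimal Python sketch of how a rule could correct a tagger's output after the fact. It is not the rule set of Ceregatto (2022), nor the formalization developed in this work: the `Token` and `Rule` classes, the `post_edit` function, the tiny `ADJ_LEXICON`, and the example rule itself are all hypothetical and chosen only for illustration. The tagger output is written by hand rather than produced by UDPipe 2.1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    form: str  # surface form of the word
    upos: str  # UD part-of-speech tag assigned by the tagger


@dataclass
class Rule:
    name: str
    # condition(tokens, i) decides whether the rule applies to tokens[i]
    condition: Callable[[List[Token], int], bool]
    new_upos: str  # tag assigned when the condition holds


# Hypothetical mini-lexicon of word forms assumed to be adjectives.
ADJ_LEXICON = {"financeiro", "forte"}


def noun_after_noun_in_lexicon(tokens: List[Token], i: int) -> bool:
    """Illustrative condition: a NOUN that follows a NOUN and is in the lexicon."""
    return (
        i > 0
        and tokens[i].upos == "NOUN"
        and tokens[i - 1].upos == "NOUN"
        and tokens[i].form.lower() in ADJ_LEXICON
    )


# Hypothetical rule set with a single ADJ-related rule.
RULES = [Rule("noun_to_adj", noun_after_noun_in_lexicon, "ADJ")]


def post_edit(tokens: List[Token], rules: List[Rule]) -> List[Token]:
    """Return a corrected copy of the tagged sentence; the input is not modified."""
    edited = [Token(t.form, t.upos) for t in tokens]
    for i in range(len(edited)):
        for rule in rules:
            if rule.condition(edited, i):
                edited[i].upos = rule.new_upos
                break  # at most one rule fires per token
    return edited


# Hand-written tagger output in which "financeiro" was tagged NOUN.
tagged = [Token("mercado", "NOUN"), Token("financeiro", "NOUN"), Token("cai", "VERB")]
corrected = post_edit(tagged, RULES)
print([(t.form, t.upos) for t in corrected])
```

Keeping each rule as a named condition plus a target tag makes it possible to evaluate rules one at a time, for example by comparing tag accuracy against a gold annotation before and after applying a single rule.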

Citation

RIBEIRO, Lucas Lopes. Análise de regras linguísticas para o aperfeiçoamento de anotações automáticas de part-of-speech. 2023. Undergraduate thesis (Bachelor's degree in Linguistics) – Universidade Federal de São Carlos, São Carlos, 2023. Available at: https://repositorio.ufscar.br/handle/20.500.14289/20375.

Creative Commons License

Except where otherwise noted, this item is licensed under Attribution-NonCommercial-NoDerivs 3.0 Brazil