Syntactic-semantic description of predicative nouns in Portuguese financial market tweets

Publisher

Universidade Federal de São Carlos

Abstract

This study describes the argument structure (A-structure) of predicative nouns (Npreds) in DANTEStocks, a corpus of financial market tweets, motivated by the preference for this type of predicator in digital genres of this domain. The specific objectives were to verify (i) the presence or absence of arguments (Args) in the tweets, (ii) the syntactic realization of Args, and (iii) the influence of tweet-specific linguistic phenomena on the realization of the nouns' A-structure. In total, 145 Npreds and 1,756 instances (tweets containing at least one Npred) were described at the syntactic-semantic level. At the syntactic level, the entire DANTEStocks corpus was annotated semi-automatically following the Universal Dependencies (UD) model. At the semantic level, the resulting dependency trees guided the manual annotation of instances following NomBank. The syntactic-semantic mapping revealed that: (i) the A-structure of valency-one (V1) Npreds is always filled in the syntax; (ii) the A-structure of Npreds with V>1 shows some missing Args; (iii) most of the analyzed Npreds are V3, yet most of their instances realize only 2 Args; (iv) the dependency relations (deprels) most frequently connecting Npreds to their Args are nmod and amod; and (v) in 24 instances, the syntactic realization of the A-structure was reduced by tweet-specific phenomena (truncation and juxtaposition of elements). These results enrich the description of lexical aspects of the language used in financial market tweets. Moreover, the syntactic-semantic valency description of the Npreds was systematized in NounBank.DS, an online lexical repository that can support further linguistic-computational research. One contribution to Natural Language Processing (NLP) is the UD syntactic annotation of DANTEStocks, which produced the first Portuguese tweebank. This resource enabled the development of the first UD parser for Portuguese user-generated content (UGC).
The NomBank-style semantic annotation of a portion of the corpus is also a significant resource. Thus, this study produced reference linguistic resources and a tool (a parser) for the automatic processing of Portuguese tweets, both essential for developing NLP applications targeting this type of UGC.
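The abstract notes that UD dependency trees guided the manual NomBank-style annotation and that nmod and amod are the deprels most often linking Npreds to their Args. As a rough illustration of how a tree can surface candidate arguments for an annotator, here is a minimal sketch that assumes the UD annotation is available in CoNLL-U format. The tweet, the helper names `read_tokens` and `candidate_args`, and the nmod/amod filter are hypothetical choices for this example, not the author's tooling; the code only proposes candidates, while role labels (Arg0, Arg1, ...) were assigned manually in the study.

```python
from collections import Counter

# Hypothetical toy tweet in CoNLL-U (10 tab-separated columns:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC). Not taken from DANTEStocks.
ROWS = [
    ["# text = Compra de ações pelo fundo"],
    ["1", "Compra", "compra", "NOUN", "_", "_", "0", "root", "_", "_"],
    ["2", "de", "de", "ADP", "_", "_", "3", "case", "_", "_"],
    ["3", "ações", "ação", "NOUN", "_", "_", "1", "nmod", "_", "_"],
    ["4-5", "pelo", "_", "_", "_", "_", "_", "_", "_", "_"],  # contraction por + o
    ["4", "por", "por", "ADP", "_", "_", "6", "case", "_", "_"],
    ["5", "o", "o", "DET", "_", "_", "6", "det", "_", "_"],
    ["6", "fundo", "fundo", "NOUN", "_", "_", "1", "nmod", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)


def read_tokens(conllu: str) -> list[dict]:
    """Return syntactic words, skipping comments, multiword ranges and empty nodes."""
    tokens = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token range (e.g. "pelo") or empty node
        tokens.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                       "upos": cols[3], "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def candidate_args(tokens: list[dict], npred_lemma: str,
                   deprels: tuple[str, ...] = ("nmod", "amod")) -> list[tuple[str, str]]:
    """List (form, deprel) for dependents of NOUN tokens with the given lemma.

    Subtyped relations such as nmod:poss are matched on their base label.
    """
    heads = {t["id"] for t in tokens
             if t["lemma"] == npred_lemma and t["upos"] == "NOUN"}
    return [(t["form"], t["deprel"]) for t in tokens
            if t["head"] in heads and t["deprel"].split(":")[0] in deprels]


if __name__ == "__main__":
    toks = read_tokens(SAMPLE)
    cands = candidate_args(toks, "compra")
    print(cands)                           # [('ações', 'nmod'), ('fundo', 'nmod')]
    print(Counter(d for _, d in cands))    # Counter({'nmod': 2})
```

Aggregating such counts over all instances is one simple way to obtain deprel frequencies like those reported above; an annotator would still have to decide which candidates are true arguments and which roles they fill.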

Citation

BARBOSA, Bryan Khelven da Silva. Descrição sintático-semântica de nomes predicadores em tweets do mercado financeiro em português. 2024. Master's thesis (Linguistics), Universidade Federal de São Carlos, São Carlos, 2024. Available at: https://repositorio.ufscar.br/handle/20.500.14289/20503.

Creative Commons License

Except where otherwise noted, this item is licensed under Attribution-NonCommercial-NoDerivs 3.0 Brazil