Description of the structural fragmentation of financial market tweets via the “parataxis” relation of the “Universal Dependencies” model
Publisher
Universidade Federal de São Carlos
Abstract
One characteristic of tweets (now X posts) is fragmentation: many of them consist of sequences of segments with no explicit syntactic connection between them. These segments can be sentences, short phrases, or even fragments of these that are simply juxtaposed. When a corpus is grammatically annotated according to the Universal Dependencies (UD) model, the structural fragmentation of utterances is captured by the parataxis dependency relation (deprel). In general terms, this deprel marks two juxtaposed segments of the same utterance as having no clear syntactic relationship, and it can be further specified by sub-relations. In this work, we describe the fragmentation of financial market tweets in the DANTEStocks corpus via this deprel. To this end, the occurrences of parataxis, with and without sub-relations, were extracted from the corpus and organized semi-automatically by raw frequency. Of the 6,733 occurrences, 3,840 (57%) have no sub-relation and 2,893 (43%) have one of several sub-relations. Next, all occurrences of parataxis, with and without sub-relations, were automatically organized by the frequency of the combinations of the morphosyntactic (part-of-speech, PoS) tags of the head and the dependent, and by the order of those tags. Apart from parataxis:strunc and parataxis:wtrunc, the types with a sub-relation (parataxis:hashtag, parataxis:cashtag and parataxis:url) have lexical dependents, since each dependent is a single word/token. The dependents of parataxis:hashtag and parataxis:cashtag are X or PROPN, and those of parataxis:url are SYM. The dependents of parataxis:strunc and parataxis:wtrunc can be lexical or structural (phrases or truncated sentences). The order of head and dependent varies in parataxis:hashtag and parataxis:cashtag, while in parataxis:url, parataxis:strunc and parataxis:wtrunc it is always left to right.
Regarding parataxis without a sub-relation, the most frequent PoS combinations are VERB (root) + VERB/NOUN/PROPN and NOUN (root) + VERB/NOUN/NOUN, indicating that the fragmentation of tweets commonly involves juxtaposed verbal and nominal segments. This work thus adds to the body of knowledge about the linguistic characteristics of DANTEStocks, the first corpus of Portuguese tweets with UD annotation, which has already enabled the training and evaluation of some NLP tools for user-generated content (UGC).
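The two counting steps described in the abstract (frequency of each parataxis label, then frequency of head/dependent PoS combinations with their order) can be illustrated with a minimal Python sketch. This is not the author's extraction procedure, which the abstract does not detail. It assumes the corpus is available in CoNLL-U format; the `SAMPLE` data, the function names `read_sentences` and `count_parataxis`, and the order labels `head-first`/`dependent-first` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical two-post sample in CoNLL-U format (NOT taken from DANTEStocks).
SAMPLE = """\
# text = PETR4 sobe , confira https://t.co/x
1\tPETR4\tPETR4\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tsobe\tsubir\tVERB\t_\t_\t0\troot\t_\t_
3\t,\t,\tPUNCT\t_\t_\t4\tpunct\t_\t_
4\tconfira\tconferir\tVERB\t_\t_\t2\tparataxis\t_\t_
5\thttps://t.co/x\thttps://t.co/x\tSYM\t_\t_\t2\tparataxis:url\t_\t_

# text = #bolsa mercado fecha
1\t#bolsa\t#bolsa\tX\t_\t_\t3\tparataxis:hashtag\t_\t_
2\tmercado\tmercado\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tfecha\tfechar\tVERB\t_\t_\t0\troot\t_\t_
"""


def read_sentences(conllu_text):
    """Yield each sentence as a dict: token id -> (upos, head id, deprel)."""
    sentence = {}
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = {}
            continue
        if line.startswith("#"):
            continue  # sentence-level comment line
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        sentence[int(cols[0])] = (cols[3], int(cols[6]), cols[7])
    if sentence:
        yield sentence


def count_parataxis(conllu_text):
    """Count parataxis labels and (label, head PoS, dependent PoS, order)."""
    deprel_counts = Counter()
    pos_order_counts = Counter()
    for sentence in read_sentences(conllu_text):
        for tok_id, (upos, head, deprel) in sentence.items():
            # Match plain "parataxis" and any "parataxis:<sub-relation>".
            if deprel.split(":")[0] != "parataxis":
                continue
            deprel_counts[deprel] += 1
            if head not in sentence:
                continue  # defensive: head missing or attached to root (0)
            head_upos = sentence[head][0]
            order = "head-first" if head < tok_id else "dependent-first"
            pos_order_counts[(deprel, head_upos, upos, order)] += 1
    return deprel_counts, pos_order_counts


deprel_counts, pos_order_counts = count_parataxis(SAMPLE)
total = sum(deprel_counts.values())
for label, n in deprel_counts.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
for combo, n in pos_order_counts.most_common():
    print(combo, n)
```

Applied to a real CoNLL-U file, the same two counters would produce tables analogous to the frequency breakdowns reported in the abstract; the sample here is too small to say anything about the corpus itself.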
Citation
FREITAS, Isabela Santos de. Descrição da fragmentação estrutural de tweets do mercado financeiro via relação de “parataxis” do modelo “Universal Dependencies”. 2025. Undergraduate thesis (Bachelor's degree in Linguistics) – Universidade Federal de São Carlos, São Carlos, 2025. Available at: https://repositorio.ufscar.br/handle/20.500.14289/22599.
Creative Commons License
Except where otherwise noted, this item is licensed under Attribution 3.0 Brazil
