Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais

Souza, Jacqueline Aparecida de

Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais

Arquivos

3377.pdf (3.38 MB)

Data

2010-02-26

Autores

Souza, Jacqueline Aparecida de

Editor

Universidade Federal de São Carlos

Resumo

Based on methodological postulates of the Linguistic of corpus and on the genre concepts, proposed by Swales (1990) and Biber (1995), this research intends to describe linguistic traces which are characteristic of historic texts and correlate them to their respective genres, as well as propose a typology of traces so that it is possible to automatically identify the genre. In order to execute the research, the corpus of the Portuguese of the centuries XVI, XVII and XVII of the project Historical Dictionary of the Portuguese in Brazil (program Institutes of the Millennium/CNPq UNESP/Araraquara), which is constituted by 2,459 texts and 7,5 million words has been used. In order to realize a historical description, the study has started from synchronic characteristics obtained from the table of contemporary traces elaborated by Aires (2005). As for the manipulation of the corpus, it has been used the Philologic, the Unitex as well as another tool for the extraction and quantification of traces that has been developed. For the purposes of classification, algorithms available at Weka (Waikato Environment for knowledge Analysis) such as: Naive Bayes, Bayes Net, SMO, Multilayer Perceptron e RBFNetwork, J48, NBTree have been used. The description has been made based on the 62 traces, which include statistics based on a text as a whole and on words, including classes of verbs, pronouns, adverbs as well as discourse markers, expressions and lexical units. It has been concluded that the genres share specific linguistic characteristics. However, they also present their own standards with the use of specific expressions and the frequency of lexical units. Despite the limitations and complications in using a historical corpus, the performance of the classifiers based on the raised traces was satisfactory and the rate of correct classification was 84% and 92%.

Palavras-chave

Linguística, Linguística de corpus, Aprendizado de computador, Corpus histórico, Traços lingüísticos, Gêneros textuais, Classificação automática, Corpus linguistics, Features, Textual genre, Automatic classification

Citação

SOUZA, Jacqueline Aparecida de. Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais. 2010. 167 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2010.

URI

https://repositorio.ufscar.br/handle/20.500.14289/5698

Coleções

Teses e Dissertações

Página do item completo

Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais

Arquivos

Data

Autores

Título da Revista

ISSN da Revista

Título de Volume

Editor

Resumo

Descrição

Palavras-chave

Citação

URI

Coleções

item.page.endorsement

item.page.review

item.page.supplemented

item.page.referenced