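Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais
(Typology of linguistic features in Brazilian Portuguese texts from the 16th, 17th, 18th and 19th centuries: a proposal for the automatic classification of text genres)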

File
3377.pdf (3.382 MB)
Date
2010-02-26
Author
Souza, Jacqueline Aparecida de
Abstract
Based on the methodological principles of Corpus Linguistics and on the concepts of genre proposed by Swales (1990) and Biber (1995), this research aims to describe linguistic features characteristic of historical texts and to correlate them with their respective genres, as well as to propose a typology of features that makes it possible to identify genre automatically. The research used the 16th- to 19th-century Portuguese corpus of the Historical Dictionary of Brazilian Portuguese project (Millennium Institutes program/CNPq, UNESP/Araraquara), which comprises 2,459 texts and 7.5 million words. For the historical description, the study started from the synchronic features listed in the table of contemporary features compiled by Aires (2005). The corpus was processed with PhiloLogic and Unitex, together with a tool developed in this research to extract and count the features. For classification, the following algorithms available in Weka (Waikato Environment for Knowledge Analysis) were used: Naive Bayes, BayesNet, SMO, Multilayer Perceptron, RBFNetwork, J48 and NBTree. The description is based on 62 features, which include statistics computed over the text as a whole and over words, including classes of verbs, pronouns and adverbs, as well as discourse markers, expressions and lexical units. The study concludes that the genres share certain linguistic characteristics but also show their own patterns in the use of specific expressions and in the frequency of lexical units. Despite the limitations and difficulties of working with a historical corpus, the classifiers based on the extracted features performed satisfactorily, with correct classification rates of 84% and 92%.
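The abstract describes two steps: quantifying linguistic features for each text (whole-text statistics and per-word frequencies of lexical units) and feeding the resulting vectors to classifiers such as Naive Bayes. The Python sketch below illustrates that pipeline under explicit assumptions. The feature list (`LEXICAL_UNITS`), the genre labels ("letter", "decree") and the training sentences are hypothetical. The classifier is a textbook Gaussian Naive Bayes written from scratch. It is not the Weka implementation, and it is not the extraction tool developed in the thesis.

```python
import math
import re

# HYPOTHETICAL feature list for illustration only; the 62 features used in
# the thesis are not enumerated in the abstract.
LEXICAL_UNITS = ["vossa mercê", "senhor", "portanto"]

# HYPOTHETICAL labelled examples (invented sentences, not corpus data).
TRAIN_TEXTS = [
    ("Senhor, peço a vossa mercê que me conceda licença.", "letter"),
    ("Vossa mercê saberá que o senhor governador chegou.", "letter"),
    ("Portanto, ordeno que se cumpra a lei em todo o reino.", "decree"),
    ("Portanto, mando que se guarde esta provisão inteiramente.", "decree"),
]

WORD_RE = re.compile(r"\w+")  # str patterns match Unicode letters (ç, ê, ...)


def extract_features(text):
    """Map a text to a vector of whole-text and word-based statistics."""
    lowered = text.lower()
    words = WORD_RE.findall(lowered)
    n = len(words) or 1  # avoid division by zero on empty input
    features = [
        len(set(words)) / n,             # type/token ratio
        sum(len(w) for w in words) / n,  # mean word length
    ]
    # Occurrences of each lexical unit per 1,000 words
    for unit in LEXICAL_UNITS:
        pattern = r"\b" + re.escape(unit) + r"\b"
        features.append(1000 * len(re.findall(pattern, lowered)) / n)
    return features


def train_gaussian_nb(vectors, labels):
    """Estimate each class's prior and per-feature mean and variance."""
    model = {}
    for label in set(labels):
        rows = [v for v, t in zip(vectors, labels) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # A small constant keeps every variance strictly positive
        variances = [
            sum((x - m) ** 2 for x in col) / len(rows) + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(labels), means, variances)
    return model


def predict(model, vector):
    """Return the class with the highest Gaussian log-posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, s2 in zip(vector, means, variances):
            score -= 0.5 * math.log(2 * math.pi * s2) + (x - m) ** 2 / (2 * s2)
        return score
    return max(model, key=log_posterior)


if __name__ == "__main__":
    X = [extract_features(t) for t, _ in TRAIN_TEXTS]
    y = [g for _, g in TRAIN_TEXTS]
    model = train_gaussian_nb(X, y)
    print(predict(model, extract_features("Portanto, determino que se publique.")))
```

Features that never vary within a class would otherwise have zero variance, so the sketch adds a small constant to every variance. On a real corpus, the rates of correct classification would be measured on held-out texts or by cross-validation rather than on the training examples.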
URI
https://repositorio.ufscar.br/handle/ufscar/5698
Collections
  • Teses e dissertações (Linguística - PPGL, Centro de Educação e Ciências Humanas - CECH)

Universidade Federal de São Carlos - UFSCar