Caracterização de desvios sintáticos em redações de estudantes do ensino médio: subsídios para o processamento automático das línguas naturais

Ramisch, Renata

Files

renata-ramisch_dissertacao_vfinal.pdf (2.79 MB)

carta-orientador-renata_ASSINADA.pdf (112.67 KB)

Date

2020-03-27

Authors

Ramisch, Renata

Publisher

Universidade Federal de São Carlos

Abstract

Writing essays is a common task for students throughout their schooling, and good performance in this task leads to the better grades needed to compete for places at the best universities. However, deviations from standard written Portuguese are quite frequent, ranging from spelling and grammar to textual and discursive structure. This research specifically investigated the recurrence of syntactic errors and their possible correlations with certain linguistic attributes of the sentences. For this purpose, we built a corpus of 1,045 essays that follow ENEM specifications, written by high school students, and segmented it into a subcorpus of 10,652 sentences. This subcorpus was in turn split into a training corpus (8,654 sentences) and a test corpus (1,998 sentences). We established a two-phase manual annotation scheme: classification of sentences as containing syntactic errors or not, and categorization of the errors in 2,500 sentences based on a typology of 11 categories and 27 subcategories. The annotation showed that 73.34% of the annotated sentences contain syntactic errors (6,347 sentences from the training corpus and 1,425 from the test corpus), while the remaining sentences do not (2,307 sentences from the training corpus and 573 from the test corpus). The most frequent categories among the 7,290 errors are punctuation (44%) and agreement (18.9%). We also carried out an extensive qualitative linguistic analysis of the phenomena in which the errors occur. This analysis looked at specific syntactic phenomena, such as inversions of the canonical word order, coordination, and subordination, and at phenomena that stem from other linguistic levels, such as missing accents, light-verb constructions, and the use of specific verbs.
In addition, the corpus was automatically annotated with the UDPipe parser, and we extracted 17 linguistic features from its output, which we correlated with the presence of errors via supervised machine learning, using the Weka software. The best result on the test corpus was obtained with the Logistic Regression algorithm (75.62% accuracy). The features most strongly correlated with the presence of errors, as indicated by feature engineering algorithms, were sentence size and the depth of the syntactic tree. As an additional result, we built a computational-linguistic resource that can be useful to Natural Language Processing systems. The potential goal of such a partnership is the development of writing assistance tools that can help the authors of the essays identify and correct their own errors.
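
The two features the abstract singles out, sentence size and syntactic tree depth, can be illustrated with a minimal sketch. It assumes the parser output is in CoNLL-U format (the format UDPipe writes) and that depth is counted as the number of edges on the longest path from the root to any token; the dissertation's exact feature definitions are not given in this record, so both choices are assumptions. The function names and the sample sentence are hypothetical and are not taken from the corpus.

```python
from typing import Dict, List


def parse_heads(conllu_sentence: str) -> List[int]:
    """Return the HEAD index of every syntactic word in one CoNLL-U sentence."""
    heads: List[int] = []
    for line in conllu_sentence.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        heads.append(int(cols[6]))  # column 7 of CoNLL-U is HEAD
    return heads


def tree_depth(heads: List[int]) -> int:
    """Longest root-to-token path, in edges (assumed definition of depth)."""
    deepest = 0
    for token_id in range(1, len(heads) + 1):
        steps, current = 0, token_id
        while heads[current - 1] != 0:  # HEAD 0 marks the root
            current = heads[current - 1]
            steps += 1
            if steps > len(heads):
                raise ValueError("cycle in head indices")
        deepest = max(deepest, steps)
    return deepest


def sentence_features(conllu_sentence: str) -> Dict[str, int]:
    """Sentence size (in tokens) and dependency tree depth for one sentence."""
    heads = parse_heads(conllu_sentence)
    return {"length": len(heads), "depth": tree_depth(heads)}


# Hand-written illustrative parse of "Os alunos escrevem redações ."
ROWS = [
    ("1", "Os", "o", "DET", "2", "det"),
    ("2", "alunos", "aluno", "NOUN", "3", "nsubj"),
    ("3", "escrevem", "escrever", "VERB", "0", "root"),
    ("4", "redações", "redação", "NOUN", "3", "obj"),
    ("5", ".", ".", "PUNCT", "3", "punct"),
]
SAMPLE = "\n".join(
    "\t".join([tid, form, lemma, upos, "_", "_", head, deprel, "_", "_"])
    for tid, form, lemma, upos, head, deprel in ROWS
)

print(sentence_features(SAMPLE))
```

For the sample sentence this reports a length of 5 tokens and a depth of 2, since the determiner "Os" is two edges away from the root verb. Feature vectors of this kind could then be exported for a classifier such as the logistic regression mentioned above.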

Keywords

Redação escolar, Desvio sintático, Student essays, Syntactic errors, Processamento automático das línguas naturais, Natural language processing

Citation

RAMISCH, Renata. Caracterização de desvios sintáticos em redações de estudantes do ensino médio: subsídios para o processamento automático das línguas naturais. 2020. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/12691.

URI

https://repositorio.ufscar.br/handle/20.500.14289/12691

Collections

Theses and Dissertations

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil
