Leitura da web em português em ambiente de aprendizado sem-fim (Reading the Web in Portuguese in a Never-Ending Learning Environment)

Duarte, Maisa Cristina

Files

TeseMCD.pdf (1.49 MB)

Date

2016-01-04

Authors

Duarte, Maisa Cristina

Publisher

Universidade Federal de São Carlos

Abstract

NELL (Never-Ending Language Learner) is a computer system whose goal is to learn continuously, 24 hours a day, learning more and better each day than the day before, in order to expand its knowledge base (KB). NELL has been running since January 12, 2010. NELL also aims for high precision so that it can keep learning over time. Because NELL works in a macro-reading context, it needs a great deal of redundancy in its input to run. The first requirement for running NELL is a large all-pairs-data base: a corpus of web pages preprocessed with Natural Language Processing (NLP) that holds all the sufficient statistics about that corpus. The proposal of this project was to create a Portuguese instance of NELL, which currently runs in English. To that end, the first goal was to build an all-pairs-data base in Portuguese. The second goal was to create a new, Portuguese version of NELL. Finally, the third goal was to develop a hybrid coreference resolution method based on semantic and morphological features. This method does not depend on a specific language and can be applied to other languages that use the same alphabet as Portuguese. The Portuguese NELL was developed, but its all-pairs-data base is not large enough; as a result, unlike the English version, the Portuguese NELL does not run continuously. Even so, this project presents the steps for developing NELL in another language and some ideas on how to improve the all-pairs-data. In addition, it presents a hybrid coreference resolution method that achieved good results for NELL.
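The abstract describes the all-pairs-data only as an NLP-preprocessed base holding the sufficient statistics of a web corpus. As a rough illustration, the sketch below assumes (hypothetically, since the abstract does not give the format) that those statistics are co-occurrence counts between noun phrases and the textual context around them, and that tokenization and noun-phrase chunking were already done upstream. The names `build_all_pairs` and `window` and the `_` placeholder are illustrative only, not the thesis's actual data format or code.

```python
from collections import Counter
from typing import Iterable

# Hypothetical input: one sentence already processed by an NLP pipeline,
# given as (tokens, noun_phrase_spans), where each span is (start, end).
Sentence = tuple[list[str], list[tuple[int, int]]]


def build_all_pairs(sentences: Iterable[Sentence], window: int = 3) -> Counter:
    """Count (noun phrase, context pattern) co-occurrences over a corpus.

    Assumption: the "sufficient statistics" are plain co-occurrence counts.
    """
    counts: Counter = Counter()
    for tokens, spans in sentences:
        for start, end in spans:
            noun_phrase = " ".join(tokens[start:end]).lower()
            left = tokens[max(0, start - window):start]
            right = tokens[end:end + window]
            # The noun-phrase slot is replaced by the placeholder "_".
            context = " ".join(left + ["_"] + right).lower()
            counts[(noun_phrase, context)] += 1
    return counts


# Toy corpus: redundancy shows up as repeated (noun phrase, context) pairs.
corpus = [
    (["São", "Paulo", "is", "a", "city"], [(0, 2)]),
    (["Recife", "is", "a", "city"], [(0, 1)]),
    (["São", "Paulo", "is", "a", "city"], [(0, 2)]),
]
pairs = build_all_pairs(corpus)
print(pairs[("são paulo", "_ is a city")])  # 2
```

Repeated pairs are what a macro-reading learner relies on: the more often a noun phrase appears in a given context across the corpus, the stronger the evidence, which is why a small all-pairs base limits how long such a system can keep learning.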

Keywords

Semi-Supervised Learning, Never-Ending Learning, NELL, Coupling, Coreference Resolution, Read The Web in Portuguese, Semi-Supervised Machine Learning

Citation

DUARTE, Maisa Cristina. Leitura da web em português em ambiente de aprendizado sem-fim. 2016. Thesis (Doctorate in Computer Science) – Universidade Federal de São Carlos, Campus São Carlos, 2016. Available at: https://repositorio.ufscar.br/handle/20.500.14289/8414.

URI

https://repositorio.ufscar.br/handle/20.500.14289/8414

Collections

Theses and Dissertations
