Reading the Web in Portuguese in a Never-Ending Learning Environment
Abstract
NELL (Never-Ending Language Learner) is a computer system whose goal is to learn continuously, 24 hours a day,
and to learn more and better each day than the day before, in order to populate its knowledge base (KB). NELL has
been running since January 12, 2010. In addition, NELL aims for high precision so that it can continue learning.
NELL was developed in a macro-reading context, and it therefore needs a great deal of redundancy to run. The first step in running NELL is to have a large all-pairs-data. An all-pairs-data
is a base preprocessed with Natural Language Processing (NLP) that holds all the sufficient statistics about a corpus of web pages. The aim of this project was to create a Portuguese instance of NELL, which currently runs in English. To this end, the first goal was to develop an all-pairs-data in Portuguese. The
second goal was to create a new, Portuguese version of NELL. Finally, the third goal
was to develop a hybrid coreference resolution method based on semantic and
morphological features. This method does not depend on a specific language; it can be applied to
other languages that use the same alphabet as Portuguese. The Portuguese NELL was developed, but its all-pairs-data is not yet large enough. As a result, the Portuguese NELL does not run forever, as the English version does. Even so, this project presents the steps for developing NELL in another language and some ideas for improving the all-pairs-data. It also presents a hybrid coreference resolution method that gives good results for NELL.