Aprendizado de subcategorias para Never-ending Language Learning: uma abordagem baseada em perguntas e respostas [Subcategory learning for Never-ending Language Learning: a question-and-answer-based approach]
Souza, Wesley Willy Oliveira de
In recent years, ontologies have been used in information systems to index large corpora of documents or collections of facts and to directly support user interaction through functionalities such as navigation and search. Both the structure and the content of an ontology must keep pace with changes over time without losing coherence. Ontology expansion is primarily an organizational process, and it requires rules for updating, inserting into, and deleting from the ontology. NELL (Never-ending Language Learning) is the first never-ending machine learning system described in the literature; it continuously extracts facts by reading the web in order to grow its knowledge base and to learn to read better than it did the previous day. After learning millions of facts extracted from the web, NELL began to go beyond the extracted knowledge and infer new beliefs that it had not yet read, and through several contributions it became able to expand its initial ontology. This thesis proposes a sequential, modular computational model that expands the ontology of the NELL knowledge base by identifying and classifying subcategories of the categories already known to the NELL ontology. The proposed component takes as input English-language question texts from the Yahoo Answers forum, a set of English Wikipedia articles, the NELL knowledge base, and a set of seed examples. Preprocessing tasks extract labelled and unlabelled examples from these inputs, and a machine learning algorithm classifies them to define the new candidate subcategories. A second module then validates the candidates with a procedure based on conditional probability. The results show that the component, in addition to achieving adequate performance in subcategory learning, maintains a relatively low false-positive rate.
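The abstract does not say which events the conditional probability in the validation module is defined over. As an illustration only, the sketch below assumes the module estimates P(category | candidate) from co-occurrence counts and keeps the candidates that reach a threshold. The function names, the toy pairs, and the 0.5 threshold are all hypothetical and are not taken from the thesis.

```python
def conditional_probability(pairs, candidate, category):
    """Estimate P(category | candidate) from (candidate, category) pairs.

    ASSUMPTION: each pair records one observed co-occurrence of a candidate
    subcategory with a known category. The thesis may define the events
    differently.
    """
    candidate_total = sum(1 for c, _ in pairs if c == candidate)
    if candidate_total == 0:
        # A candidate that was never observed gets no support.
        return 0.0
    joint = sum(1 for c, k in pairs if c == candidate and k == category)
    return joint / candidate_total


def validate(pairs, candidates, category, threshold=0.5):
    """Keep candidates whose estimated P(category | candidate) >= threshold.

    ASSUMPTION: the threshold value is illustrative, not from the thesis.
    """
    return [
        c for c in candidates
        if conditional_probability(pairs, c, category) >= threshold
    ]


# Hypothetical toy observations, invented for this example.
pairs = [
    ("sedan", "vehicle"), ("sedan", "vehicle"), ("sedan", "company"),
    ("jaguar", "animal"), ("jaguar", "vehicle"), ("jaguar", "animal"),
]

accepted = validate(pairs, ["sedan", "jaguar"], "vehicle")
print(accepted)
```

With these toy counts, "sedan" co-occurs with "vehicle" in two of its three observations and is kept, while "jaguar" does so in only one of three and is rejected. Raising the threshold trades recall for fewer false positives, the quantity the abstract reports as relatively low.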