Classificação hierárquica multirrótulo de funções de proteínas via predição de interações
Abstract
Proteins are macro-molecules responsible for virtually every task necessary for the maintenance of cells, having a fundamental role in the behavior and regulation of organisms. Advances in the area of Molecular Biology have allowed an almost complete listing of the proteins that make up the organisms. However, there are a large number of proteins whose function is still unknown, opening space for a new research focus in Molecular Biology. Usually, protein function prediction is performed using homology-based Bioinformatic tools, comparing a sequence with a database with many sequences belonging to previously known functions. This is a limited strategy, since it ignores the sequences' biochemical properties, and also the hierarchical relationships that may exist between the different classes. In the literature, the use of Machine Learning for the protein function prediction has shown to be promising, obtaining significant advances regarding the use of homology and other methods. Making use of Machine Learning, it is possible to model the protein function prediction problem as a Hierarchical Multi-label Classification (HMC) problem, due to the fact that protein functions are hierarchically organized and that they can occur simultaneously. This project proposes modeling the protein function prediction task as a Hierarchical Multi-label Classification problem through interaction data. Interaction data are characterized by two sets of objects, each described by their own set of features, which makes it possible to predict the interactions between two instances. In particular, we adapt the "Predictive Bi-Clustering Tree" (PBCT) method to HMC tasks. Our experiments demonstrate that PBCT-HMC is competitive to the state-of-art competitor.
Collections
The following license files are associated with this item: