Algoritmos genéticos multiobjetivo para classificação hierárquica de elementos transponíveis (Multi-objective genetic algorithms for hierarchical classification of transposable elements)
Pereira, Gean Trindade
In Machine Learning (ML), a classification problem commonly consists of associating an instance with exactly one class from a usually small set of classes. However, there are more complex problems involving dozens or even hundreds of classes arranged in a hierarchical structure, known in the literature as Hierarchical Classification (HC) problems. In these problems, an instance is assigned not only to one class but also to its superclasses, and two approaches, called Global and Local, are often used. In the Local Approach, multiple classifiers are trained using local information from the classes, while in the Global Approach a single classifier is induced to deal with the entire class hierarchy, which makes the resulting model more interpretable. One of the main application fields of HC is Bioinformatics, where tools that explore hierarchical relationships in data and/or make use of ML are still scarce. In addition, an issue that is important for both Bioinformatics and HC, and which is not always given due attention, is the interpretability of the models. In the context of Bioinformatics, a topic that is gaining attention is the study and classification of Transposable Elements (TEs), which are DNA fragments capable of moving within the genome of their hosts. According to recent research, TEs are responsible for mutations in several organisms, including humans, which has earned them a reputation as major contributors to the genetic variability of species. In this work, three global methods based on Genetic Algorithms that evolve classification rules for the HC of TEs were proposed and investigated; in two of them, Multi-Objective Approaches were implemented in order to better balance the objectives of predictive performance and interpretability. The first is a traditional optimization method called Hierarchical Classification with a Genetic Algorithm (HC-GA), which served as the basis for the development of the others. 
The second method is called Hierarchical Classification with a Weighted Genetic Algorithm (HC-WGA) and implements the Weighted Sum Approach. The third method is called Hierarchical Classification with a Lexicographic Genetic Algorithm (HC-LGA) and follows the Lexicographic Approach. Experiments with the TE class taxonomy developed by Wicker et al. (2007) have shown that the proposed methods achieved results better than or competitive with those of state-of-the-art HC methods from the literature, with the advantage of generating interpretable rule models. When compared to the popular global method Clus-HMC, the proposed methods presented better predictive performance while producing fewer rules with fewer tests. In the comparisons with the homology tools BLASTn and RepeatMasker, the hierarchical methods achieved superior results on both datasets and were able to classify all the instances, unlike those tools. Moreover, it was verified that the two multi-objective methods not only obtained the best results on the two datasets used but also surpassed the single-objective optimization method with statistical significance.
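As an illustration of how the Weighted Sum and Lexicographic approaches can rank the same candidates differently, the following is a minimal sketch and not the implementation used in HC-WGA or HC-LGA. The `Candidate` class, the weight `w_perf`, the normalisation bound `max_tests`, and the `tolerance` threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one evolved rule set."""
    name: str
    performance: float  # predictive performance in [0, 1], higher is better
    n_tests: int        # total number of tests in the rules, lower is better


def weighted_sum_fitness(c, w_perf=0.8, max_tests=100):
    """Collapse both objectives into a single scalar.

    The weight and the normalisation bound are assumed values.
    """
    simplicity = 1.0 - min(c.n_tests, max_tests) / max_tests
    return w_perf * c.performance + (1.0 - w_perf) * simplicity


def lexicographic_better(a, b, tolerance=0.01):
    """Return True if `a` beats `b` under a priority ordering.

    Performance has priority; model size is consulted only when the
    performance difference is within `tolerance` (an assumed threshold).
    """
    if abs(a.performance - b.performance) > tolerance:
        return a.performance > b.performance
    return a.n_tests < b.n_tests


accurate_but_large = Candidate("A", performance=0.90, n_tests=60)
nearly_as_accurate = Candidate("B", performance=0.895, n_tests=20)
small_but_weaker = Candidate("C", performance=0.80, n_tests=5)

# Weighted sum: a large gain in simplicity can offset a loss in performance.
ws_prefers_c = (weighted_sum_fitness(small_but_weaker)
                > weighted_sum_fitness(accurate_but_large))

# Lexicographic: size matters only when performances are practically tied.
lex_prefers_a_over_c = lexicographic_better(accurate_but_large, small_but_weaker)
lex_prefers_b_over_a = lexicographic_better(nearly_as_accurate, accurate_but_large)

print(ws_prefers_c, lex_prefers_a_over_c, lex_prefers_b_over_a)
```

In this sketch the weighted sum lets a much smaller model win despite a clear drop in performance, whereas the lexicographic comparison trades size for performance only between candidates that are practically tied on the higher-priority objective.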