Beyond local and global learning: partitioning the label space in multi-label classification problems
Abstract
The objective of multi-label classification, a supervised predictive machine learning task, is to induce a model capable of predicting a set of labels for an instance. Work in the literature has shown that identifying, modeling, and exploiting correlations between labels improves the predictive performance of multi-label classifiers. However, the traditional approaches to multi-label classification, referred to here as global and local, may not take advantage of these correlations, since neither considers them fully. In the global approach, all labels are learned at once, so more specific information or correlations can be ignored; in the local approach, labels are learned individually, which makes learning correlations impractical. The literature is also divided: some works show that currently available multi-label datasets have a very low level of label dependence, so that exploiting correlations is impractical; others claim that learning the labels individually is the most suitable solution; and still others recommend global methods because they generate a single, more compact model. This work proposes a hybrid approach, called Hybrid Partitions for Multi-label Classification (HPML), which exploits the advantages of the traditional global and local approaches while trying to mitigate their disadvantages. The approach aims to find several label partitions, each composed of disjoint groups of correlated labels, here called hybrid partitions. Four experiments were conducted to test and validate the hypothesis with different versions of hybrid partitions, which were compared with the partitions generated by the global and local approaches and with different random partitions.
In general, the experiments showed that it is possible to find a hybrid partition capable of improving the predictive performance of classifiers on various datasets, and that traditional methods still fail both to learn the labels and to deal correctly with the correlations between labels.
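The three kinds of partition discussed in the abstract can be illustrated with a minimal sketch. It is not the HPML procedure itself: the function names, the Jaccard similarity between label columns, the 0.5 threshold, and the grouping by connected components are all assumptions chosen only to show what a partition into disjoint groups of correlated labels looks like next to the global and local partitions.

```python
from itertools import combinations


def global_partition(n_labels):
    # Global approach: one group containing every label.
    return [list(range(n_labels))]


def local_partition(n_labels):
    # Local approach: one group per label.
    return [[j] for j in range(n_labels)]


def jaccard(Y, a, b):
    # Co-occurrence similarity between label columns a and b (assumed measure).
    both = sum(1 for row in Y if row[a] and row[b])
    either = sum(1 for row in Y if row[a] or row[b])
    return both / either if either else 0.0


def hybrid_partition(Y, threshold=0.5):
    # Hypothetical hybrid partition: labels whose similarity reaches the
    # threshold are linked, and each connected component becomes one group.
    n_labels = len(Y[0])
    parent = list(range(n_labels))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(n_labels), 2):
        if jaccard(Y, a, b) >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for j in range(n_labels):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


# Toy label matrix: rows are instances, columns are labels (made-up data).
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]

print(global_partition(4))
print(local_partition(4))
print(hybrid_partition(Y))
```

In this toy matrix, labels 0 and 1 co-occur often, as do labels 2 and 3, so the sketch groups them into two disjoint sets. A separate classifier could then be trained per group, which sits between the single model of the global approach and the one-model-per-label setup of the local approach.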