Evaluating the impact of base partition selection in multi-objective ensembles
Abstract
Unsupervised data clustering is not a trivial process, as no prior knowledge is available
and real data is often complex and multi-faceted. Moreover, clustering traditionally
aims to describe the data being explored from a single perspective, even though it is
well known that in several cases this approach seriously limits what can be
extracted from the analysis. Furthermore, changes in parameters and preprocessing
techniques can dramatically change the final result, either by revealing or by hiding
multiple meanings that may be present in the data. To tackle some of these issues, recent efforts
that build knowledge from multiple base partitions, such as ensemble clustering,
have emerged. However, special care must be taken in the composition of those partitions, as
their quality and diversity have been shown to be closely related to the performance of such
methods. To enhance the quality and diversity of those partitions, and thus provide better results, a
number of methods to evaluate and select a subset of the partitions have been proposed
and successfully applied. In this work, we expand this discussion by evaluating the impact
of some state-of-the-art selection methods in the novel context of multi-objective
cluster ensembles. In this context, our analysis shows improvements on two important
fronts: (i) the results are more concise, which facilitates subsequent manual analysis, and
(ii) they are obtained with less computational effort, all without affecting the quality
of the results.