Hybrid partition selection strategy for the data clustering problem
Abstract
The inability to identify partitions of different sizes and shapes is a fundamental limitation of any clustering algorithm, especially when different regions of the search space contain clusters with varied characteristics. One can apply several clustering algorithms with different parameters, but doing so produces a large number of partitions that must then be dealt with. Techniques such as ensemble and multiobjective clustering address this problem using distinct criteria, but they have a high computational cost. Moreover, the ensemble technique generates a single solution, which may not represent every real partition present in the data. Multiobjective clustering, on the other hand, may generate a large number of partitions, which are difficult to analyze manually. In this dissertation, we propose a hybrid multiobjective algorithm, HSS (Hybrid Selection Strategy), that aims to return a reduced yet diverse set of solutions. It can be divided into three steps: (i) the application of a multiobjective algorithm to a set of base partitions to generate an approximation of the Pareto front, (ii) the division of the solutions from this approximation into a certain number of regions, and (iii) the selection of one solution per region, through the application of the Adjusted Rand Index. Experiments show the effectiveness of HSS in selecting a reduced number of partitions.
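As a rough illustration of step (iii), the following Python sketch computes the Adjusted Rand Index between two partitions and picks one partition per region. The abstract does not state how the index is applied, so the selection rule used here (choosing the partition with the highest total agreement with the other partitions of its region) is only an assumption, and the names `adjusted_rand_index` and `select_per_region` are hypothetical. The regions are assumed to be already available from step (ii), each given as a list of label vectors.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two label sequences of equal length."""
    n = len(a)
    if n < 2:
        return 1.0  # degenerate case: no pairs of objects to compare
    # Pair counts from the contingency table and from its marginals.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial in the same way
    return (sum_cells - expected) / (max_index - expected)


def select_per_region(regions):
    """Return one partition per region.

    Assumption (not taken from the source): the representative of a
    region is the partition with the highest total ARI to the other
    partitions in that region.
    """
    selected = []
    for region in regions:
        def agreement(i):
            return sum(
                adjusted_rand_index(region[i], region[j])
                for j in range(len(region))
                if j != i
            )
        best = max(range(len(region)), key=agreement)
        selected.append(region[best])
    return selected


# Hypothetical usage: two regions of label vectors over four objects.
regions = [
    [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]],
    [[0, 0, 0, 1]],
]
representatives = select_per_region(regions)
```

Because the index is invariant to label permutation, the first two partitions of the first region count as identical, so one of them is returned for that region under this rule.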