Data bubbles for hierarchical density-based data stream algorithms

Publisher

Universidade Federal de São Carlos

Abstract

Clustering Data Streams (DS) is challenging because of the volume, velocity, and evolving nature of the data. Hierarchical density-based clustering algorithms, such as HASTREAM, are promising for identifying arbitrarily shaped clusters, but their effectiveness depends critically on the quality of online data summarization. The standard approach, based on Micro-Clusters (MCs), tends to lose information about the internal data distribution, which distorts density estimates and degrades the quality of the final clustering. This work proposes a novel summarization approach for hierarchical density-based clustering algorithms in DS, using adapted Data Bubbles (DBs). DBs, which originally include internal density estimates, are modified to operate under the damped window model, with new formulations of their properties based on weighted statistics. We also propose a modification to the core distance calculation for DBs, aiming for better alignment with HDBSCAN* principles. The methodology integrates these adapted DBs and their revised distance metrics (modified Core Distance and Mutual Reachability Distance) into the HASTREAM algorithm, resulting in HASTREAM-DB. We describe the online maintenance of the adapted DBs (including the management of potential and outlier bubbles) and the subsequent offline phase, which builds the cluster hierarchy on these structures. The objective is to demonstrate that adapted Data Bubbles, by better preserving local spatial and density information, can mitigate the distortions inherent in MC-based summarization, leading to more accurate and robust cluster identification in Data Stream environments.
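
To make the damped window idea concrete, the following is a minimal sketch of a summary structure whose statistics fade over time. It is not the formulation used in the thesis: the class name `DampedBubble`, the decay rate `lam`, the exponential fading factor 2^(-lam * dt), and the use of a weighted standard deviation as the bubble's extent are all assumptions chosen for illustration. The only part taken from the HDBSCAN* literature is the mutual reachability distance, defined as the maximum of the two core distances and the distance between the objects.

```python
import math


class DampedBubble:
    """Hypothetical summary holding a decayed weight, linear sum, and squared sum."""

    def __init__(self, dim, lam=0.01):
        self.lam = lam          # decay rate (assumed parameter)
        self.weight = 0.0       # decayed number of absorbed points
        self.ls = [0.0] * dim   # decayed linear sum, one entry per dimension
        self.ss = 0.0           # decayed sum of squared norms
        self.last_t = 0.0       # timestamp of the last update

    def _decay(self, t):
        # Fade all statistics by 2^(-lam * elapsed time), as in damped window models.
        factor = 2.0 ** (-self.lam * (t - self.last_t))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss *= factor
        self.last_t = t

    def insert(self, x, t):
        # Decay the existing statistics first, then absorb the new point with weight 1.
        self._decay(t)
        self.weight += 1.0
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def representative(self):
        # Weighted mean of the absorbed points.
        return [v / self.weight for v in self.ls]

    def extent(self):
        # Weighted standard deviation, used here as a simple stand-in for the extent.
        rep = self.representative()
        var = self.ss / self.weight - sum(v * v for v in rep)
        return math.sqrt(max(var, 0.0))


def mutual_reachability(core_a, core_b, dist_ab):
    # Standard HDBSCAN* definition: the largest of the two core distances
    # and the distance between the two objects.
    return max(core_a, core_b, dist_ab)


bubble = DampedBubble(dim=2, lam=1.0)
bubble.insert([0.0, 0.0], t=0.0)
bubble.insert([2.0, 0.0], t=1.0)  # the first point now counts for half a point
print(bubble.weight, bubble.representative(), bubble.extent())
```

Because older points lose weight, the representative drifts toward recent data, which is the behavior the damped window model is meant to provide. How the core distance of a bubble is computed is left out here, since that is the part the thesis modifies.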

Citation

NUNES, Bruno Leonel. Data bubbles para algoritmos de fluxo de dados hierárquicos baseados em densidade. 2025. Undergraduate thesis (Bachelor's degree in Computer Engineering) – Universidade Federal de São Carlos, São Carlos, 2025. Available at: https://repositorio.ufscar.br/handle/20.500.14289/23573.

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution 3.0 Brazil