Data bubbles para algoritmos de fluxo de dados hierárquicos baseados em densidade
Carregando...
Data
Autores
Título da Revista
ISSN da Revista
Título de Volume
Editor
Universidade Federal de São Carlos
Resumo
Clustering data in Data Streams (DS) presents challenges due to the volume, velocity, and evolving nature of information. Hierarchical density-based clustering algorithms, such as HASTREAM, are promising for identifying arbitrarily shaped clusters, but their effectiveness critically depends on the quality of online data summarization. The standard approach, using Micro-Clusters (MCs), tends to lose information about the internal data distribution, distorting density estimates and negatively impacting the quality of the final clustering. This work proposes a novel summarization approach for hierarchical density-based clustering algorithms in DS, utilizing adapted Data Bubbles (DBs). DBs, which originally include internal density estimates , have been modified to operate under the damped window model, with new formulations for their properties based on weighted statistics. Furthermore, we propose a modification to the core distance calculation for DBs, aiming for better alignment with HDBSCAN* principles. The methodology involves integrating these adapted DBs and their revised distance metrics (modified Core Distance and Mutual Reachability Distance) into the HASTREAM algorithm, resulting in HASTREAM-DB. We describe the online maintenance processes for the adapted DBs (including the management of potential and outlier bubbles) and the subsequent offline phase of constructing the cluster hierarchy upon these structures. The objective is to demonstrate that the use of adapted Data Bubbles, by better preserving local spatial and density information, can mitigate the distortions inherent in MC-based summarization, leading to more accurate and robust cluster identification in Data Stream environments.
Descrição
Citação
NUNES, Bruno Leonel. Data bubbles para algoritmos de fluxo de dados hierárquicos baseados em densidade. 2025. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2025. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/23573.
Coleções
item.page.endorsement
item.page.review
item.page.supplemented
item.page.referenced
Licença Creative Commons
Exceto quando indicado de outra forma, a licença deste item é descrita como Attribution 3.0 Brazil
