Efficient Multiple-Density Models for Learning on Large Datasets and Data Streams
Publisher: Universidade Federal de São Carlos
Abstract
Unsupervised and semi-supervised machine learning are highly beneficial in data-intensive contexts. Density-based hierarchical clustering provides a comprehensive view of the cluster and outlier structure of a dataset through density functions. These algorithms derive a hierarchy from a minimum spanning tree in which each edge represents the maximum density at which the objects it connects remain connected, and therefore where clusters separate; density is defined by a minimum number of objects, MinPts, within a given neighborhood. CORE-SG, a spanning graph, efficiently produces multiple hierarchical solutions for different density levels (that is, different MinPts values) and outperforms its predecessors in computational performance. Nonetheless, density-based algorithms require pairwise similarity computations, which gives them an asymptotic complexity of O(n^2) for a dataset of n objects and makes them impractical for large data volumes. This work introduces density-based hierarchical machine learning models for clustering and outlier detection that use Data Bubbles to reduce this computational cost. It examines how data summarization affects the quality of unsupervised multi-density models and how much computational performance it gains. The approach makes several machine learning methods built on these models scalable, so that massive data volumes can be processed without a significant loss of quality. We further propose applying the multiple-hierarchies method with Data Bubbles to density-based clustering models for data streams, providing scalability for several machine learning methods built on these models in settings where data is unbounded and must be processed in real time, while also producing better clustering quality.
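To illustrate where the quadratic cost comes from, the sketch below assumes the HDBSCAN*-style definitions commonly used for density-based hierarchies: the core distance of an object is the distance to its MinPts-th nearest neighbour (counting the object itself), the mutual reachability distance between two objects is the maximum of their two core distances and their distance, and the hierarchy is read off a minimum spanning tree under that distance. Euclidean distance, Prim's algorithm on the complete graph and the function names (core_distances, mutual_reachability, mrd_mst) are assumptions of this sketch, not the dissertation's implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def core_distances(points: List[Point], min_pts: int) -> List[float]:
    """Distance from each point to its min_pts-th nearest neighbour,
    counting the point itself (so min_pts=1 gives 0.0)."""
    result = []
    for p in points:
        # All n distances from p: this loop alone is quadratic in n.
        dists = sorted(math.dist(p, q) for q in points)
        result.append(dists[min_pts - 1])
    return result


def mutual_reachability(points: List[Point], core: List[float], i: int, j: int) -> float:
    """max(core(i), core(j), d(i, j))."""
    return max(core[i], core[j], math.dist(points[i], points[j]))


def mrd_mst(points: List[Point], min_pts: int) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the complete graph weighted by mutual reachability."""
    n = len(points)
    core = core_distances(points, min_pts)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax every remaining vertex: another O(n) scan per step.
        for v in range(n):
            if not in_tree[v]:
                w = mutual_reachability(points, core, u, v)
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
    # Heaviest edge first: cutting it separates the two dense groups.
    for i, j, w in sorted(mrd_mst(pts, min_pts=2), key=lambda e: -e[2]):
        print(i, j, round(w, 3))
```

Removing the tree edges in decreasing order of weight splits the data into progressively denser clusters, which yields the hierarchy. Both the core-distance computation and the Prim loop touch every pair of objects, which is the O(n^2) behaviour the abstract refers to.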
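Data Bubbles reduce this cost by summarizing groups of objects, so that the quadratic step runs over m bubbles instead of n objects. The sketch below assumes the classic Data Bubble definition: each bubble stores the count n, the linear sum LS and the sum of squared norms SS of its members, and derives from them a representative (the mean), an extent (the root-mean-square pairwise distance inside the bubble), an estimate of k-nearest-neighbour distances, and a bubble-to-bubble distance. Whether the dissertation uses exactly these formulas is an assumption, and the seed-based assignment in the hypothetical helper summarize is only one simple way to form bubbles.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class DataBubble:
    dim: int
    n: int = 0
    ls: List[float] = field(default_factory=list)  # linear sum of members
    ss: float = 0.0  # sum of squared norms of members

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim

    def add(self, x: Sequence[float]) -> None:
        """Absorb one object; n, LS and SS are additive, so this is O(dim)."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    @property
    def rep(self) -> List[float]:
        """Representative: the mean of the members."""
        return [v / self.n for v in self.ls]

    @property
    def extent(self) -> float:
        """sqrt(sum_{i,j} ||x_i - x_j||^2 / (n (n - 1))), computed from n, LS, SS."""
        if self.n < 2:
            return 0.0
        pair_sum = 2 * self.n * self.ss - 2 * sum(v * v for v in self.ls)
        return math.sqrt(max(pair_sum, 0.0) / (self.n * (self.n - 1)))

    def nn_dist(self, k: int) -> float:
        """Estimated k-nearest-neighbour distance, assuming members are spread uniformly."""
        return (k / self.n) ** (1.0 / self.dim) * self.extent


def bubble_distance(b: DataBubble, c: DataBubble) -> float:
    """Gap between the bubbles' extents plus their nearest-neighbour estimates."""
    if b is c:
        return 0.0
    gap = math.dist(b.rep, c.rep) - (b.extent + c.extent)
    if gap >= 0:
        return gap + b.nn_dist(1) + c.nn_dist(1)
    return max(b.nn_dist(1), c.nn_dist(1))  # overlapping bubbles


def summarize(points: List[Sequence[float]], seeds: List[Sequence[float]]) -> List[DataBubble]:
    """Hypothetical helper: assign each object to its nearest seed (O(n * m))."""
    bubbles = [DataBubble(dim=len(points[0])) for _ in seeds]
    for p in points:
        nearest = min(range(len(seeds)), key=lambda s: math.dist(p, seeds[s]))
        bubbles[nearest].add(p)
    return [b for b in bubbles if b.n > 0]
```

Because n, LS and SS are additive, add can absorb new objects one at a time, which is what makes this kind of summary attractive for data streams; the abstract does not specify how the dissertation maintains or ages bubbles over a stream.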
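The multiple-hierarchies idea behind CORE-SG can be sketched as follows, under the construction assumed here: for a maximum value k_max, the graph is the union of a minimum spanning tree under the k_max mutual reachability distance and the k_max-nearest-neighbour graph. For any MinPts ≤ k_max, a minimum spanning tree of this sparse graph, reweighted for that MinPts, then has the same total weight as one computed on the complete graph. The function names are hypothetical, and building the graph once with an O(n^2) pass is a simplification, not the dissertation's implementation.

```python
import math
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def neighbour_rows(points: List[Sequence[float]]) -> List[List[float]]:
    """Row i: distances from point i to every point (itself included), ascending."""
    return [sorted(math.dist(p, q) for q in points) for p in points]


def mrd(points, rows, k: int, i: int, j: int) -> float:
    """Mutual reachability for MinPts = k; rows[i][k - 1] is the core distance."""
    return max(rows[i][k - 1], rows[j][k - 1], math.dist(points[i], points[j]))


def kruskal(n: int, edges: Iterable[Edge], weight: Callable[[Edge], float]) -> List[Edge]:
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for i, j in sorted(edges, key=weight):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def core_sg(points, k_max: int) -> Tuple[List[List[float]], Set[Edge]]:
    """k_max-NN graph united with an MST for MinPts = k_max."""
    n = len(points)
    rows = neighbour_rows(points)
    graph: Set[Edge] = set()
    for i in range(n):
        for j in range(n):
            # Keep (i, j) if j lies within the k_max core distance of i.
            if i != j and math.dist(points[i], points[j]) <= rows[i][k_max - 1]:
                graph.add((min(i, j), max(i, j)))
    complete = [(i, j) for i in range(n) for j in range(i + 1, n)]
    graph.update(kruskal(n, complete, lambda e: mrd(points, rows, k_max, *e)))
    return rows, graph


def mst_for(points, rows, graph: Set[Edge], k: int) -> List[Edge]:
    """MST for any MinPts k <= k_max, computed on the sparse graph only."""
    return kruskal(len(points), graph, lambda e: mrd(points, rows, k, *e))


if __name__ == "__main__":
    pts = [((7 * i) % 13 / 13, (5 * i) % 11 / 11) for i in range(40)]
    rows, graph = core_sg(pts, k_max=5)
    for k in range(1, 6):
        print(k, len(mst_for(pts, rows, graph, k)), "tree edges from", len(graph))
```

The design payoff is that each additional MinPts value costs one Kruskal run over roughly n·k_max candidate edges instead of the n(n-1)/2 edges of the complete graph.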
Citation
BATISTA, Natanael Fabrício Dacioli. Eficientes modelos de múltiplas densidades para o aprendizado em grandes conjuntos e fluxo de dados. 2024. Master's dissertation (Computer Science), Universidade Federal de São Carlos, São Carlos, 2024. Available at: https://repositorio.ufscar.br/handle/20.500.14289/20102.
Creative Commons License
Except where otherwise noted, this item is licensed under Attribution-NonCommercial-NoDerivs 3.0 Brazil.
