Método escalável para aproximar uma MST utilizando um grafo de k vizinhos mais próximos

Igor Raphael Magollo

Visualizar/Abrir

Monografia (740.8Kb)

Data

2021-06-25

Autor

Igor Raphael Magollo

Metadata

Mostrar registro completo

Resumo

Finding the Minimum Spanning Tree (MST) consists of inducing a spanning tree whose sum of edge costs is minimal. Among many applications, the MST is widely used in unsupervised machine learning algorithms, such as clustering. With increasing volume, data used for analysis and decision making, platforms such as Apache Spark have become more influential. However, there is a difficulty in finding the MST in a distributed envi- ronment when the point-to-point distance needs to be calculated, since this calculation has a quadratic complexity with respect to the number of data points, making it very inefficient and even unfeasible for large data sets. In this work an efficient method to find a good approximation of the MST is presented. The method consists of applying the Boruvka algorithm in a distributed way over an approximated k nearest neighbors graph constructed through a modified version of the NNDescent algorithm on the Apache Spark platform. In all cases of experiments, the approximate MST reached less than 2% of the cost higher than the cost of the exact MST when the parameter k of the NNDescent algorithm was greater than just 1% of the number of points in the data sets. Furthermore, the method has showed computational complexity linearly proportional to the number of points and is slightly sublinear in relation to parameter k.

URI

https://repositorio.ufscar.br/handle/ufscar/14446

Collections

Os arquivos de licença a seguir estão associados a este item:

Creative Commons

Exceto quando indicado o contrário, a licença deste item é descrito como CC0 1.0 Universal