Método escalável para aproximar uma MST utilizando um grafo de k vizinhos mais próximos
Resumo
Finding the Minimum Spanning Tree (MST) consists of inducing a spanning tree whose
sum of edge costs is minimal. Among many applications, the MST is widely used in
unsupervised machine learning algorithms, such as clustering. With increasing volume,
data used for analysis and decision making, platforms such as Apache Spark have become
more influential. However, there is a difficulty in finding the MST in a distributed envi-
ronment when the point-to-point distance needs to be calculated, since this calculation
has a quadratic complexity with respect to the number of data points, making it very
inefficient and even unfeasible for large data sets. In this work an efficient method to
find a good approximation of the MST is presented. The method consists of applying the
Boruvka algorithm in a distributed way over an approximated k nearest neighbors graph
constructed through a modified version of the NNDescent algorithm on the Apache Spark
platform. In all cases of experiments, the approximate MST reached less than 2% of the
cost higher than the cost of the exact MST when the parameter k of the NNDescent
algorithm was greater than just 1% of the number of points in the data sets. Furthermore,
the method has showed computational complexity linearly proportional to the number of
points and is slightly sublinear in relation to parameter k.
Collections
Os arquivos de licença a seguir estão associados a este item: