Abstract
Real-world data must be pre-processed before machine learning methods can be applied to it
effectively. Many pre-processing techniques are available, and this paper examines the impact
of using clustering as one of them, specifically in linear regression problems. Two
experiments were carried out, each using a different database: the
first describes a set of properties in Ames, a city in Iowa, in the
United States of America, and the second contains information about COVID-19. It was
observed that, in both cases, clustering the data before applying the regression model
improved regressor performance, depending on the nature of the database. Beyond this
improvement, it is recommended that clustering be used alongside other pre-processing techniques.