Predicting the popularity of an Instagram post using machine learning methods
Abstract
This work is motivated by the current technological landscape, in which people are increasingly connected through social networks and generate a vast amount of data every day. Used appropriately, these data can provide valuable insights for the digital market. By combining this source of information with modern artificial intelligence methods, specifically machine learning algorithms, we studied the main characteristics that drive the popularity of a post on Instagram.
In our research, we used a real database containing profile information from 1887 Instagram users. We first explored the data with statistical methods to understand its distribution and composition. We then selected the variables most relevant for predicting the popularity of a publication, using techniques such as LightGBM feature importance computed on the training set. Finally, we normalized the selected continuous variables and modeled the data with both the LightGBM algorithm and a Multilayer Perceptron neural network.
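The normalization step can be illustrated with a minimal sketch. The abstract does not state which normalization was used, so z-score standardization is an assumption here, and the follower counts and function names are purely hypothetical. The point shown is that the scaling parameters are estimated on the training set only and then reused on held-out data.

```python
from statistics import mean, pstdev

def fit_standardizer(train_column):
    """Estimate mean and standard deviation from training data only."""
    mu = mean(train_column)
    sigma = pstdev(train_column)
    # Guard against a constant column, which would give sigma == 0.
    return mu, (sigma if sigma > 0 else 1.0)

def standardize(column, mu, sigma):
    """Apply z-score scaling with previously fitted parameters."""
    return [(x - mu) / sigma for x in column]

# Hypothetical follower counts, not taken from the study's database.
train_followers = [120.0, 950.0, 4300.0, 18000.0]
test_followers = [600.0, 7200.0]

mu, sigma = fit_standardizer(train_followers)
train_scaled = standardize(train_followers, mu, sigma)
test_scaled = standardize(test_followers, mu, sigma)
```

Fitting the parameters on the training split alone keeps information from the test split out of the preprocessing.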
We concluded that the most relevant predictors of a post's popularity on Instagram were the indicators developed during feature engineering, variables from the original database such as the number of views, comments, and followers, and variables generated through Natural Language Processing (NLP), such as the sentiment of the caption. Among all the models fitted, the best performer was LightGBM with optimized parameters, which achieved an RMSE of 0.04 and an R² of 0.99.
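For readers unfamiliar with the two reported metrics, the following sketch computes RMSE and R² from their standard definitions. The target and prediction values are made up for illustration and are not the study's data; the function names are likewise hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

error = rmse(y_true, y_pred)
fit = r2_score(y_true, y_pred)
```

RMSE is expressed in the units of the target variable, so a value such as 0.04 is only meaningful relative to the target's scale, whereas R² is unitless.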