Multiclass sentiment analysis: a machine learning approach
Abstract
In a globalized world, analyzing data generated from a wide variety of sources, especially textual ones, has become essential for acquiring knowledge and information. The Internet and social networks make up the main source of such text. Sentiment analysis is a form of text mining whose purpose is to identify and/or analyze users' opinions about an entity, or the sentiment they express about various topics. Several researchers have used sentiment analysis to understand user behavior through polarity, which can be separated into two or three classes. The challenge, however, is to go beyond this traditional classification and achieve a more realistic analysis of the feelings expressed by exploring multiclass analysis (through emotion classes). This paper therefore studies two aspects of sentiment analysis: the number of emotion classes to be analyzed and the way texts are represented before being submitted for classification. To this end, classic machine learning algorithms (SVM, kNN, and Naive Bayes) and vectorization techniques such as TF-IDF and Word2Vec were used. The results show that a reduced number of classes combined with Word2Vec as the text representation method improves classification, especially with the SVM classifier, which obtained an accuracy of 58.8% on the emotion dataset and 68.6% on the polarity dataset.
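To make the pipeline described in the abstract more concrete, the following is a minimal sketch of one of the combinations it mentions: TF-IDF vectorization followed by a kNN classifier over emotion classes. It is not the authors' code. The toy sentences, the three emotion labels, the smoothed IDF formula, and every function name (`tokenize`, `fit_idf`, `tfidf_vector`, `knn_predict`) are assumptions made for illustration, and kNN was picked only because it fits in a few lines; the SVM and Word2Vec configurations that the abstract reports as strongest are not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper's datasets are not reproduced here.
train = [
    ("i love this wonderful phone", "joy"),
    ("what a great happy day", "joy"),
    ("so happy and glad today", "joy"),
    ("i hate this awful service", "anger"),
    ("this makes me furious and angry", "anger"),
    ("angry about the awful delay", "anger"),
    ("i feel so sad and lonely", "sadness"),
    ("crying over this sad news", "sadness"),
    ("lonely and miserable tonight", "sadness"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, deliberately simple choice)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Return an L2-normalized sparse TF-IDF vector as a dict; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors (a dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(text, train_vecs, train_labels, idf, k=3):
    """Majority vote among the k training documents most similar to `text`."""
    query = tfidf_vector(text, idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train_texts = [text for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_texts)
train_vecs = [tfidf_vector(text, idf) for text in train_texts]

print(knn_predict("such a happy wonderful day", train_vecs, train_labels, idf))
```

Reducing the number of classes, as the abstract discusses, would correspond here to merging labels in `train_labels` before voting (for example, mapping "anger" and "sadness" to a single negative polarity class).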