Asymmetric Lomax models: a new approach for the classification of imbalanced binary data
Abstract
Imbalanced data refers to a dataset in which one class has significantly fewer observations than the other. This can lead to poor performance of both machine learning algorithms and statistical models, since most of these tools assume roughly equal proportions of observations in the two categories. To deal with this challenge, several authors suggest using asymmetric link functions in binary regression instead of the well-known symmetric links, logit and probit. Doing so can improve the predictive performance of the model and also reduce the bias in the estimation of parameters and probabilities. The approach yields probabilistic models, which support decision-making better than models that simply assign a single class without considering the associated probability. This work therefore presents new asymmetric link functions generated from transformations of the Lomax distribution.
These link functions are based on the Double Lomax (DLomax), Power Double Lomax (PDLomax), and Reverse Power Double Lomax (RPDLomax) distributions. The proposed functions are shown to be asymmetric and can be easily implemented in statistical software. In addition, the simulation study indicates that they can perform better than logistic regression in various imbalanced classification scenarios. They also proved promising for modeling real-world datasets: in two applications they gave better results than the classic link functions.
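To illustrate what asymmetry of a link function means, the following minimal sketch compares the logit link with the complementary log-log link, a standard asymmetric link. It is not one of the Lomax-based links proposed in this work, whose definitions are not given in the abstract. The helper `symmetry_gap` and the evaluation grid are hypothetical names chosen for illustration. A link with inverse F is symmetric when F(-x) = 1 - F(x) for all x.

```python
import math

def logit_cdf(x):
    # Inverse logit link: the standard logistic CDF (symmetric).
    return 1.0 / (1.0 + math.exp(-x))

def cloglog_cdf(x):
    # Inverse complementary log-log link (a classic asymmetric link,
    # used here only as a stand-in; not a Lomax-based link).
    return 1.0 - math.exp(-math.exp(x))

def symmetry_gap(cdf, grid):
    # Largest violation of the symmetry condition F(-x) = 1 - F(x) on the grid.
    return max(abs(cdf(-x) - (1.0 - cdf(x))) for x in grid)

# Hypothetical evaluation grid: x = 0.1, 0.2, ..., 5.0
grid = [i / 10 for i in range(1, 51)]

logit_gap = symmetry_gap(logit_cdf, grid)
cloglog_gap = symmetry_gap(cloglog_cdf, grid)

print(f"logit symmetry gap:   {logit_gap:.3e}")
print(f"cloglog symmetry gap: {cloglog_gap:.3e}")
```

A gap near zero indicates a symmetric link; a clearly positive gap means the response curve approaches 0 and 1 at different rates, which is the property asymmetric links use when one class is rare.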