Resampling strategies applied to binary classification problems with financial data
Date
2023-08-22
Author
Sturaro, Rafael Setti Riedel
Abstract
Many binary classification problems have imbalanced classes: one of the two classes is significantly more numerous than the other. The minority class is the under-represented one, with far fewer individuals or objects than the majority class. Databases used in financial data analysis are often imbalanced. For example, in credit analysis, databases generally contain more observations of non-defaulting customers (majority class) than of defaulting customers (minority class). Similarly, in fraud detection, databases generally contain more legitimate transactions (majority class) than fraudulent transactions (minority class). This imbalance leads to a classification bias, since learning algorithms tend to classify observations from the majority class more accurately than those from the minority class. This work therefore addresses resampling strategies (undersampling, oversampling and hybrid methods) for classifying, with logistic regression, customers applying for credit as defaulters or non-defaulters and transactions as legitimate or fraudulent. We intend to compare the performance of logistic regression in classifying new instances in the following scenarios: (i) training set balanced using undersampling techniques; (ii) training set balanced using oversampling techniques; and (iii) training set balanced using hybrid resampling techniques. We will conduct this comparative study in two contexts: (i) using all the variables in the data set; and (ii) selecting variables based on the maximum likelihood estimates of the logistic regression coefficients with L1 regularisation.
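The three balancing scenarios named in the abstract can be illustrated with a minimal sketch. It is not the author's method: the abstract does not say which undersampling (subsampling), oversampling or hybrid techniques are used, so the code below assumes the simplest random variants, and the function names, the "meet in the middle" hybrid rule and the toy labels are all hypothetical. It also assumes the class labelled `minority` really is the smaller one, and that balancing is applied to the training set only.

```python
import random
from collections import Counter


def split_by_class(labels, minority=1):
    """Return (minority_indices, majority_indices) for binary labels."""
    mino = [i for i, y in enumerate(labels) if y == minority]
    majo = [i for i, y in enumerate(labels) if y != minority]
    return mino, majo


def undersample(labels, rng, minority=1):
    """Random undersampling: keep only as many majority rows as minority rows."""
    mino, majo = split_by_class(labels, minority)
    return mino + rng.sample(majo, len(mino))


def oversample(labels, rng, minority=1):
    """Random oversampling: duplicate minority rows, drawn with replacement."""
    mino, majo = split_by_class(labels, minority)
    extra = rng.choices(mino, k=len(majo) - len(mino))
    return majo + mino + extra


def hybrid(labels, rng, minority=1):
    """Assumed hybrid rule: shrink the majority and grow the minority
    until both classes reach half of the original sample size."""
    mino, majo = split_by_class(labels, minority)
    target = (len(mino) + len(majo)) // 2
    kept = rng.sample(majo, target)
    extra = rng.choices(mino, k=target - len(mino))
    return kept + mino + extra


# Made-up training labels: 95 non-defaulters (0) and 5 defaulters (1).
rng = random.Random(0)
labels = [0] * 95 + [1] * 5

for name, balance in [("undersample", undersample),
                      ("oversample", oversample),
                      ("hybrid", hybrid)]:
    idx = balance(labels, rng)
    # Class counts of the balanced training set built from these row indices.
    print(name, Counter(labels[i] for i in idx))
```

Each function returns row indices rather than data, so the same indices can select both the feature rows and the labels before a logistic regression is fitted on the balanced training set. The test set should keep its original class proportions so that performance on new instances is measured on realistic data.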