Resampling strategies applied to binary classification problems with financial data

Publisher

Universidade Federal de São Carlos

Abstract

Many binary classification problems have imbalanced classes: one of the two classes is significantly more numerous than the other. The minority class is the less represented one, with far fewer individuals or objects than the majority class. In financial data analysis, databases may be imbalanced in this way. In credit analysis, for example, databases generally contain more observations on non-defaulting customers (majority class) than on defaulting customers (minority class). Similarly, in fraud detection, databases generally contain more information on legitimate transactions (majority class) than on fraudulent transactions (minority class). This imbalance biases classification, since learning algorithms tend to classify observations from the majority class more accurately. This work therefore addresses resampling strategies (undersampling, oversampling and hybrid methods) for classifying, with logistic regression, credit applicants as defaulters or non-defaulters and transactions as legitimate or fraudulent. We intend to compare the performance of logistic regression in classifying new instances in the following scenarios: (i) training set balanced using undersampling techniques; (ii) training set balanced using oversampling techniques; and (iii) training set balanced using hybrid resampling techniques. We will conduct this comparative study in two contexts: (i) using all the variables in the data set; and (ii) selecting variables based on the maximum likelihood estimates of the logistic regression coefficients with L1 regularisation.
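
The two basic resampling ideas named in the abstract can be illustrated with a minimal sketch. It shows only the simplest random variants of undersampling and oversampling; the abstract does not say which specific techniques the work uses, so the function names, the toy data and the fixed seed below are all assumptions made for illustration. In practice, resampling of this kind would normally be applied to the training set only, leaving the test set at its original class proportions.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until every class matches the minority size."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    keep = []
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        if n > n_min:
            # Sample without replacement down to the minority-class size.
            idx = rng.sample(idx, n_min)
        keep.extend(idx)
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until every class matches the majority size."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_max = max(counts.values())
    keep = list(range(len(y)))  # keep every original row
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample with replacement to fill the gap to the majority-class size.
        keep.extend(rng.choices(idx, k=n_max - n))
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy training set: 8 non-defaulters (0) and 2 defaulters (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(sorted(Counter(y_under).items()))  # class counts after undersampling
print(sorted(Counter(y_over).items()))   # class counts after oversampling
```

A hybrid strategy would combine both steps, for instance by oversampling the minority class part of the way and then undersampling the majority class to meet it; the balanced output would then be used to fit the logistic regression.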

Citation

STURARO, Rafael Setti Riedel. Estratégias de reamostragem aplicadas em problemas de classificação binária de dados financeiros. 2023. Undergraduate thesis (Bachelor's degree in Statistics) – Universidade Federal de São Carlos, São Carlos, 2023. Available at: https://repositorio.ufscar.br/handle/20.500.14289/18745.

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil