Classificação binária de dados financeiros em problemas com classes desbalanceadas

Carregando...
Imagem de Miniatura

Título da Revista

ISSN da Revista

Título de Volume

Editor

Universidade Federal de São Carlos

Resumo

In order to mitigate the risks and uncertainties associated with credit granting, financial institutions are constantly exploring methods to enhance the credit evaluation system. In the same context, the growth in credit card transactions has led to an increase in fraud, resulting in billions of dollars in annual losses for financial institutions. Therefore, it is crucial for companies to effectively detect fraudulent transactions. One way to minimize losses due to default or fraud is to use statistical methods that yield results close to reality, presenting a low margin of error. However, the major challenge in executing this process is that such financial data is imbalanced, meaning there is a higher proportion of non-defaulting customers and legitimate transactions (majority groups) than delinquent customers and fraudulent transactions (minority groups). This imbalance leads to a classification bias, as learning algorithms tend to classify observations from the majority group better. In this context, this work aims to conduct a comparative study of the performance of support vector machines and logistic regression in classifying new sample units. This study will be carried out using three financial datasets with different degrees of imbalance, considering three contexts: (i) without applying any technique to handle the imbalance of the datasets; (ii) applying data preprocessing techniques to handle the imbalance of the datasets; and (iii) using the cost-sensitive version of the original classifiers to handle the imbalance of the datasets. The analysis of classifier performance will be based on measures derived from the confusion matrix that have been shown to be less sensitive to data imbalance, such as the G-mean, Matthews correlation coefficient, and F-score.

Descrição

Citação

MOURA, Lucas Fernando. Classificação binária de dados financeiros em problemas com classes desbalanceadas. 2024. Trabalho de Conclusão de Curso (Graduação em Estatística) – Universidade Federal de São Carlos, São Carlos, 2024. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/19504.

Coleções

item.page.endorsement

item.page.review

item.page.supplemented

item.page.referenced

Licença Creative Commons

Exceto quando indicado de outra forma, a licença deste item é descrita como Attribution-NonCommercial-NoDerivs 3.0 Brazil