Classification model for imbalanced data: the SMOTE method and its variants

Nora, Andrielle Couto

Date

2024-01-29

Abstract

Classification problems often involve datasets with highly imbalanced classes, for example rare-disease diagnoses, manufacturing defects, and fraudulent transactions. Training a model on a dataset with few observations of a given class results in poor predictive performance, especially for observations belonging to the minority class. In this undergraduate thesis, we present and compare variants of the Synthetic Minority Over-sampling TEchnique (SMOTE) for oversampling imbalanced data, applied to classification with Logistic Regression. The aim is to show how these techniques can improve the identification and prediction of minority-class observations in realistic, imbalanced scenarios, and to determine which combination of sampling technique and Logistic Regression model performs best.

URI

https://repositorio.ufscar.br/handle/ufscar/19545

The following license files are associated with this item:

Creative Commons

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil