Modeling rare events: a comparative study
Scacabarozi, Fernanda Nanci
In many areas of knowledge, the response variable of interest is dichotomous with an extremely unbalanced distribution. In the financial market, it is common to want the probability that each customer will commit a fraudulent action, while the proportion of fraudulent customers is extremely small. In health, there is interest in determining the probability that a particular person will present some epidemiological infection that affects only a small fraction of the population. However, studies show that the usual logistic regression model, widely used for modeling binary data, does not produce good results when fitted to extremely unbalanced databases. The literature offers some proposals for fitting models that take this characteristic into account, such as the KZ estimators suggested by King and Zeng (2001) for the logistic regression model applied to databases with rare events. We present this methodology and a simulation study to verify the quality of these estimators. Other proposals in the literature are the limited logit model suggested by Cramer (2004), which places an upper limit on the probability of success, and the generalized logit model suggested by Stukel (1988), which has two shape parameters and works better than the usual logit model when the probability curve is not symmetric around the point 1/2. In this work we present some simulations to verify the advantages of using these models.
Keywords: logit model, limited logit model, generalized logit model, logit model with response of origin, KZ estimators, predictive measures.
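As an illustration of the upper limit on the probability of success mentioned in the abstract, the sketch below assumes the limited logit model takes the form of the usual logistic curve scaled by a ceiling parameter. The names `omega`, `eta`, `logit_prob`, and `limited_logit_prob` are hypothetical and chosen only for this example; the abstract does not state the parametrization used in the work itself.

```python
import math


def logit_prob(eta):
    """Usual logit model: P(y = 1) = 1 / (1 + exp(-eta)).

    `eta` is the linear predictor x'beta (assumed moderate in size,
    so that exp(-eta) does not overflow).
    """
    return 1.0 / (1.0 + math.exp(-eta))


def limited_logit_prob(eta, omega):
    """Assumed limited logit form: the usual logit scaled by a ceiling.

    `omega` in (0, 1] is the hypothetical upper limit on the success
    probability; omega = 1 recovers the usual logit model.
    """
    if not 0.0 < omega <= 1.0:
        raise ValueError("omega must lie in (0, 1]")
    return omega * logit_prob(eta)


if __name__ == "__main__":
    # Compare both curves on a small grid of linear-predictor values.
    for eta in (-4.0, 0.0, 4.0, 10.0):
        usual = logit_prob(eta)
        limited = limited_logit_prob(eta, omega=0.2)
        print(f"eta={eta:5.1f}  usual={usual:.4f}  limited={limited:.4f}")
```

Under this assumed form, the limited curve never exceeds `omega` however large the linear predictor becomes, which is the behavior one would want when even the highest-risk individuals have a small probability of the event.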