Outliers in high dimension
Abstract
Outliers and heteroskedastic noise are two common issues in Statistics. The amount of data generated nowadays is very large, so it is increasingly common to encounter high dimensional data, in which the dimension d is as large as, or larger than, the number of observations n. Furthermore, the data may be contaminated by heteroskedastic noise, meaning that the noise variance can differ across entries. Principal component analysis is a technique that aims to construct a subspace of lower dimension than the original space. It is used in areas such as Statistics, Econometrics, Machine Learning and Applied Mathematics. Choi and Marron (2019) introduced a new notion of high dimensional outliers that embraces other types and investigated the behaviour of these outliers in the subspace created by principal component analysis. Most of the techniques used in this context rely on the assumption of homoskedastic noise. However, as mentioned before, this is not always the case. Therefore, Zhang, Cai and Wu (2022) proposed a new method, called HeteroPCA, whose main objective is to remove the bias that heteroskedasticity introduces in the main diagonal of the sample covariance matrix. In this work, the main objective is to combine the method proposed by Zhang, Cai and Wu (2022) with the methodology proposed by Choi and Marron (2019) to find a subspace capable of identifying the presence of outliers when heteroskedastic noise is present.
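The diagonal-bias removal mentioned above can be illustrated with a minimal sketch in the spirit of HeteroPCA: the diagonal of the sample covariance matrix, which is inflated by the entrywise noise variances, is deleted and then iteratively imputed from the diagonal of the current best rank-r approximation, while the reliable off-diagonal entries are kept fixed. The function name, its parameters and the stopping rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hetero_pca(S, r, n_iter=50):
    """Illustrative diagonal-deletion iteration (assumed sketch, not the
    reference implementation of Zhang, Cai and Wu, 2022).

    S : (d, d) symmetric sample covariance (or Gram) matrix.
    r : target rank, i.e. the dimension of the principal subspace.
    """
    N = S.copy()
    np.fill_diagonal(N, 0.0)  # delete the heteroskedasticity-biased diagonal
    for _ in range(n_iter):
        # best rank-r approximation of the current matrix
        U, d_vals, Vt = np.linalg.svd(N, hermitian=True)
        low_rank = (U[:, :r] * d_vals[:r]) @ Vt[:r, :]
        # keep the off-diagonal entries of S, impute only the diagonal
        np.fill_diagonal(N, np.diag(low_rank))
    U, _, _ = np.linalg.svd(N, hermitian=True)
    return U[:, :r]  # orthonormal basis of the estimated principal subspace
```

Because the off-diagonal entries of the sample covariance are unbiased for the signal even under heteroskedastic noise, the iteration converges toward a matrix whose leading eigenvectors estimate the principal subspace without the diagonal bias that plain PCA would absorb.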