Análise sobre o fator temporal em tarefas de quantificação com dados textuais
Publisher
Universidade Federal de São Carlos
Abstract
Quantification, a relatively recent field in machine learning, is the task of estimating
the class distribution of a dataset. Quantification tasks are usually solved through
classification: an induced classifier predicts the class of each instance in the set, and
the predictions for each class are then counted. This approach is known as Classify and
Count. However, Classify and Count performs poorly as soon as the class distribution of
the test set differs from that of the training set. For this reason, specific algorithms
and models have been proposed to solve quantification problems accurately.
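The Classify and Count procedure described above can be sketched in a few lines of Python. This is only an illustration: `toy_classifier`, the class names, and the example texts are hypothetical and are not taken from the dissertation. The keyword rule stands in for an induced classifier.

```python
from collections import Counter


def classify_and_count(classifier, instances, classes):
    """Estimate class prevalences by classifying each instance and counting labels."""
    predictions = [classifier(x) for x in instances]
    counts = Counter(predictions)
    total = len(predictions)
    return {c: counts.get(c, 0) / total for c in classes}


def toy_classifier(text):
    # Hypothetical stand-in for an induced classifier:
    # labels a text "pos" whenever it contains the word "good".
    return "pos" if "good" in text else "neg"


# Made-up test set; "not good" is negative but will be labeled "pos".
test_set = ["good movie", "bad movie", "not good", "awful", "good plot"]
estimate = classify_and_count(toy_classifier, test_set, ["pos", "neg"])
print(estimate)
```

In this toy set two of the five texts are actually positive, yet the estimate for "pos" is three fifths, because every classification error feeds directly into the counts.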
Analyzing big data over time is very common. In text domains such as the Twitter
platform, where a large volume of unstructured data is generated at every instant, it
is challenging to extract information that is both useful and summarized. Moreover,
text domains have specific characteristics that make extracting this information more
complex. Popular analyses include discovering trending topics or gauging people's
opinion about a specific topic. To do this, quantification methods can be used to
categorize and thereby summarize a massive number of texts.
This work analyzes textual quantification problems distributed over time. More
precisely, it aims to evaluate how time affects the performance of quantification
models. Three approaches were evaluated to understand the impact of time: training the
quantification model only once; updating the model periodically, thus decreasing its
time lag; and a forecasting approach using regression models. The results show that the
evaluated datasets have some peculiarities and that state-of-the-art models may not
perform best, contrary to expectation.
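As a rough illustration of the forecasting idea, the sketch below fits an ordinary least-squares line to past class prevalences and extrapolates one time step ahead. The abstract does not say which regression models were used, so the linear trend, the function name `forecast_next_prevalence`, and the sample values are all assumptions made for this example.

```python
def forecast_next_prevalence(history):
    """Fit a least-squares line to past prevalences and extrapolate one step ahead."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two past observations")
    xs = range(n)  # time steps 0, 1, ..., n-1
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    prediction = intercept + slope * n  # value at the next time step
    return min(1.0, max(0.0, prediction))  # clip to a valid proportion


# Made-up prevalences of one class over four consecutive periods.
past = [0.20, 0.30, 0.40, 0.50]
next_estimate = forecast_next_prevalence(past)
print(round(next_estimate, 2))
```

Unlike the other two approaches, a forecast of this kind needs no unlabeled texts from the target period, only a history of earlier prevalence values.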
Citation
UENO, Caio Luiggy Riyoichi Sawada. Análise sobre o fator temporal em tarefas de quantificação com dados textuais. 2023. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2023. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/19702.
Creative Commons License
Except where otherwise noted, this item's license is described as Attribution-NoDerivs 3.0 Brazil
