Detecção de postagens com informações falsas sobre a pandemia do Covid-19 na rede social Instagram

Cabral, Mateus Oliveira

dc.contributor.author	Cabral, Mateus Oliveira
dc.date.accessioned	2021-11-08T10:44:31Z
dc.date.available	2021-11-08T10:44:31Z
dc.date.issued	2021-07-05
dc.identifier.citation	CABRAL, Mateus Oliveira. Detecção de postagens com informações falsas sobre a pandemia do Covid-19 na rede social Instagram. 2021. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2021. Disponível em: https://repositorio.ufscar.br/handle/ufscar/15074.	*
dc.identifier.uri	https://repositorio.ufscar.br/handle/ufscar/15074
dc.description.abstract	This dissertation addresses the detection of false information on Instagram, a social network that has been growing steadily relative to other social media platforms. Because Instagram carries multimedia content (image, video and text) with an emphasis on photos, there is little scientific research on the impact that its posts with false information have on society. This happens mainly during political elections or historical events, when demand for information is high. This Master’s research therefore took health as its domain, with emphasis on the COVID-19 pandemic, a subject of extreme importance and great social impact. Many studies address techniques for identifying fake news articles and/or fake posts on social networks such as Facebook, Twitter, YouTube and WhatsApp. Some studies focus on the content of the news; others focus on the social context, using information from social networks that involves sentiment analysis; still others focus on the temporal aspect, which is also widely analyzed with respect to the dynamics of posts on the social network. In this Master’s research, the source chosen for data extraction works in a way that is completely different from other social networks. The sharing phenomena that affect the spread of news on social media do not work in the same way on Instagram. In addition, posted images may contain embedded text, which requires Optical Character Recognition (OCR) tools to extract it; only then can the information extracted from posts in Portuguese be compared in order to classify it as false or true.
Another problem, in addition to the lack of research on false information on Instagram, is the scarcity of Portuguese-language datasets for analyzing and benchmarking false information detection models, especially datasets that contain images. The aim of this Master’s research was to investigate the detection of posts in Portuguese with false information about the COVID-19 pandemic on the Instagram social network. To that end, the research proposed a machine learning model that detects false information. It also compiled a COVID-19 dataset to be made available for future investigations into fake content on Instagram. The model was validated through experimental tests with real data. The results showed an accuracy between 96% and 99% in detecting posts with false information about COVID-19.	eng
dc.description.sponsorship	Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)	por
dc.language.iso	por	por
dc.publisher	Universidade Federal de São Carlos	por
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Brazil	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/br/	*
dc.subject	Conteúdo falso	por
dc.subject	Redes sociais	por
dc.subject	Instagram	eng
dc.subject	COVID-19	eng
dc.subject	Social network	eng
dc.subject	Fake news	eng
dc.title	Detecção de postagens com informações falsas sobre a pandemia do Covid-19 na rede social Instagram	por
dc.title.alternative	Detection of posts with false information about Covid-19 pandemic on Instagram social network	eng
dc.type	Dissertação	por
dc.contributor.advisor1	Ciferri, Ricardo Rodrigues
dc.contributor.advisor1Lattes	http://lattes.cnpq.br/8382221522817502	por
dc.description.resumo	Esta dissertação aborda a detecção de informações falsas no Instagram, a rede social que vem crescendo cada vez mais em comparação com as demais plataformas de redes sociais. Por se tratar de uma rede social com conteúdo multimídia (imagem, vídeo e texto), mas com ênfase na postagem de fotos, há pouca pesquisa científica sobre os impactos que as postagens com informações falsas dessa rede causam na sociedade. Isso acontece principalmente em épocas de eleições políticas ou em acontecimentos históricos, em que existe uma grande demanda por informações. Por isso, esta pesquisa de Mestrado teve como domínio a área da saúde, com ênfase na pandemia de COVID-19, assunto de extrema importância e impacto social. Muitos estudos abordam diversas técnicas para identificação de artigos de notícias falsas e/ou postagens falsas em redes sociais como Facebook, Twitter, YouTube e WhatsApp. Alguns estudos enfocam o conteúdo da notícia; outros enfocam o contexto social, por meio de informações das redes sociais que envolvem análise de sentimento; para outros, o foco é o aspecto temporal, também muito analisado quanto à dinâmica das postagens na rede social. Nesta pesquisa de Mestrado, a fonte escolhida para extrair os dados de estudo tem uma dinâmica funcional completamente diferente das demais redes sociais. O compartilhamento dos fenômenos que impactam a dispersão das notícias nas redes sociais não funciona da mesma forma no Instagram. Além disso, as imagens postadas podem conter textos, o que gera a necessidade de utilizar ferramentas baseadas em Optical Character Recognition (OCR) para extraí-los e somente depois confrontar a informação extraída das postagens em português para classificá-la como falsa ou verdadeira.
Outro problema, além da falta de pesquisas sobre informações falsas relacionadas ao Instagram, é a existência de poucos conjuntos de dados de conteúdos em português para análise e benchmark de modelos de detecção de informações falsas, principalmente conjuntos que contenham imagens. O objetivo desta pesquisa de Mestrado foi investigar a detecção de postagens em português com informações falsas sobre a pandemia de COVID-19 na rede social Instagram. Nesse sentido, a pesquisa teve como resultado a proposta de um modelo de aprendizado de máquina que permite a detecção de informações falsas. Além disso, esta pesquisa realizou a compilação de um conjunto de dados relacionado à COVID-19 para ser disponibilizado para futuras investigações sobre conteúdos falsos na rede social Instagram. O modelo foi validado por meio de testes experimentais com dados reais. Os resultados mostraram uma acurácia entre 96% e 99% na detecção de postagens com informações falsas sobre COVID-19.	por
dc.publisher.initials	UFSCar	por
dc.publisher.program	Programa de Pós-Graduação em Ciência da Computação - PPGCC	por
dc.subject.cnpq	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO	por
dc.description.sponsorshipId	CAPES: Código de Financiamento 001	por
dc.publisher.address	Câmpus São Carlos	por
dc.contributor.authorlattes	http://lattes.cnpq.br/8893072052840604	por