Multivariate analysis of scientific culture content via social data science and RAG-LLM machine learning (retrieval-augmented generation for large language models)
Publisher
Universidade Federal de São Carlos
Abstract
Guided by interdisciplinary epistemological approaches from the field of Science, Technology, and Society (CTS), we investigated communication dynamics in two projects at a public university, using a hybrid approach that combines Content Analysis with Machine Learning and Social Data Science. We collected and analyzed material from websites and digital media, including interactions in chats, comments, and event broadcasts. Our research demonstrated the effectiveness of the scientific culture dynamics and the public reach of the projects, and revealed user idiosyncrasies through the correlation of quantitative and qualitative variables. The data were collected and mined using statistical and Natural Language Processing (NLP) techniques in the MAXQDA software, and complemented by multivariate analyses. To analyze audiovisual and textual content from streaming channels, we developed machine learning pipelines based on RAG-LLM (Retrieval-Augmented Generation for Large Language Models). The results of the thesis are presented in visual and analytical dashboards on the website tecnociencia.net.br, where we also implemented a conversational robot (Tesebot), trained on the research content to answer questions. The thesis concludes that its corroboration is inevitable and supported by statistical and inferential certainties, which reflect and legitimize the complexity of research in the era of post-academic science, big tech, social networks, and artificial intelligence. The methodological strategies made it possible to understand all the variables and the statistical and relational instances involved in the dynamic observation of scientific culture in the dissemination projects studied.
Citation
DE MOURA, Renato Aparecido Terezan. Análise multivariada de conteúdos da cultura científica via ciência de dados sociais e aprendizado de máquina RAG-LLM (geração aumentada por recuperação para modelos linguísticos de grande dimensão). 2025. Tese (Doutorado em Ciência, Tecnologia e Sociedade) – Universidade Federal de São Carlos, São Carlos, 2025. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/23005.
Creative Commons License
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil
