Mineração de texto, inteligência artificial e aplicações em biotecnologia

Reis, Vítor Eulálio

Files

Monografia_Vitor_Eulalio_Reis FINAL.pdf (3.41 MB)

Date

2024-04-24

Authors

Reis, Vítor Eulálio

Publisher

Universidade Federal de São Carlos

Abstract

Large Language Models (LLMs) are AI-powered programs that generate and manipulate text, learning to understand and respond in a human-like way as their language capabilities develop. LLMs nevertheless face challenges such as gaps in domain-specific knowledge, factuality issues, and hallucinations. Retrieval-Augmented Generation (RAG) aims to address these challenges by connecting models to external, verified knowledge sources. In this work, the RAG approach is used to analyze scientific texts and extract answers from them, examining both the internal relationships within complex connections and their relation to current use. The methods and techniques are based on AI for language processing, with the goal of retrieving, structuring, and mobilizing interdisciplinary technical-scientific knowledge that can then be validated. To this end, text-mining techniques that extract relevant information from documents are integrated with text generation, so that generation operates more effectively in complex contexts that call for specific, in-depth analysis. In addition, the field of science bots (Science-Bot, or Sci-Bot) studies potential connections and interactions that are only hypothesized or underrepresented in the literature, contributing to the ongoing scientific debate. As a case study, Sci-Bot was fed scientific texts containing existing information on possible directions for future research, public health strategies for control and prevention, and changes in cultural traits. The RAG approach thus makes it possible to articulate textual information from the scientific literature and to explain in detail, on an unprecedented scale, the complex connections and intersections between the key concepts involved in the recent context of the pandemic.

Keywords

Vitamin D, Covid-19, Protein Analysis (RNA_M), Artificial Intelligence, LLM, Sci-Bot

Citation

REIS, Vítor Eulálio. Mineração de texto, inteligência artificial e aplicações em biotecnologia. 2024. Dissertação (Mestrado em Biotecnologia) – Universidade Federal de São Carlos, Campus São Carlos, 2024. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/21572.

URI

https://hdl.handle.net/20.500.14289/21572

Collections

Theses and Dissertations

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution 3.0 Brazil
