Text mining, artificial intelligence and applications in biotechnology

Reis, Vítor Eulálio

dc.contributor.advisor1	Caracelli, Ignez
dc.contributor.advisor1Lattes	https://lattes.cnpq.br/8956527354576143
dc.contributor.advisor1orcid	https://orcid.org/0000-0003-4945-7485
dc.contributor.author	Reis, Vítor Eulálio
dc.contributor.authorethnicity	Pataxo
dc.contributor.authorlattes	https://lattes.cnpq.br/2456364381525767
dc.contributor.authororcid	https://orcid.org/0009-0000-0314-8869
dc.date.accessioned	2025-03-18T12:02:06Z
dc.date.issued	2024-04-24
dc.description.abstract	Large Language Models (LLMs) are AI-powered programs that generate and manipulate text, learning to understand and respond in a human-like way. LLMs face challenges such as domain-specific knowledge gaps, factuality issues, and hallucinations. Retrieval-Augmented Generation (RAG) aims to address these challenges by connecting models to external, verified knowledge sources. In this work, the RAG approach is used to analyze scientific texts and extract answers from them, investigating the relationships within complex connections and their relation to current use. The methods and techniques rely on AI-based language processing to retrieve, rationalize, and mobilize interdisciplinary technical-scientific knowledge that is still to be validated. To that end, text mining techniques applied to relevant documents are integrated with text generation, which appears to operate more effectively in complex contexts that call for specific in-depth analysis. Additionally, the field of science bots, or Science-Bot, studies potential connections and interactions that are only hypothesized or underrepresented in the literature, contributing to the ongoing debate. As a case study, Sci-Bot was fed scientific texts containing existing information on possible directions for future research, public health strategies for control and prevention, and even changes in cultural traits. The RAG approach thus makes it possible to articulate textual information from the scientific literature, enabling a detailed explanation, on an unprecedented scale, of the complex connections and intersections between the key concepts involved in the recent context of the pandemic.	eng
dc.description.abstract	Großsprachmodelle (LLMs) sind KI-gestützte Programme, die Texte generieren und bearbeiten. Sie lernen, menschenähnlich zu verstehen und zu reagieren, indem sie ihre sprachlichen Fähigkeiten kontinuierlich weiterentwickeln. Dabei stoßen LLMs jedoch auf Herausforderungen wie Wissenslücken, Probleme mit Faktizität und sogenannte Halluzinationen. Die Retrieval-Augmented Generation (RAG) adressiert diese Probleme, indem sie Modelle mit externen und verifizierten Wissensquellen verknüpft. In diesem Zusammenhang wird der RAG-Ansatz genutzt, um wissenschaftliche Texte zu analysieren und daraus präzise Antworten zu extrahieren. Dabei werden sowohl die inneren Zusammenhänge innerhalb komplexer Verknüpfungen als auch deren Bezug zur aktuellen Nutzung untersucht. Die angewandten Methoden und Techniken basieren auf künstlicher Intelligenz und unterstützen verschiedene Prozesse der Sprachverarbeitung. Ziel ist es, interdisziplinäres technisch-wissenschaftliches Wissen abzurufen, zu strukturieren und nutzbar zu machen. Hierzu werden Text-Mining-Techniken verwendet, um relevante Informationen aus Dokumenten zu extrahieren und die Textgenerierung auch in komplexen Kontexten effizient zu gestalten. Darüber hinaus untersucht das Forschungsfeld der Wissenschafts-Bots mögliche Verbindungen und Interaktionen, die in der Literatur bisher kaum beachtet oder nur theoretisch behandelt wurden. Dies trägt zur laufenden wissenschaftlichen Debatte bei. Als Fallstudie wurde Sci-Bot mit wissenschaftlichen Texten gefüttert, die bestehende Informationen zu potenziellen Forschungsperspektiven, Strategien im Bereich der öffentlichen Gesundheit sowie zu kulturellen Veränderungen enthalten. Der RAG-Ansatz ermöglicht es so, wissenschaftliche Informationen umfassend zu artikulieren und die komplexen Verbindungen zwischen Schlüsselkonzepten im aktuellen Pandemiekontext detailliert darzustellen.	ger
dc.description.resumo	Os Modelos de Linguagem de Grande Escala (LLMs) são programas baseados em IA que geram e manipulam texto, aprendendo a compreender e responder de maneira semelhante à humana ao desenvolver suas capacidades descritivas e potencial. Os LLMs enfrentam desafios como lacunas específicas de conhecimento, problemas de factualidade e alucinações. Portanto, a Geração Aumentada por Recuperação (RAG) visa abordar esses desafios conectando modelos a fontes de conhecimento externas e verificadas. Neste contexto, a abordagem RAG é usada aqui para analisar e extrair respostas de textos científicos, investigando a relação intrínseca dentro das conexões complexas e a relação com o uso atual. Os métodos e técnicas são baseados em IA para diferentes processamentos de linguagem para recuperar, racionalizar e mobilizar o conhecimento presente em um conhecimento técnico-científico interdisciplinar a ser validado. Portanto, técnicas de mineração de texto geradas a partir de documentos relevantes são integradas, enquanto a geração de texto parece operar mais efetivamente em contextos complexos para análises específicas em profundidade. Adicionalmente, o campo da Ciência ou Science-Bot estuda conexões e interações potenciais que são imaginadas ou sub-representadas na literatura, contribuindo para o debate em andamento de várias maneiras. Como estudo de caso, o Sci-Bot foi alimentado com textos científicos contendo informações existentes sobre possíveis direções para pesquisas futuras, estratégias de saúde pública na área de controle e prevenção, e até mesmo na mudança de traços culturais. Assim, a abordagem RAG possibilita articular informações textuais da literatura científica, permitindo uma explicação detalhada em escala sem precedentes de todas as conexões complexas ou interseções entre os conceitos-chave envolvidos no contexto recente da pandemia.	por
dc.description.sponsorship	No funding received
dc.identifier.citation	REIS, Vítor Eulálio. Mineração de texto, inteligência artificial e aplicações em biotecnologia. 2024. Dissertação (Mestrado em Biotecnologia) – Universidade Federal de São Carlos, Campus São Carlos, 2024. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/21572.	*
dc.identifier.uri	https://hdl.handle.net/20.500.14289/21572
dc.language.iso	por
dc.publisher	Universidade Federal de São Carlos
dc.publisher.address	Campus São Carlos
dc.publisher.initials	UFSCar
dc.publisher.program	Programa de Pós-Graduação em Biotecnologia - PPGBiotec
dc.rights	Attribution 3.0 Brazil	en
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/br/
dc.subject	Vitamina D	por
dc.subject	Covid-19	por
dc.subject	Análise de Proteínas (RNA_M)	por
dc.subject	Inteligência Artificial	por
dc.subject	LLM	eng
dc.subject	Sci-Bot	eng
dc.subject	Vitamin D	eng
dc.subject	Protein Analysis (RNA_M)	eng
dc.subject	Artificial Intelligence	eng
dc.subject	Proteinanalyse (RNA_M)	ger
dc.subject	Künstliche Intelligenz	ger
dc.subject.cnpq	CIENCIAS EXATAS E DA TERRA
dc.title	Mineração de texto, inteligência artificial e aplicações em biotecnologia	por
dc.title.alternative	Text mining, artificial intelligence and applications in biotechnology	eng
dc.title.alternative	Textmining, künstliche Intelligenz und Anwendungen in der Biotechnologie	ger
dc.type	Dissertação

Files

Original bundle

Name: Monografia_Vitor_Eulalio_Reis FINAL.pdf
Size: 3.41 MB
Format: Adobe Portable Document Format

Collections

Theses and Dissertations