Análise do potencial de modelos contrastivos em tarefas de imagem-para-grafo

Takeda, Eduardo Minoru

Análise do potencial de modelos contrastivos em tarefas de imagem-para-grafo

Arquivos

item.page.bitstreams.primary TCC_Eduardo_Takeda.pdf (2.85 MB)

Data

2025-12-18

Autores

Takeda, Eduardo Minoru

Editor

Universidade Federal de São Carlos

Resumo

Large Vision-Language Models (VLMs) have demonstrated capabilities in various image-related tasks, such as object detection, recognition, and counting. However, they still face significant challenges when dealing with images of graphs and diagrams. Given this landscape, it becomes crucial to investigate both methods to adapt these models and methodological alternatives capable of exploiting their true capabilities. Among possible approaches, Contrastive Learning stands out for achieving results superior to other models involving multiple textual attributes and positional dependencies. In this context, investigating the capabilities of the CLIP model proves pertinent, especially considering the advancement of other multimodal models across different domains. This work analyzes the potential of CLIP, coupled with Contrastive Learning, for tasks involving the extraction of similarity between graph images and the recognition of their edge lists, as well as the differentiation of graph structures. Results were positive regarding edge list recognition, while the clustering of different structures and degrees of regularity showed promising potential, albeit still requiring further exploration and fine-tuning. The analysis of metrics and the expansion of the methodology's applications may contribute significantly to advancing the state of the art in the field.

Palavras-chave

Modelos de visão e linguagem, Grafos, Aprendizado contrastivo, Similaridade de grafos

Citação

TAKEDA, Eduardo Minoru. Análise do potencial de modelos contrastivos em tarefas de imagem-para-grafo. 2025. Trabalho de Conclusão de Curso (Graduação em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2025. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/23443.

URI

https://hdl.handle.net/20.500.14289/23443

Coleções

TCC

Licença Creative Commons

Exceto quando indicado de outra forma, a licença deste item é descrita como Attribution-NonCommercial-NoDerivs 3.0 Brazil

Página do item completo

Análise do potencial de modelos contrastivos em tarefas de imagem-para-grafo

Arquivos

Data

Autores

Título da Revista

ISSN da Revista

Título de Volume

Editor

Resumo

Descrição

Palavras-chave

Citação

URI

Coleções

item.page.endorsement

item.page.review

item.page.supplemented

item.page.referenced

Licença Creative Commons