Analysis of the potential of contrastive models in image-to-graph tasks

Publisher

Universidade Federal de São Carlos

Abstract

Large Vision-Language Models (VLMs) have demonstrated capabilities in a range of image-related tasks, such as object detection, recognition, and counting. However, they still face significant challenges when dealing with images of graphs and diagrams. It is therefore important to investigate both ways of adapting these models and alternative methods that can make better use of their capabilities. Among the possible approaches, contrastive learning stands out for achieving better results than other models in settings involving multiple textual attributes and positional dependencies. In this context, the capabilities of the CLIP model merit investigation, especially given the progress of other multimodal models across different domains. This work analyzes the potential of CLIP, combined with contrastive learning, for measuring similarity between graph images, recognizing their edge lists, and differentiating graph structures. Results for edge list recognition were positive, while the clustering of different structures and degrees of regularity showed promise but still requires further exploration and fine-tuning. Further analysis of the metrics and broader application of the methodology may contribute to advancing the state of the art in the field.

Citation

TAKEDA, Eduardo Minoru. Análise do potencial de modelos contrastivos em tarefas de imagem-para-grafo. 2025. Undergraduate thesis (Bachelor's degree in Computer Science) – Universidade Federal de São Carlos, São Carlos, 2025. Available at: https://repositorio.ufscar.br/handle/20.500.14289/23443.

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil