Analysis of the potential of contrastive models in image-to-graph tasks
Publisher
Universidade Federal de São Carlos
Abstract
Large Vision-Language Models (VLMs) have demonstrated capabilities in various image-related tasks, such as object detection, recognition, and counting. However, they still face significant challenges when dealing with images of graphs and diagrams. It is therefore important to investigate both methods for adapting these models and alternative methodologies that can make better use of their capabilities. Among the possible approaches, Contrastive Learning stands out for outperforming other models on tasks involving multiple textual attributes and positional dependencies. In this context, investigating the capabilities of the CLIP model is pertinent, especially given the progress of other multimodal models across different domains. This work analyzes the potential of CLIP, combined with Contrastive Learning, for tasks involving the extraction of similarity between graph images and the recognition of their edge lists, as well as the differentiation of graph structures. Results were positive for edge list recognition, while the clustering of different structures and degrees of regularity showed promise but still requires further exploration and fine-tuning. Analyzing the metrics and extending the methodology to further applications may contribute to advancing the state of the art in the field.
Citation
TAKEDA, Eduardo Minoru. Análise do potencial de modelos contrastivos em tarefas de imagem-para-grafo. 2025. Trabalho de Conclusão de Curso (Graduação em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2025. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/23443.
Creative Commons License
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil
