Depth map estimation using an encoder-decoder with attention modules (original title: Estimativa de mapas de profundidade usando encoder-decoder com módulos de atenção)

Casado, Ricardo Salvino

Files

Tese_Casado.pdf (62.8 MB)

Date

2024-10-17

Authors

Casado, Ricardo Salvino

Publisher

Universidade Federal de São Carlos

Abstract

Camera calibration plays a fundamental role in image acquisition and is an important step in stereo vision systems. It allows precise correspondence between points of interest in images captured by different cameras. With calibration, real-world coordinates can be related to image coordinates, which is essential for determining disparity (the positional difference of a point between the two images) and, consequently, for estimating depth in a scene. However, configuring these systems remains a complex process. For this reason, estimating depth maps from single images has gained traction, since it requires only one camera. Instead of relying on disparity, monocular techniques use computer vision algorithms, such as structure from motion, feature matching, and deep learning, to infer depth in a scene. They are also more flexible in terms of hardware, which makes them better suited to mobile devices and embedded systems. With this in mind, the present work proposes a new methodology that employs genetic programming and symbolic regression as an alternative to conventional camera calibration methods. The results obtained with the proposed approach show superior accuracy compared to some of the most relevant methods in the literature. All experiments with this approach were evaluated using Wilcoxon statistical tests at a 5% significance level. For estimating depth maps from monocular images, the work investigates attention modules, proposing new configurations and architectural modifications with the aim of achieving results competitive with the state of the art.
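As a brief illustration of the disparity-to-depth relationship mentioned above, the sketch below uses the standard pinhole model for a rectified stereo pair, in which depth equals focal length times baseline divided by disparity. The function name, parameter names, and numeric values are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (in meters) of a point seen by a rectified stereo pair.

    Assumes the standard pinhole relation Z = f * B / d, where f is the
    focal length in pixels, B the distance between the two cameras in
    meters, and d the disparity in pixels. All names are illustrative.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical calibration values: f = 700 px, B = 0.5 m, d = 35 px.
depth = depth_from_disparity(disparity_px=35.0, focal_length_px=700.0, baseline_m=0.5)
print(depth)
```

The focal length and baseline come from calibration, which is why a calibration error propagates directly into the estimated depth.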
After ablation studies, the configuration with CBAM (Convolutional Block Attention Module) in the encoder and a Modified GCNet (Global Context Networks) in the decoder yielded the best results for depth map estimation using a simple Convolutional Neural Network model. In some scenarios, the proposed model outperformed the works used for comparison, with an improvement of 25.22% in Absolute Relative Error and 6.28% in Mean Squared Error. In summary, this work contributes to the advancement of research in camera calibration and depth estimation, opening new perspectives for applying convolutional neural networks in resource-limited contexts. The results suggest that the ongoing development of deep learning architectures can not only improve accuracy in depth estimation but also make these technologies more accessible and applicable in mobile devices and embedded systems. Future studies could explore the integration of these methodologies in real-world scenarios, enhancing their applicability in areas such as robotics, augmented reality, and autonomous navigation.
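For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch using the definitions commonly found in the depth estimation literature. The abstract does not state the exact formulas used in the thesis, so both the definitions and the reading of "improvement" as a relative reduction in error are assumptions; all function names and sample values are hypothetical.

```python
def _valid_pairs(pred, gt):
    # Keep only pixels with a positive ground-truth depth (assumed convention).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground-truth depth")
    return pairs


def abs_rel_error(pred, gt):
    """Absolute Relative Error: mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def mean_squared_error(pred, gt):
    """Mean Squared Error: mean of (pred - gt)^2 over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum((p - g) ** 2 for p, g in pairs) / len(pairs)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy depth values in meters, flattened to a list (illustrative only).
gt = [2.0, 4.0, 8.0]
pred = [2.5, 3.0, 8.0]
abs_rel = abs_rel_error(pred, gt)
mse = mean_squared_error(pred, gt)
print(abs_rel, mse)
```

Both metrics are lower-is-better, so under the assumption above a positive `relative_improvement` means the new model has a smaller error than the baseline.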

Keywords

Camera calibration, Genetic programming and symbolic regression, Depth map, Generative adversarial networks, Encoder-decoder, Deep learning, Convolutional Block Attention Module (CBAM), Global Context Networks (GCNet)

Citation

CASADO, Ricardo Salvino. Estimativa de mapas de profundidade usando encoder-decoder com módulos de atenção. 2024. Tese (Doutorado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2024. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/21001.

URI

https://repositorio.ufscar.br/handle/20.500.14289/21001

Collections

Theses and Dissertations

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Brazil
