Synthetic data generation and augmentation for data-centric deep learning seismic inversion

Publisher

Universidade Federal de São Carlos

Abstract

Deep Learning Inversion (DLI) has emerged as a promising and computationally efficient alternative to traditional seismic inversion. However, its practical adoption is limited by the scarcity of large and diverse training datasets. Existing research in DLI has primarily emphasized the design of novel network architectures, while largely overlooking the potential of data-centric training strategies. This dissertation addresses this gap by systematically evaluating the effectiveness of synthetic data strategies for enhancing DLI workflows. We first establish the fundamental capabilities of data synthesis, demonstrating the generation of high-fidelity velocity models using Denoising Diffusion Probabilistic Models and exploring semantically controllable geological feature engineering through Variational Autoencoders. Building on these foundations, we conduct a comprehensive comparative study that empirically assesses the impact of a spectrum of data augmentation techniques (ranging from simple interpolation to advanced generative diffusion) across multiple DLI architectures. The results show that data augmentation consistently improves performance, while also revealing that more complex strategies do not always outperform simpler ones. Importantly, the choice of augmentation method is closely tied to the underlying model architecture, highlighting a critical cost-benefit trade-off in data strategy design. The practical relevance of this research is further demonstrated through the development of a state-of-the-art DLI solution that leveraged a massive-scale synthetic dataset to secure a top 1% ranking in an international seismic inversion competition. This dissertation contributes to advancing data-centric approaches for seismic inversion, underscoring that the strategic use of synthetic data is essential to fully unlock the potential of DLI.
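
As a rough illustration of the simple end of the augmentation spectrum mentioned above, the sketch below blends pairs of existing velocity models by convex interpolation. The abstract does not specify how the dissertation implements interpolation, so this is only one plausible reading: the function names `interpolate_models` and `augment`, the 2D-list representation, and the uniform sampling of the mixing weight are all assumptions made for illustration.

```python
import random


def interpolate_models(model_a, model_b, alpha):
    """Blend two equal-shaped 2D velocity models (nested lists, m/s).

    Hypothetical helper: returns alpha * model_a + (1 - alpha) * model_b
    cell by cell. Not taken from the dissertation.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    same_rows = len(model_a) == len(model_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(model_a, model_b)):
        raise ValueError("models must have the same shape")
    return [
        [alpha * va + (1.0 - alpha) * vb for va, vb in zip(row_a, row_b)]
        for row_a, row_b in zip(model_a, model_b)
    ]


def augment(models, n_new, seed=0):
    """Create n_new synthetic models from random pairs of existing ones.

    Assumption: the mixing weight is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        model_a, model_b = rng.sample(models, 2)  # two distinct source models
        synthetic.append(interpolate_models(model_a, model_b, rng.random()))
    return synthetic


if __name__ == "__main__":
    slow = [[1500.0, 1500.0], [2000.0, 2000.0]]
    fast = [[2500.0, 2500.0], [4000.0, 4000.0]]
    print(interpolate_models(slow, fast, 0.5))
    print(len(augment([slow, fast], n_new=3)))
```

Because seismic data depend nonlinearly on velocity, a blended model would normally need its own forward-modelled seismic data rather than a blend of the parents' data; that step is outside the scope of this sketch.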

Citation

CARVALHO JUNIOR, Carlos Gomes de. Synthetic data generation and augmentation for data-centric deep learning seismic inversion. 2025. Dissertation (Master's in Computer Science) – Universidade Federal de São Carlos, São Carlos, 2025. Available at: https://repositorio.ufscar.br/handle/20.500.14289/23419.

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution 3.0 Brazil