Deep learning for high-resolution seismic imaging

Review of seismic imaging

The goal of seismic imaging is to infer subsurface structures from observed seismic data, which can be achieved by solving inverse problems. Reverse Time Migration (RTM) is an imaging technique based on the wave equation25 that cross-correlates the forward and backward subsurface wavefields; it adapts well even to areas with complex structures and strong velocity variations. The cross-correlation imaging condition is expressed as:

$$I(x,z)=\int_{0}^{T} u_{\text{f}}(x,z,t)\, u_{\text{b}}(x,z,t)\, dt$$

(1)

Here, \(I(x,z)\) represents the RTM result, \(u_{\text{f}}(x,z,t)\) denotes the forward wavefield, and \(u_{\text{b}}(x,z,t)\) is the backward wavefield.
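
In discrete form, Eq. (1) amounts to multiplying the two wavefields sample by sample and summing over time. Below is a minimal NumPy sketch of this imaging condition; the array layout (time, depth, distance) and the function name are illustrative choices of ours, not taken from the paper.

import numpy as np

def rtm_image(u_f: np.ndarray, u_b: np.ndarray, dt: float) -> np.ndarray:
    """Zero-lag cross-correlation imaging condition of Eq. (1).

    u_f, u_b : forward and backward wavefields with shape (nt, nz, nx)
    dt       : time-sampling interval used to discretize the integral
    """
    # Multiply the wavefields at every time sample and accumulate over time,
    # which approximates the integral from 0 to T.
    return np.sum(u_f * u_b, axis=0) * dt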

However, RTM suffers from low-frequency noise and inaccurate amplitudes, limiting its application in seismic imaging. To address the shortcomings of RTM, Least Squares Reverse Time Migration (LSRTM) associates the migration imaging result with seismic data26, constructing the least squares objective function:

$$E(\mathbf{m})=\frac{1}{2}\left\| \mathbf{L}\mathbf{m}-\mathbf{d}_{\text{obs}} \right\|^{2}$$

(2)

Here, \(\mathbf{d}_{\text{obs}}\) represents the observed data, \(\mathbf{L}\) is the forward operator, and \(\mathbf{m}\) is the subsurface structural parameter.

LSRTM involves key steps of forward simulation, backpropagation, gradient computation, and optimization. By iteratively minimizing the error between observed and simulated data, LSRTM enhances the quality of seismic imaging.
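
As a schematic illustration of this loop, the sketch below minimizes Eq. (2) by plain steepest descent. The forward and adjoint callables stand in for the modelling operator \(\mathbf{L}\) and its adjoint \(\mathbf{L}^{T}\), which a real implementation would supply via wave-equation solvers; the function name, step size, and iteration count are illustrative assumptions rather than the paper's settings.

import numpy as np

def lsrtm(d_obs, forward, adjoint, m0, n_iter=20, step=1e-3):
    """Steepest-descent sketch of LSRTM for E(m) = 0.5 * ||L m - d_obs||^2.

    forward(m) applies the linear modelling operator L (demigration);
    adjoint(r) applies its adjoint L^T (migration).
    """
    m = np.array(m0, dtype=float)       # working copy of the starting model
    for _ in range(n_iter):
        residual = forward(m) - d_obs   # simulated minus observed data
        gradient = adjoint(residual)    # gradient of E(m): L^T (L m - d_obs)
        m = m - step * gradient         # model update
    return m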

In this study, we introduce a hybrid architecture (Fig. 1) that integrates a Transformer with a CNN to address seismic imaging tasks. Because the Transformer requires a one-dimensional sequence as input, the input image must first be transformed. In the Image Patching phase, the input image is partitioned into a series of equally sized patches of \(P^{2}\) pixels each, turning the original \(H\times W\) image into an \(N\times P\times P\) sequence, where the sequence length \(N=\frac{H\times W}{P^{2}}\) is the number of image patches. The input image is thus reshaped into a one-dimensional sequence in which each image patch corresponds to a vector. A smaller patch size captures finer details of the image and thereby improves the model's accuracy, albeit at the expense of increased computational overhead27. To balance model efficacy against computational efficiency, we set \(P=16\). In the Input Embedding stage, a linear transformation is applied to each image patch, mapping it to a continuous vector representation. Since the Transformer uses neither recurrent nor convolutional layers to process the sequence, positional encoding is added to the input embedding vectors to convey the position of each image patch.

Figure 1. Network architecture diagram.

$$\mathbf{Z}_{0}=\left[\mathbf{X}_{p}^{1}\mathbf{E};\,\mathbf{X}_{p}^{2}\mathbf{E};\,\dots;\,\mathbf{X}_{p}^{N}\mathbf{E}\right]+\mathbf{E}_{\text{pos}}$$

(3)

Here, \(\mathbf{X}_{p}^{i}\) denotes the \(i\)-th flattened image patch, \(\mathbf{E}\) is the learnable linear projection applied to each patch, and \(\mathbf{E}_{\text{pos}}\) is the positional embedding.
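
A compact PyTorch sketch of this embedding step is given below. The strided convolution is a standard, equivalent way of applying the shared projection \(\mathbf{E}\) to every \(P\times P\) patch; the image size, channel count, and embedding dimension are illustrative assumptions rather than values reported here.

import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Sketch of Eq. (3): split the image into P x P patches, project each
    patch with a shared linear map E, and add a learnable positional
    embedding E_pos. All sizes are illustrative."""

    def __init__(self, img_size=256, patch_size=16, in_chans=1, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch and
        # applying the linear projection E.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.num_patches, embed_dim))  # E_pos

    def forward(self, x):                    # x: (B, C, H, W)
        z = self.proj(x)                     # (B, D, H/P, W/P)
        z = z.flatten(2).transpose(1, 2)     # (B, N, D) patch tokens X_p E
        return z + self.pos_embed            # Z_0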

The proposed model employs a Transformer Encoder comprising \(L=12\) layers to process the image sequence, with each encoder layer composed of Multi-Head Self-Attention (MSA) and a Multi-Layer Perceptron (MLP).

$$\mathbf{Z}_{l}^{\prime}=\text{MSA}\left(\text{LN}\left(\mathbf{Z}_{l-1}\right)\right)+\mathbf{Z}_{l-1},\quad l=1\dots L$$

(4)

$$\mathbf{Z}_{l}=\text{MLP}\left(\text{LN}\left(\mathbf{Z}_{l}^{\prime}\right)\right)+\mathbf{Z}_{l}^{\prime},\quad l=1\dots L$$

(5)

Here, \(\text{LN}(\cdot)\) denotes layer normalization, \(l\) indexes the intermediate blocks, and \(L\) is the number of Transformer layers.
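
In PyTorch, one such pre-normalization encoder layer could be sketched as follows; the head count and MLP expansion ratio are assumptions chosen for illustration.

import torch.nn as nn

class EncoderBlock(nn.Module):
    """One pre-LN Transformer layer implementing Eqs. (4)-(5):
    Z' = MSA(LN(Z)) + Z and Z = MLP(LN(Z')) + Z'. Sizes are illustrative."""

    def __init__(self, dim=768, heads=12, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, z):                                    # z: (B, N, D)
        h = self.norm1(z)
        z = self.attn(h, h, h, need_weights=False)[0] + z    # Eq. (4)
        z = self.mlp(self.norm2(z)) + z                      # Eq. (5)
        return z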

These stacked Transformer layers allow the model to capture the complexity of the data from a multiscale perspective. To avoid losing salient features by relying solely on the output of the last layer, we employ a multi-level feature extraction strategy: in addition to the final (12th) layer, features are extracted from the 6th and 9th layers, yielding shallow, intermediate, and deep features that together provide a rich, multiscale feature space. These three sets of features are resized to feature maps of different resolutions and fused through ASFF, which aggregates them adaptively at each scale.
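
Collecting these intermediate outputs amounts to tapping the encoder stack at the chosen depths, as in the sketch below; it reuses the EncoderBlock sketch above, and only the tap indices 6, 9, and 12 follow the text, everything else being illustrative.

import torch.nn as nn

class MultiLevelEncoder(nn.Module):
    """Stack of L = 12 encoder blocks that also returns the hidden states
    after layers 6, 9, and 12 (shallow, intermediate, deep features)."""

    def __init__(self, depth=12, taps=(6, 9, 12), **block_kwargs):
        super().__init__()
        self.taps = set(taps)
        self.blocks = nn.ModuleList(EncoderBlock(**block_kwargs)
                                    for _ in range(depth))

    def forward(self, z):
        features = []
        for i, block in enumerate(self.blocks, start=1):
            z = block(z)
            if i in self.taps:
                features.append(z)   # collect layer-6, -9, -12 outputs
        return features              # [shallow, intermediate, deep]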

ASFF is an attention-based spatial feature fusion strategy designed to integrate feature maps of different spatial resolutions within deep neural networks28. Its principal objective is to sharpen the model's perception of targets at varying scales: ASFF dynamically weights and fuses features from different spatial resolutions by learning task-specific attention weights.

We denote the features at resolution level \(\ell\) (where \(\ell\in\{1,2,3\}\)) by \(x^{\ell}\). For level \(\ell\), the features from every other level \(n\) (\(n\ne \ell\)) are resized to the same shape as \(x^{\ell}\). Let \(x_{ij}^{n\to \ell}\) denote the feature vector at position \((i,j)\) of the feature map resized from level \(n\) to level \(\ell\). The features at level \(\ell\) are then fused as:

$$y_{ij}^{\ell}=\alpha_{ij}^{\ell}\cdot x_{ij}^{1\to \ell}+\beta_{ij}^{\ell}\cdot x_{ij}^{2\to \ell}+\gamma_{ij}^{\ell}\cdot x_{ij}^{3\to \ell}$$

(6)

Here, \(y_{ij}^{\ell}\) denotes the vector at position \((i,j)\) of the output feature map \(y^{\ell}\) across channels. The spatial importance weights \(\alpha_{ij}^{\ell}\), \(\beta_{ij}^{\ell}\), and \(\gamma_{ij}^{\ell}\), which weight the features mapped from the three levels to level \(\ell\), are learned adaptively by the network subject to the constraints \(\alpha_{ij}^{\ell}+\beta_{ij}^{\ell}+\gamma_{ij}^{\ell}=1\) and \(\alpha_{ij}^{\ell},\beta_{ij}^{\ell},\gamma_{ij}^{\ell}\in[0,1]\), which guarantee valid, properly bounded weights. The weights are computed with a softmax over control parameters:

$$\alpha_{ij}^{\ell}=\frac{e^{\lambda_{\alpha_{ij}}^{\ell}}}{e^{\lambda_{\alpha_{ij}}^{\ell}}+e^{\lambda_{\beta_{ij}}^{\ell}}+e^{\lambda_{\gamma_{ij}}^{\ell}}}$$

(7)

$$\beta_{ij}^{\ell}=\frac{e^{\lambda_{\beta_{ij}}^{\ell}}}{e^{\lambda_{\alpha_{ij}}^{\ell}}+e^{\lambda_{\beta_{ij}}^{\ell}}+e^{\lambda_{\gamma_{ij}}^{\ell}}}$$

(8)

$$\gamma_{ij}^{\ell}=\frac{e^{\lambda_{\gamma_{ij}}^{\ell}}}{e^{\lambda_{\alpha_{ij}}^{\ell}}+e^{\lambda_{\beta_{ij}}^{\ell}}+e^{\lambda_{\gamma_{ij}}^{\ell}}}$$

(9)

The control parameters \(\lambda_{\alpha_{ij}}^{\ell}\), \(\lambda_{\beta_{ij}}^{\ell}\), and \(\lambda_{\gamma_{ij}}^{\ell}\) are computed from \(x_{ij}^{1\to \ell}\), \(x_{ij}^{2\to \ell}\), and \(x_{ij}^{3\to \ell}\), respectively, through \(1\times 1\) convolution layers and are learned by standard backpropagation during network training.
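
Putting Eqs. (6)-(9) together, the fusion for one target level can be sketched as follows in PyTorch; the inputs are assumed to have already been resized to the target level's spatial shape, and the channel count is arbitrary.

import torch
import torch.nn as nn

class ASFF(nn.Module):
    """Sketch of Eqs. (6)-(9) for one target level: three 1x1 convolutions
    produce the control maps lambda_alpha, lambda_beta, lambda_gamma, a
    softmax turns them into weights summing to 1 at every (i, j), and the
    resized feature maps are blended with those weights."""

    def __init__(self, channels):
        super().__init__()
        self.lam = nn.ModuleList(nn.Conv2d(channels, 1, kernel_size=1)
                                 for _ in range(3))

    def forward(self, x1, x2, x3):
        # x1, x2, x3: features from levels 1-3, already resized to the
        # spatial shape of the target level, each of shape (B, C, H, W).
        lam = torch.cat([conv(x) for conv, x in zip(self.lam, (x1, x2, x3))],
                        dim=1)                        # (B, 3, H, W)
        w = torch.softmax(lam, dim=1)                 # alpha, beta, gamma
        return w[:, 0:1] * x1 + w[:, 1:2] * x2 + w[:, 2:3] * x3   # Eq. (6)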

Overall, this approach furnishes the model with a rich and multiscale feature space, thereby contributing to its performance in complex seismic imaging tasks.
