NCS-v1-3d-base
This is the 3D variant of the NCS-model, a seismic foundation model trained on a large collection of full-stack 3D seismic cubes from the Norwegian Continental Shelf (NCS), available through the public DISKOS database. The model was developed by the Norwegian Computing Center (NR) in collaboration with the industry partners Equinor ASA and AkerBP ASA.
Model Description
NCS-v1-3d-base extends the ViT MAE framework to full 3D: the model ingests 3D seismic sub-volumes, tokenizes them into 3D patches, and applies a standard transformer encoder. Positional information is handled by LieRE (Lie Rotational Positional Encodings) (Ostmeier et al., 2024), a generalization of rotary position embeddings to arbitrary dimensions, enabling resolution-flexible inference on varying volume sizes.
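As a concrete illustration of the tokenization (illustrative arithmetic, not taken from the released code): a 224³ sub-volume split into non-overlapping 16³ patches gives a 14 × 14 × 14 grid of 2,744 tokens, matching the patch-feature shapes shown in the usage example below.

```python
# Patch-count sanity check for a 3D ViT tokenizer (illustrative values).
volume_size = 224  # edge length of the cubic input sub-volume
patch_size = 16    # edge length of each cubic patch

patches_per_axis = volume_size // patch_size  # 14 patches along each axis
num_patches = patches_per_axis ** 3           # 14^3 = 2744 tokens per volume

# Each patch is flattened to patch_size^3 voxels before the linear projection.
voxels_per_patch = patch_size ** 3            # 16^3 = 4096 voxels per patch

print(patches_per_axis, num_patches, voxels_per_patch)
```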
Usage
NCS-v1-3d-base has been designed to produce features that can be used for fine-tuning on downstream tasks such as seismic facies classification, salt body segmentation, geological structure detection (e.g., injectites, faults), content-based seismic image retrieval, and horizon and event tracking.
How to Use
Loading the Model
Install the NCS package from this repository before running the example below.
```python
from NCS.models.vit3d import ViT3DModel

model = ViT3DModel.from_pretrained("NorskRegnesentralSTI/NCS-v1-3d-base")
```
Feature Extraction
```python
import torch

# Input: 3D seismic sub-volume (B, C, D, H, W) — single channel
pixel_values = torch.randn(1, 1, 224, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

# CLS token (volume-level feature)
cls_features = outputs.last_hidden_state[:, 0, :]     # shape: (B, 768)

# Patch-level features
patch_features = outputs.last_hidden_state[:, 1:, :]  # shape: (B, 2744, 768)
```
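For voxel-wise tasks, the flat patch sequence can be mapped back onto the 14 × 14 × 14 spatial grid. A minimal NumPy sketch of the reshape (using random stand-in features; the real features come from the model above):

```python
import numpy as np

# Stand-in for patch-level features from the model: (B, 2744, 768).
B, D = 1, 768
patch_features = np.random.randn(B, 14 * 14 * 14, D)

# Restore the 3D patch grid: (B, 14, 14, 14, 768).
grid_features = patch_features.reshape(B, 14, 14, 14, D)

# Each grid cell summarizes one 16x16x16 voxel patch of the 224^3 input.
assert grid_features.shape == (1, 14, 14, 14, 768)
```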
Inference on Seismic Volumes
For running inference over full seismic volumes (SEG-Y / SGZ), use the NCS inference pipeline:
```bash
uv run scripts/inference.py \
    --model-path NorskRegnesentralSTI/NCS-v1-3d-base \
    --input-path /path/to/volume.segy \
    --output-path ./features_3d.zarr \
    --direction dir0 \
    --densify 1 \
    --num-overlap-patches 7 \
    --overlap-filter ramp \
    --batch-size 4 \
    --device cuda:0 \
    --dtype float16
```
Training Details
Pretraining Data
The model was pretrained on seismic reflection data from the Norwegian Continental Shelf (NCS), sourced from the DISKOS national data repository. The training corpus consists of 829 full-stack time and depth migrated 3D seismic cubes (~27 TB), spanning diverse geological settings, acquisition vintages, and processing generations across the NCS.
Preprocessing
- Seismic amplitudes are standardized per-cube to unit variance.
- Values are clipped at ±3 standard deviations.
- For each training sample, 2D slices are extracted at 4 azimuthal directions (0°, 45°, 90°, 135°) through the same spatial location.
- Diagonal slices (45°, 135°) are center-cropped and resized to correct for the √2 elongation.
- Single-channel slices are passed as separate views to a shared patch projection layer.
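The standardization and clipping steps can be sketched as follows (a simplified version; the actual pipeline may differ in details such as mean handling):

```python
import numpy as np

def standardize_cube(cube: np.ndarray, clip_sigma: float = 3.0) -> np.ndarray:
    """Scale a seismic cube to unit variance, then clip extreme amplitudes."""
    cube = cube / cube.std()                       # per-cube unit variance
    return np.clip(cube, -clip_sigma, clip_sigma)  # clip at +/-3 std devs

cube = np.random.randn(32, 32, 32) * 500.0  # synthetic amplitudes
out = standardize_cube(cube)

assert abs(out.std() - 1.0) < 0.1  # roughly unit variance after clipping
assert out.min() >= -3.0 and out.max() <= 3.0
```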
Training Procedure
To limit memory usage during training, the model uses pillar sampling: for each training sample, 40% of the mini-cube is randomly selected by sampling pillars of size 16 × 16 × 224 from the grid of 14 × 14 possible non-overlapping pillars making up the full 224 × 224 × 224 sub-volume. This increases spatial coverage per sample while keeping training tractable.
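A minimal sketch of the pillar-sampling step (illustrative NumPy code, not the training implementation; 40% of the 14 × 14 = 196 pillars corresponds to ~78 pillars per sample):

```python
import numpy as np

# 224 x 224 x 224 sub-volume viewed as a 14 x 14 grid of 16 x 16 x 224 pillars.
grid = 14
pillar_hw = 16
num_pillars = grid * grid      # 196 non-overlapping pillars
keep = int(0.4 * num_pillars)  # ~78 pillars per training sample

rng = np.random.default_rng(0)
chosen = rng.choice(num_pillars, size=keep, replace=False)

# Gather the selected pillars from a synthetic sub-volume (H, W, D).
volume = rng.standard_normal((224, 224, 224))
rows, cols = np.divmod(chosen, grid)
pillars = np.stack([
    volume[r * pillar_hw:(r + 1) * pillar_hw,
           c * pillar_hw:(c + 1) * pillar_hw, :]
    for r, c in zip(rows, cols)
])

assert pillars.shape == (keep, 16, 16, 224)
```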
- Pretraining method: Masked Autoencoder (MAE) with 85% masking ratio (applied after concatenating patches across views; per-view mask count is not enforced)
- Initialization: ImageNet MAE ViT weights (RGB projection channels averaged to single-channel; 2D convolutional kernels expanded and interpolated to initialize 16 × 16 × 16 volumetric patch embeddings; original positional encodings removed)
- Framework: PyTorch with flash-attention kernels
- Hardware: 16 × NVIDIA GH200 GPUs
- Precision: bfloat16 mixed precision
- Global batch size: 2048
- Learning rate: Cosine schedule, base LR = 1.5 × 10⁻⁴, effective LR = base_lr × batch_size / 256 (i.e., 1.2 × 10⁻³ at batch size 2048), warmup ratio = 0.05
- Epochs: 100 (~1M samples per epoch)
- Sampling: Density-aware sampling from seismic cubes, biased toward regions with sparser spatial coverage
- Decoder: Lightweight 8-layer MAE decoder
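The 2D-to-3D weight inflation mentioned in the initialization step can be sketched as follows (a common inflation recipe, not necessarily the exact one used here): RGB channels of the ImageNet patch-embedding kernel are averaged to a single channel, and the 16 × 16 kernel is replicated along the new depth axis with a 1/16 scale factor so activations keep a similar magnitude:

```python
import numpy as np

# ImageNet MAE ViT patch embedding weights: (embed_dim, 3, 16, 16).
w2d = np.random.randn(768, 3, 16, 16)

# Average RGB channels -> single seismic channel: (768, 1, 16, 16).
w2d_mono = w2d.mean(axis=1, keepdims=True)

# Replicate along the new depth axis and rescale so the summed response
# over the 16 depth positions matches the original 2D kernel's response.
w3d = np.repeat(w2d_mono[:, :, None, :, :], 16, axis=2) / 16.0

assert w3d.shape == (768, 1, 16, 16, 16)
assert np.allclose(w3d.sum(axis=2), w2d_mono)  # depth-summed == 2D kernel
```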
Evaluation Protocol
Representations are evaluated with a frozen backbone using a k-nearest-neighbor (kNN, k=5) classifier on patch-level embeddings. Four interpretation benchmarks were used: salt segmentation, package segmentation, injectite mapping, and flatspot mapping, measured by mean Intersection-over-Union (mIoU). Only 100 labeled points per class are used (or a single labeled line for injectites).
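A minimal version of the frozen-backbone kNN probe (k = 5, majority vote over patch embeddings; an illustrative NumPy implementation, not the exact evaluation code):

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5):
    """Majority-vote kNN over patch embeddings, Euclidean distance."""
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]  # k nearest labeled patches
    votes = train_y[idx]                 # (n_query, k) label votes
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(0)
# Two well-separated synthetic "classes" of patch embeddings.
train_x = np.concatenate([rng.normal(0, 1, (100, 8)),
                          rng.normal(5, 1, (100, 8))])
train_y = np.array([0] * 100 + [1] * 100)
query_x = np.array([[0.0] * 8, [5.0] * 8])

assert knn_predict(train_x, train_y, query_x).tolist() == [0, 1]
```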
Code
The model code and inference pipeline are available at: https://github.com/NorskRegnesentral/NCS_models
Citation
If you use this model, please cite:
@article{ordonez2025ncsmodel,
title={The {NCS}-model: A seismic foundation model trained on the Norwegian repository of public seismic data},
author={Ordo{\~n}ez, Alba and Forgaard, Theodor Johannes Line and Wade, David and Bugge, Aina Juell and Nese, H{\aa}kon and Waldeland, Anders Ueland},
journal={arXiv preprint arXiv:2603.23211},
year={2025}
}
Acknowledgments
This work is funded by The Research Council of Norway through the SFI Visual Intelligence (Centre for Research-based Innovation), grant no. 309439, and the industry partners Equinor ASA and AkerBP ASA. We also thank Equinor and AkerBP for providing access to the seismic data used in the evaluation.