NCS-v1-3d-base
This is the 3D variant of the NCS-model, a seismic foundation model trained on a large collection of full-stack 3D seismic cubes from the Norwegian Continental Shelf (NCS), available through the public DISKOS database. The model was developed by the Norwegian Computing Center (NR) in collaboration with the industry partners Equinor ASA and AkerBP ASA.
Model Description
NCS-v1-3d-base extends the ViT MAE framework to full 3D: the model ingests 3D seismic sub-volumes, tokenizes them into 3D patches, and applies a standard transformer encoder. Positional information is handled by LieRE (Lie Rotational Positional Encodings) (Ostmeier et al., 2024), a generalization of rotary position embeddings to arbitrary dimensions, enabling resolution-flexible inference on varying volume sizes.
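As a concrete illustration of the tokenization (illustrative arithmetic, not taken from the released code): a 224³ sub-volume split into non-overlapping 16³ patches gives a 14 × 14 × 14 grid of 2,744 tokens, matching the patch-feature shapes shown in the usage example below.

```python
# Patch-count sanity check for a 3D ViT tokenizer (illustrative values).
volume_size = 224  # edge length of the cubic input sub-volume
patch_size = 16    # edge length of each cubic patch

patches_per_axis = volume_size // patch_size  # 14 patches along each axis
num_patches = patches_per_axis ** 3           # 14^3 = 2744 tokens per volume

# Each patch is flattened to patch_size^3 voxels before the linear projection.
voxels_per_patch = patch_size ** 3            # 16^3 = 4096 voxels per patch

print(patches_per_axis, num_patches, voxels_per_patch)
```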
Usage
NCS-v1-3d-base has been designed to produce features that can be used for fine-tuning on downstream tasks such as seismic facies classification, salt body segmentation, geological structure detection (e.g., injectites, faults), content-based seismic image retrieval, and horizon and event tracking.
How to Use
Loading the Model
Install the NCS package from this repository before running the example below.
```python
from NCS.models.vit3d import ViT3DModel

model = ViT3DModel.from_pretrained("NorskRegnesentralSTI/NCS-v1-3d-base")
```
Feature Extraction
```python
import torch

# Input: 3D seismic sub-volume (B, C, D, H, W) — single channel
pixel_values = torch.randn(1, 1, 224, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

# CLS token (volume-level feature)
cls_features = outputs.last_hidden_state[:, 0, :]     # shape: (B, 768)

# Patch-level features
patch_features = outputs.last_hidden_state[:, 1:, :]  # shape: (B, 2744, 768)
```
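For voxel-wise tasks, the flat patch sequence can be mapped back onto the 14 × 14 × 14 spatial grid. A minimal NumPy sketch of the reshape (using random stand-in features; the real features come from the model above):

```python
import numpy as np

# Stand-in for patch-level features from the model: (B, 2744, 768).
B, D = 1, 768
patch_features = np.random.randn(B, 14 * 14 * 14, D)

# Restore the 3D patch grid: (B, 14, 14, 14, 768).
grid_features = patch_features.reshape(B, 14, 14, 14, D)

# Each grid cell summarizes one 16x16x16 voxel patch of the 224^3 input.
assert grid_features.shape == (1, 14, 14, 14, 768)
```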
Inference on Seismic Volumes
For running inference over full seismic volumes (SEG-Y / SGZ), use the NCS inference pipeline:
```bash
uv run scripts/inference.py \
    --model-path NorskRegnesentralSTI/NCS-v1-3d-base \
    --input-path /path/to/volume.segy \
    --output-path ./features_3d.zarr \
    --direction dir0 \
    --densify 1 \
    --num-overlap-patches 7 \
    --overlap-filter ramp \
    --batch-size 4 \
    --device cuda:0 \
    --dtype float16
```
Training Details
Pretraining Data
The model was pretrained on seismic reflection data from the Norwegian Continental Shelf (NCS), sourced from the DISKOS national data repository. The training corpus consists of 829 full-stack time and depth migrated 3D seismic cubes (~27 TB), spanning diverse geological settings, acquisition vintages, and processing generations across the NCS.
Preprocessing
- Seismic amplitudes are standardized per-cube to unit variance.
- Values are clipped at ±3 standard deviations.
- For each training sample, 2D slices are extracted at 4 azimuthal directions (0°, 45°, 90°, 135°) through the same spatial location.
- Diagonal slices (45°, 135°) are center-cropped and resized to correct for the √2 elongation.
- Single-channel slices are passed as separate views to a shared patch projection layer.
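The standardization and clipping steps can be sketched as follows (a simplified version; the actual pipeline may differ in details such as mean handling):

```python
import numpy as np

def standardize_cube(cube: np.ndarray, clip_sigma: float = 3.0) -> np.ndarray:
    """Scale a seismic cube to unit variance, then clip extreme amplitudes."""
    cube = cube / cube.std()                       # per-cube unit variance
    return np.clip(cube, -clip_sigma, clip_sigma)  # clip at +/-3 std devs

cube = np.random.randn(32, 32, 32) * 500.0  # synthetic amplitudes
out = standardize_cube(cube)

assert abs(out.std() - 1.0) < 0.1  # roughly unit variance after clipping
assert out.min() >= -3.0 and out.max() <= 3.0
```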
Training Procedure
To limit memory usage during training, the model uses pillar sampling: for each training sample, 40% of the mini-cube is randomly selected by sampling pillars of size 16 × 16 × 224 from the grid of 14 × 14 possible non-overlapping pillars making up the full 224 × 224 × 224 sub-volume. This increases spatial coverage per sample while keeping training tractable.
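A minimal sketch of the pillar-sampling step (illustrative NumPy code, not the training implementation; 40% of the 14 × 14 = 196 pillars corresponds to ~78 pillars per sample):

```python
import numpy as np

# 224 x 224 x 224 sub-volume viewed as a 14 x 14 grid of 16 x 16 x 224 pillars.
grid = 14
pillar_hw = 16
num_pillars = grid * grid      # 196 non-overlapping pillars
keep = int(0.4 * num_pillars)  # ~78 pillars per training sample

rng = np.random.default_rng(0)
chosen = rng.choice(num_pillars, size=keep, replace=False)

# Gather the selected pillars from a synthetic sub-volume (H, W, D).
volume = rng.standard_normal((224, 224, 224))
rows, cols = np.divmod(chosen, grid)
pillars = np.stack([
    volume[r * pillar_hw:(r + 1) * pillar_hw,
           c * pillar_hw:(c + 1) * pillar_hw, :]
    for r, c in zip(rows, cols)
])

assert pillars.shape == (keep, 16, 16, 224)
```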
- Pretraining method: Masked Autoencoder (MAE) with 85% masking ratio (applied after concatenating patches across views; per-view mask count is not enforced)
- Initialization: ImageNet MAE ViT weights (RGB projection channels averaged to single-channel; 2D convolutional kernels expanded and interpolated to initialize 16 × 16 × 16 volumetric patch embeddings; original positional encodings removed)
- Framework: PyTorch with flash-attention kernels
- Hardware: 16 × NVIDIA GH200 GPUs
- Precision: bfloat16 mixed precision
- Global batch size: 2048
- Learning rate: Cosine schedule, base LR = 1.5 × 10⁻⁴, effective LR = base_lr × batch_size / 256 (i.e., 1.2 × 10⁻³ at batch size 2048), warmup ratio = 0.05
- Epochs: 100 (~1M samples per epoch)
- Sampling: Density-aware sampling from seismic cubes, biased toward regions with sparser spatial coverage
- Decoder: Lightweight 8-layer MAE decoder
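The 2D-to-3D weight inflation mentioned in the initialization step can be sketched as follows (a common inflation recipe, not necessarily the exact one used here): RGB channels of the ImageNet patch-embedding kernel are averaged to a single channel, and the 16 × 16 kernel is replicated along the new depth axis with a 1/16 scale factor so activations keep a similar magnitude:

```python
import numpy as np

# ImageNet MAE ViT patch embedding weights: (embed_dim, 3, 16, 16).
w2d = np.random.randn(768, 3, 16, 16)

# Average RGB channels -> single seismic channel: (768, 1, 16, 16).
w2d_mono = w2d.mean(axis=1, keepdims=True)

# Replicate along the new depth axis and rescale so the summed response
# over the 16 depth positions matches the original 2D kernel's response.
w3d = np.repeat(w2d_mono[:, :, None, :, :], 16, axis=2) / 16.0

assert w3d.shape == (768, 1, 16, 16, 16)
assert np.allclose(w3d.sum(axis=2), w2d_mono)  # depth-summed == 2D kernel
```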
Evaluation Protocol
Representations are evaluated with a frozen backbone using a k-nearest-neighbor (kNN, k=5) classifier on patch-level embeddings. Four interpretation benchmarks were used: salt segmentation, package segmentation, injectite mapping, and flatspot mapping, measured by mean Intersection-over-Union (mIoU). Only 100 labeled points per class are used (or a single labeled line for injectites).
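A minimal version of the frozen-backbone kNN probe (k = 5, majority vote over patch embeddings; an illustrative NumPy implementation, not the exact evaluation code):

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5):
    """Majority-vote kNN over patch embeddings, Euclidean distance."""
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]  # k nearest labeled patches
    votes = train_y[idx]                 # (n_query, k) label votes
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(0)
# Two well-separated synthetic "classes" of patch embeddings.
train_x = np.concatenate([rng.normal(0, 1, (100, 8)),
                          rng.normal(5, 1, (100, 8))])
train_y = np.array([0] * 100 + [1] * 100)
query_x = np.array([[0.0] * 8, [5.0] * 8])

assert knn_predict(train_x, train_y, query_x).tolist() == [0, 1]
```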
Code
The model code and inference pipeline are available at: https://github.com/NorskRegnesentral/NCS_models
Citation
If you use this model, please cite:
@article{ordonez2025ncsmodel,
title={The {NCS}-model: A seismic foundation model trained on the Norwegian repository of public seismic data},
author={Ordo{\~n}ez, Alba and Forgaard, Theodor Johannes Line and Wade, David and Bugge, Aina Juell and Nese, H{\aa}kon and Waldeland, Anders Ueland},
journal={arXiv preprint arXiv:2603.23211},
year={2025}
}
Acknowledgments
This work is funded by The Research Council of Norway through the SFI Visual Intelligence (Centre for Research-based Innovation), grant no. 309439, and the industry partners Equinor ASA and AkerBP ASA. We also thank Equinor and AkerBP for providing access to the seismic data used in the evaluation.