Critical Failures: AutoTrain Space stuck on "Building" or crashing

Hi AutoTrain Team,

I have been trying to SFT fine-tune large models (Llama 3.1 70B, Llama 3.3 70B, and Qwen 2.5 32B) using a private AutoTrain Space, but I am facing persistent critical failures regardless of the configuration.

The Issues:

  1. Stuck on “Building”: The Space often hangs on the “Building” status for over an hour without starting, likely timing out during the model download/docker build phase.

  2. Immediate Crash (Paused): When the Space does go to “Running”, clicking “Start Training” causes the Space to immediately crash and enter a “Paused” state within seconds. No logs are generated in the UI logs tab.

Configurations Tried:

  • Hardware: Tried both Nvidia 4xL40S and 1xL40S (and 4xL4).

  • Models: meta-llama/Meta-Llama-3.1-70B-Instruct, Qwen/Qwen2.5-32B-Instruct (and others).

  • Dataset: Validated JSONL (Conversation format), mapped correctly to messages.

  • Parameters:

    • Quantization: int4 (Always enabled)

    • Backend: ddp (for multi-gpu) and standard (for single).

    • Unsloth: Tried both True (single GPU) and False (multi-GPU).

    • Mixed Precision: bf16.

    • Block Size: Tried 2048 and 4096.

I have wasted significant compute credits on “Building” times and failed starts. Is there a known issue with loading 30B+ models in Spaces currently?

I would appreciate any assistance or guidance on how to successfully run these models, or a refund for the compute time lost due to these system errors.

Thanks.


If the issue involves money, you should first contact Hugging Face Support directly: [email protected]

Separately, it might simply be that there isn’t enough VRAM.


You are running into two separate but related problems:

  1. The Space often never finishes building.
  2. When it does run, starting training kills the Space almost immediately (auto-paused, no logs in the UI tab).

For 30B–70B models, both the build layer and the training layer are right on the edge of what AutoTrain + Spaces + 1×/4×L40S/L4 can realistically handle. Small misconfigurations easily turn into “critical failures”.

Below is a detailed but structured explanation of:

  • Why each symptom happens (with background).
  • How this interacts with 70B-class models and AutoTrain.
  • Concrete steps to fix or work around it, including realistic configurations.

1. Background: what your AutoTrain Space is actually doing

A private AutoTrain Space is still just a normal Hugging Face Space:

  1. Build phase (Building)

    • Clone the repo.
    • Install dependencies / Docker image.
    • Optionally pre-download models/datasets if preload_from_hub is used in the README metadata. (Hugging Face)
    • This phase is subject to startup_duration_timeout – by default 30 minutes. After that, the Space is flagged as unhealthy if it hasn’t started. (Hugging Face)
  2. Run phase (Running)

    • Starts the AutoTrain UI app.
    • When you click Start Training, AutoTrain launches a training process (roughly equivalent to autotrain llm ... / accelerate launch ...) inside the container.
  3. Pause/stop

    • AutoTrain Spaces have a setting PAUSE_ON_FAILURE. If it is 1, any error in the training process causes the entire Space to auto-pause with the message “This Space has been paused by its owner”. (Hugging Face Forums)

So your two symptoms map to two layers:

  • “Stuck on Building” → build phase fails or times out.
  • “Start Training → immediately Paused” → training process dies almost instantly; auto-pause hides the underlying error.

The bad news: 30B–70B models make both of these layers fragile.
The good news: the failure patterns are well-understood once you unpack them.


2. Symptom 1 – Space stuck on “Building” for a long time

2.1. Why this happens

Key facts about Spaces builds:

  • The startup_duration_timeout config controls how long a Space is allowed to start before being flagged unhealthy.
  • Default is 30 minutes, but you can set it higher in the README front-matter (e.g. 1h, 2h). (Hugging Face)

A typical build for a heavy Space may include:

  • Installing large Python dependencies (AutoTrain, transformers, trl, bitsandbytes, unsloth, flash-attn, etc.).
  • Downloading models via preload_from_hub in the README, if configured. Many Spaces use this to cache models like Whisper, TTS, etc. (Hugging Face)

For 70B models, preload_from_hub is dangerous:

  • Meta-Llama-3.1-70B-Instruct or Llama-3.3-70B is >100 GB of weights.
  • Pulling that during the build can easily exceed 30 minutes, especially if you also install heavy wheels.

There are already forum threads where previously working Docker Spaces suddenly start hitting build timeouts, with logs showing “job timeout” after ~30 minutes, tied directly to startup_duration_timeout. (Hugging Face Forums)

In practice, for your use case, “Building for over an hour” typically means:

  • Either the build actually timed out around the default 30 minutes and the metadata is stuck, or
  • You are repeatedly hitting the limit while the builder is downloading large models or resolving dependencies.

2.2. How to fix / mitigate “Building” hangs

  1. Inspect README metadata

    In the Space’s README.md, check the YAML front-matter:

    ---
    title: ...
    sdk: gradio  # or docker
    app_file: app.py
    startup_duration_timeout: 1h   # or 2h, etc.
    # preload_from_hub:
    #   - meta-llama/Meta-Llama-3.1-70B-Instruct
    ---
    
    • If there is no startup_duration_timeout, the default is 30 minutes → set it to 1h or 2h. (Hugging Face)
    • If preload_from_hub includes your 70B models, comment those lines out or at least remove them while debugging. Commit messages in other Spaces show people explicitly removing preload_from_hub because it made builds unstable. (Hugging Face)

    General rule: do not preload huge 30B+ models in the build phase. Let AutoTrain download them at runtime.

  2. Minimize and pin dependencies

    • Use a small, pinned requirements.txt (e.g. specific versions of autotrain-advanced, transformers, trl, unsloth).
    • Avoid unbounded transformers>=... etc., which forces pip to resolve a large dependency graph each build.
  3. Force a clean rebuild when the Space is “poisoned”

    In several build timeout threads the recommendation is:

    • Do a “factory reboot” / fresh build from the HF side if necessary; a sketch of triggering this from a script follows at the end of this section. (Hugging Face Forums)
    • From your side: push a trivial change to app.py or README to trigger a new build.
    • If still stuck, temporarily flip sdk: gradio → static → gradio (or docker) and push again to clear metadata.
  4. Keep the Space repo small

    Do not commit model weights or huge datasets into the Space repository. Large repos or Git-LFS blobs increase build times and can hit various limits. HF docs explicitly remind that files >10 MB use LFS, and Spaces are not meant to host entire model checkpoints in the app repo. (Hugging Face)

If you do these, “Building for an hour” should either disappear, or at least fail quickly with clear logs instead of silently eating credits.
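
If pushing dummy commits gets tedious, the “factory reboot” step above can also be scripted with huggingface_hub. This is a minimal sketch, assuming a recent huggingface_hub version (it relies on the restart_space / get_space_runtime helpers and the factory_reboot flag); the Space ID and token are placeholders:

    # Sketch: force a factory rebuild and watch the build stage from a script.
    from huggingface_hub import HfApi

    space_id = "YOUR_USERNAME/YOUR_AUTOTRAIN_SPACE"   # placeholder
    api = HfApi(token="hf_...")                        # token with write access to the Space

    # factory_reboot=True discards the cached image and rebuilds from scratch;
    # a plain restart_space() only restarts the existing container.
    api.restart_space(space_id, factory_reboot=True)

    # Poll the stage so a BUILD_ERROR is noticed quickly instead of watching
    # the "Building" badge in the browser.
    print(api.get_space_runtime(space_id).stage)       # e.g. BUILDING, RUNNING, BUILD_ERROR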


3. Symptom 2 – “Start Training” → Space instantly Paused

3.1. What “paused by its owner” really means

Multiple recent threads show exactly your behavior:

  • User clicks Start Training in AutoTrain UI.
  • Training appears to start, then after a few seconds the page shows “This Space has been paused by its owner.” (Hugging Face Forums)

HF staff and other users explain:

  • AutoTrain Spaces have a PAUSE_ON_FAILURE Space setting.
  • If PAUSE_ON_FAILURE = 1, any error in the training process (CUDA OOM, Python exception, dataset problem) causes the Space to auto-pause instead of running indefinitely. (Hugging Face Forums)

So “paused” does not mean you clicked pause; it means:

The training subprocess crashed almost immediately, and AutoTrain shut the whole Space down for you.

Often, no logs appear in the AutoTrain UI tab because:

  • The process dies before it can stream logs back into the UI.
  • But the error message is usually present in the Space runtime logs (the logs button on the Space page).

3.2. Likely failure modes with your exact configuration

You have:

  • Models: Llama 3.1 70B / 3.3 70B, Qwen 2.5 32B.
  • Hardware: 1× L40S, 4× L40S, 4× L4.
  • Quantization: int4.
  • Backend: ddp for multi-GPU; standard for single.
  • Unsloth: toggled True (single) / False (multi).
  • Mixed precision: bf16.
  • Block size: 2048 and 4096.

From public VRAM data:

  • For Llama 3.1 70B, a Japanese QLoRA tutorial reports peak VRAM usage that already sits close to the full 48 GB of a single large GPU.

  • Unsloth’s docs and blogs claim:

    • “70B LLaMA fits in <48 GB VRAM with QLoRA in Unsloth.” (docs.unsloth.ai)

That tells you two important things:

  1. 70B QLoRA is borderline even on a 48 GB GPU.

  2. Unsloth can make it work on 1×48 GB, but only in its optimized pipeline. If AutoTrain isn’t perfectly using that path, your margin vanishes. (docs.unsloth.ai)

On top of that:

  • DDP replicates the whole model on each GPU. 4×L40S with DDP still means each GPU must hold the full 70B model; you do not get 4×48 GB as one 192 GB pool. For memory scaling you’d need FSDP or ZeRO-style sharding, which AutoTrain does not expose in fine detail via the UI.
  • Block sizes 2048–4096 and any non-trivial batch size significantly increase activation memory.

Putting this together, the most probable failure modes are:

3.2.1. Immediate CUDA OOM at model load or first forward

  • When AutoTrain tries to load Llama-3.1/3.3-70B in 4-bit and build the training graph, VRAM usage spikes.
  • On 48 GB, any overhead (DDP, larger block size, non-Unsloth path, extra optimizer states) can push over the limit.
  • This causes a CUDA out-of-memory error inside bitsandbytes or torch and the process exits immediately.

There is a very similar AutoTrain thread: attempting Mixtral-8x7B with AutoTrain Advanced UI on 4×A10G (96 GB VRAM total) hits CUDA OOM during 4-bit loading, even before training, and the maintainer suggests 8×A100 for safety. (Hugging Face Forums)

70B Llama is in the same or heavier class, so your 1×/4× L40S + AutoTrain stack is easily in OOM territory.
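
To see why the margin is so thin, here is a rough back-of-the-envelope VRAM estimate for 70B QLoRA on one 48 GB card (all numbers are approximations for illustration, not measurements):

    # Rough VRAM budget for 70B QLoRA on a single 48 GB GPU (approximation only).
    params = 70e9

    weights_4bit_gb  = params * 0.5 / 1e9      # ~35 GB of 4-bit base weights
    quant_scales_gb  = weights_4bit_gb * 0.1   # quantization constants/scales, rough guess
    lora_optim_gb    = 2.0                     # LoRA params + optimizer states, order of magnitude
    cuda_overhead_gb = 1.5                     # CUDA context, workspaces, fragmentation

    fixed_gb = weights_4bit_gb + quant_scales_gb + lora_optim_gb + cuda_overhead_gb
    print(f"fixed cost ~{fixed_gb:.0f} GB")    # roughly 42 GB before any activations
    print(f"headroom   ~{48 - fixed_gb:.0f} GB for activations, KV cache, DDP buffers")
    # Larger block sizes, batch sizes, or DDP buffers eat that headroom immediately,
    # which is exactly the "instant OOM at load or first forward" pattern.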

3.2.2. DDP instead of sharding for large models

  • DDP is great for speed when the model fits on a single GPU, but it does not solve memory. Every GPU holds a full copy of the model.
  • For true memory scaling on 70B you need FSDP / ZeRO (parameter sharding across devices) or specialized solutions like the Answer.AI FSDP+QLoRA setup. (Answer.AI)

Given that AutoTrain’s UI only exposes a simple “backend: ddp / standard”, chances are:

  • With 4×L40S and ddp, you simply get four copies of a 70B model that already barely fits (or doesn’t), plus extra DDP buffers.

That makes OOM on 4×L40S even more likely than on 1×L40S if config is not extremely conservative.

3.2.3. Missing or misconfigured PEFT / LoRA

You did not mention explicitly enabling PEFT/LoRA in the UI.

  • AutoTrain’s LLM docs emphasise PEFT + quantization as the intended path for large models; full fine-tuning is not realistic for 70B on modest hardware. (Answer.AI)
  • If for any reason the config ends up with peft: false (full-model SFT), GPU requirements explode (hundreds of GB VRAM). That would cause an instant crash even before training begins.

So one of the first things to verify in the AutoTrain config is:

  • That PEFT/LoRA is enabled for these runs, not full-model fine-tuning.
  • That Unsloth is actually being used (not silently ignored due to version mismatch or unsupported combo).
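
For orientation, “PEFT/LoRA enabled + int4” corresponds roughly to the following transformers/peft setup. This is an illustrative sketch of the technique, not AutoTrain’s internal code; the model name and LoRA hyperparameters are placeholders:

    # Sketch of a QLoRA setup (4-bit base model + LoRA adapters), illustration only.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2.5-32B-Instruct",        # prove the path on 32B before attempting 70B
        quantization_config=bnb_config,
        device_map="auto",
    )

    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()       # trainable params should be a tiny fraction of 32B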

3.2.4. Trainer / library incompatibility

Because you are on very new models:

  • Llama 3.1 / 3.3, Qwen 2.5, new chat templates and tokenization.

These may require:

  • Recent transformers (e.g., Llama-3.x uses updated RoPE scaling and config fields; official examples rely on 4.40+ / 4.43+). (Hugging Face)

If AutoTrain’s Docker image pins an older stack, you can get:

  • “Unexpected argument” / missing attribute errors when constructing the model or trainer.
  • TRL / PEFT / transformers API mismatch.

Those throw Python exceptions immediately, which again lead to PAUSE_ON_FAILURE auto-pausing with no UI logs, but with a stack trace in the underlying Space logs. (Hugging Face Forums)
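
One quick way to confirm or rule out this class of failure is to print the versions pinned inside the AutoTrain image (for example from a short debug script run in the Space, or locally against the same requirements). The package list below is just the usual suspects:

    # Print the library versions in the environment to compare against what
    # Llama 3.x / Qwen 2.5 checkpoints require.
    from importlib.metadata import version, PackageNotFoundError

    for pkg in ["transformers", "trl", "peft", "accelerate",
                "bitsandbytes", "autotrain-advanced", "torch"]:
        try:
            print(f"{pkg}: {version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")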


4. Is there a “known issue” with 30B+ in Spaces?

There is no official blanket statement like “Spaces cannot load >30B models”, but there are multiple relevant patterns:

  1. Build timeout issues on Spaces

    • Users report Docker / GPU Spaces that used to build in ~45 minutes now hitting a “build error – job timeout” at 30 minutes, apparently tied to startup_duration_timeout. (Hugging Face Forums)
  2. AutoTrain Spaces auto-pausing on start of training

    • Several users see exactly what you see: training starts, then within seconds the Space pauses with “This Space has been paused by its owner.”
    • The cause is consistently the PAUSE_ON_FAILURE setting reacting to early errors (OOM, config, dataset, etc.). (Hugging Face Forums)
  3. Large model OOMs with AutoTrain

    • Mixtral-8x7B + AutoTrain on 4×A10G runs out of memory during bitsandbytes 4-bit load, even though the hardware sounds strong, demonstrating that large mixture-of-experts / 70B-class models can OOM in AutoTrain even at int4 on 96 GB total VRAM. (Hugging Face Forums)

So the picture is:

  • There is no single global bug like “30B+ disabled”.
  • But there is a cluster of recurring issues: build timeouts, AutoTrain auto-pause, and OOM / version problems when people try to use 30B–70B models in Spaces.

Your symptoms line up almost exactly with those patterns.


5. Concrete path to make this work (or fail fast, cheaply)

5.1. Stop bleeding credits during build

  1. In README:

    • Set:

      startup_duration_timeout: 1h  # or 2h if your build is truly heavy
      

      (Hugging Face)

    • Remove all 70B models from preload_from_hub. Only preload smaller models if absolutely necessary. (Hugging Face)

  2. Make sure:

    • No model checkpoints are in the Space repo itself.
    • requirements.txt is lean and pinned.

If you still see infinite “Building”, trigger a fresh build (small commit) and, if needed, contact HF support referencing the build timeout threads.

5.2. Make the errors visible instead of auto-pausing

While debugging:

  1. In the Space settings, set PAUSE_ON_FAILURE = 0 (where available) as suggested in the AutoTrain pause thread. (Hugging Face Forums)

  2. Start training once, let it fail, and immediately open the Space runtime logs (top-right logs button on the Space).

  3. Look for the last 50–100 lines: you are likely to see:

    • CUDA out of memory
    • A transformers / TRL / PEFT exception
    • Dataset column errors

Doing this once is critical: it will tell you whether you have a pure OOM, a library mismatch, or something else.
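
If you prefer to do this from a script instead of the settings UI, something like the sketch below should work with a recent huggingface_hub. The variable name PAUSE_ON_FAILURE is taken from the forum thread above, and the Space ID/token are placeholders:

    # Sketch: disable auto-pause while debugging and watch the Space state.
    from huggingface_hub import HfApi

    space_id = "YOUR_USERNAME/YOUR_AUTOTRAIN_SPACE"   # placeholder
    api = HfApi(token="hf_...")                        # token with write access

    # Keep the Space alive after a training failure so the error stays visible.
    api.add_space_variable(space_id, key="PAUSE_ON_FAILURE", value="0")

    # After clicking "Start Training", poll the runtime stage; flipping to PAUSED
    # within seconds means the training subprocess crashed immediately.
    print(api.get_space_runtime(space_id).stage)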

5.3. Prove the pipeline works on a smaller model

Use the same AutoTrain Space and dataset, but:

  • Model: e.g. meta-llama/Meta-Llama-3.1-8B-Instruct or Qwen/Qwen2.5-7B-Instruct.

  • Hardware: 1×L4 or 1×L40S.

  • Config (safe baseline):

    • PEFT / LoRA: enabled.
    • Quantization: int4.
    • Unsloth: True (if available in UI).
    • Block size: 1024 (not 2048/4096).
    • Max length: 2048.
    • Batch size: 2, gradient accumulation: 4.

If that fails:

  • The problem is not 70B; it is either environment, dataset mapping, or AutoTrain stack. Fixing that is mandatory.

If it succeeds:

  • You now know: dataset + AutoTrain config + Space image are fundamentally sound.
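
Before spending credits on any of these runs, it is also cheap to re-check the JSONL conversation data locally, since dataset/column problems are one of the silent-crash causes listed in 5.2. A minimal sketch, assuming the usual messages format (one JSON object per line containing a list of {role, content} dicts); the file name is a placeholder:

    # Quick structural check of a conversation-format JSONL file before uploading.
    import json

    path = "train.jsonl"  # placeholder for your dataset file
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            row = json.loads(line)                   # fails loudly on broken JSON
            msgs = row.get("messages")
            assert isinstance(msgs, list) and msgs, f"line {i}: missing/empty 'messages'"
            for m in msgs:
                assert m.get("role") in {"system", "user", "assistant"}, f"line {i}: bad role"
                assert isinstance(m.get("content"), str), f"line {i}: 'content' must be a string"
    print("dataset looks structurally valid")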

5.4. Scale up gradually: try 14B and 32B before 70B

Next:

  1. Try Qwen/Qwen2.5-14B-Instruct with the same configuration, possibly lowering:

    • Block size: keep 1024.
    • Batch size: 1, grad_accum: 8.
  2. Then try Qwen2.5-32B with the same pattern, still on 1×L40S, PEFT+int4+Unsloth, no DDP.

32B SFT with PEFT is a realistic target for L40S; if that is unstable, 70B will be much worse.

5.5. Only then attempt 70B, with extremely conservative settings

If 32B is okay and you still want 70B, use:

  • Hardware: 1×L40S first, not 4× with DDP.

  • Model: Meta-Llama-3.3-70B-Instruct or Meta-Llama-3.1-70B-Instruct.

  • Config:

    • PEFT / LoRA: enabled.
    • Quantization: int4.
    • Unsloth: True (critical).
    • Block size: 1024 (do not start with 2048/4096).
    • Max length: 2048.
    • Batch size: 1.
    • Grad accumulation: 8–16.
    • Mixed precision: bf16 or fp16.
    • Backend: no DDP, run on a single GPU.

These settings intentionally under-utilize the GPU but maximize the chance of fitting, aligning with Unsloth’s claim that 70B LLaMA QLoRA fits in <48 GB VRAM on 48 GB cards. (docs.unsloth.ai)
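
For orientation, the Unsloth path those numbers refer to looks roughly like this when used directly, outside AutoTrain (an illustrative sketch; API details may differ between Unsloth versions, and the LoRA settings are placeholders). If AutoTrain’s Unsloth toggle is not effectively producing something equivalent, the <48 GB figure no longer applies:

    # Sketch of the Unsloth 4-bit QLoRA loading path (illustration, not AutoTrain's code).
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="meta-llama/Meta-Llama-3.1-70B-Instruct",
        max_seq_length=2048,      # matches the conservative "max length: 2048" above
        load_in_4bit=True,        # 4-bit base weights are what makes <48 GB plausible
    )

    model = FastLanguageModel.get_peft_model(
        model,
        r=16, lora_alpha=32, lora_dropout=0.0,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        use_gradient_checkpointing="unsloth",   # recompute/offload activations to save VRAM
    )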

If even this fails with OOM:

  • Then, in this environment (AutoTrain + Spaces + given image), 70B SFT is beyond the practical VRAM envelope, despite theoretical Unsloth results.

At that point, the rational options are:

  • Move 70B training off Spaces to a dedicated machine where you control deepspeed/FSDP sharding, or
  • Use 70B only as a teacher for distillation / synthetic data, and fine-tune 8–14–32B models instead.

6. Short summary

  • Your “Building forever” issue is best explained by Spaces build limits:

    • Default startup_duration_timeout is 30 minutes. (Hugging Face)
    • Pre-downloading huge 70B models via preload_from_hub or heavy installs can easily hit that and leave the Space apparently “stuck”. (Hugging Face)
  • Your “Start Training → instantly Paused” issue is not random:

    • AutoTrain Spaces have PAUSE_ON_FAILURE, which auto-pauses the Space as soon as the training process errors. (Hugging Face Forums)
    • The lack of UI logs just means the process died before streaming logs; the actual error is in the Space runtime logs.
  • Given your models and hardware, the most likely root causes are:

    • CUDA OOM at model load or first forward for 70B QLoRA on 48 GB L40S, especially with DDP and large block sizes.
    • Possibly missing or misconfigured PEFT/LoRA, effectively attempting full-model SFT.
    • Potential library incompatibilities between AutoTrain’s pinned stack and the very latest Llama 3.x / Qwen 2.5 models.
  • Suggested solutions and workflow:

    • Fix build: set startup_duration_timeout to ≥1h and remove large models from preload_from_hub. (Hugging Face)
    • Disable PAUSE_ON_FAILURE while debugging and read the Space runtime logs. (Hugging Face Forums)
    • Prove the pipeline on 8B → 14B → 32B with PEFT+int4+Unsloth on 1×L40S, conservative block size.
    • Only then attempt 70B, again on 1×L40S, with extremely conservative settings; if that still fails, treat 70B SFT in AutoTrain Spaces as out-of-scope for your current hardware/stack and move to a more controlled environment or smaller models.

This way you either get a stable configuration for 32B–70B, or you get a clear, fast failure signal and can stop burning credits on builds and instant crashes.