AWS Trainium & Inferentia documentation
🚀 Tutorials: How To Fine-tune & Run LLMs
Learn how to run and fine-tune models for optimal performance with AWS Trainium.
- Llama 3.1: Instruction Fine-tuning of Llama 3.1 8B with LoRA on the Dolly dataset
- Qwen3: Fine-tune Qwen3 8B with LoRA on the Simple Recipes dataset
- Llama 3.2 on SageMaker: Continuous Pretraining of Llama 3.2 1B on SageMaker Hyperpod
What you’ll learn
These tutorials will guide you through the complete process of fine-tuning large language models on AWS Trainium:
- 📊 Data Preparation: Load and preprocess datasets for supervised fine-tuning
- 🔧 Model Configuration: Set up LoRA adapters and distributed training parameters
- ⚡ Training Optimization: Leverage tensor parallelism, gradient checkpointing, and mixed precision
- 💾 Checkpoint Management: Consolidate and merge model checkpoints for deployment
- 🚀 Model Deployment: Export and test your fine-tuned models for inference
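The data-preparation step above usually amounts to rendering each instruction record into a single prompt string before tokenization. A minimal sketch, assuming a Dolly-style record with `instruction`/`context`/`response` fields and an illustrative section template (the exact field names and template in the tutorials may differ):

```python
# Sketch of the "Data Preparation" step: turn a Dolly-style instruction
# record into one prompt string for supervised fine-tuning.
# Field names and the "### ..." section headers are assumptions for
# illustration, not the exact format the tutorials use.

def format_dolly_record(record: dict) -> str:
    """Render an instruction/context/response record as one training prompt."""
    instruction = f"### Instruction\n{record['instruction']}"
    # Some records have no context; drop that section when it is empty.
    context = f"### Context\n{record['context']}" if record.get("context") else None
    response = f"### Answer\n{record['response']}"
    parts = [p for p in (instruction, context, response) if p]
    return "\n\n".join(parts)

sample = {
    "instruction": "Summarize the text.",
    "context": "AWS Trainium is a machine learning accelerator.",
    "response": "Trainium is an ML chip from AWS.",
}
prompt = format_dolly_record(sample)
```

In practice you would map a function like this over the dataset (e.g. with `datasets.Dataset.map`) and feed the resulting strings to the tokenizer.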
Choose the tutorial that best fits your use case and start fine-tuning your LLMs on AWS Trainium today!