Collections including paper arxiv:2401.04088 (Mixtral of Experts)

- VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
  Paper • 2602.10693 • Published • 220
- Reinforced Attention Learning
  Paper • 2602.04884 • Published • 29
- Learning to Reason in 13 Parameters
  Paper • 2602.04118 • Published • 6
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
  Paper • 2405.17604 • Published • 3

- Attention Is All You Need
  Paper • 1706.03762 • Published • 121
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 60
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Paper • 2101.03961 • Published • 13
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 190
- Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
  Paper • 2407.01906 • Published • 46
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 61
- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 7

- Qwen Technical Report
  Paper • 2309.16609 • Published • 38
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 9
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 171
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 64

- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 28
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 43
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 163
- Autoregressive Video Generation without Vector Quantization
  Paper • 2412.14169 • Published • 14

- High-Resolution Image Synthesis with Latent Diffusion Models
  Paper • 2112.10752 • Published • 17
- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 58
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64

- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 43
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 30
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77