Ai-general
Guided Self-Evolving LLMs with Minimal Human Supervision
Paper
• 2512.02472
• Published
• 55
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published
• 146
Video Reasoning without Training
Paper
• 2510.17045
• Published
• 8
Agent Learning via Early Experience
Paper
• 2510.08558
• Published
• 273
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published
• 45
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published
• 59
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Paper
• 2511.03628
• Published
• 13
PromptBridge: Cross-Model Prompt Transfer for Large Language Models
Paper
• 2512.01420
• Published
• 11
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published
• 8
Diversity Has Always Been There in Your Visual Autoregressive Models
Paper
• 2511.17074
• Published
• 8
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Paper
• 2511.13254
• Published
• 136
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Paper
• 2510.18821
• Published
• 19
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper
• 2510.03259
• Published
• 57
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Paper
• 2510.19338
• Published
• 115
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper
• 2511.16043
• Published
• 109
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper
• 2510.03561
• Published
• 25
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper
• 2509.08755
• Published
• 57
gpt-oss-120b & gpt-oss-20b Model Card
Paper
• 2508.10925
• Published
• 15
Paper
• 2412.16720
• Published
• 37
Self-Improving VLM Judges Without Human Annotations
Paper
• 2512.05145
• Published
• 20
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
Paper
• 2511.09067
• Published
• 2
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
Paper
• 2510.23038
• Published
• 1
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Paper
• 2511.06805
• Published
• 13
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Paper
• 2511.15958
• Published
• 1
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
Paper
• 2511.19899
• Published
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
• 2512.05150
• Published
• 76
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Paper
• 2512.03000
• Published
• 37
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Paper
• 2512.04926
• Published
• 42
Voxify3D: Pixel Art Meets Volumetric Rendering
Paper
• 2512.07834
• Published
• 45
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Paper
• 2512.07461
• Published
• 78
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Paper
• 2512.13586
• Published
• 93
RePo: Language Models with Context Re-Positioning
Paper
• 2512.14391
• Published
• 12
Universal Reasoning Model
Paper
• 2512.14693
• Published
• 43
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published
• 119
Next-Embedding Prediction Makes Strong Vision Learners
Paper
• 2512.16922
• Published
• 87
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Paper
• 2512.17351
• Published
• 28
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
Paper
• 2512.14052
• Published
• 42
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
Paper
• 2512.19535
• Published
• 12
SemanticGen: Video Generation in Semantic Space
Paper
• 2512.20619
• Published
• 93
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
• 2512.20618
• Published
• 55
Multi-hop Reasoning via Early Knowledge Alignment
Paper
• 2512.20144
• Published
• 7
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Paper
• 2512.21004
• Published
• 13
TimeBill: Time-Budgeted Inference for Large Language Models
Paper
• 2512.21859
• Published
• 25
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
Paper
• 2512.22322
• Published
• 39
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
• 2512.24618
• Published
• 151
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
Paper
• 2601.03193
• Published
• 47
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
Paper
• 2601.01321
• Published
• 19
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
• 2601.16206
• Published
• 85
Learning to Discover at Test Time
Paper
• 2601.16175
• Published
• 42
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
Paper
• 2601.17058
• Published
• 189