MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing Paper • 2011.09899 • Published Nov 19, 2020
A Simple Background Augmentation Method for Object Detection with Diffusion Model Paper • 2408.00350 • Published Aug 1, 2024
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20, 2024 • 46
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 93
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement Paper • 2504.16053 • Published Apr 22, 2025
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30, 2025 • 143
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training Paper • 2507.12507 • Published Jul 16, 2025 • 1
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20, 2025 • 40
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models Paper • 2507.14204 • Published Jul 14, 2025
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published Oct 16, 2025 • 15
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models Paper • 2511.18890 • Published Nov 24, 2025 • 33
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26, 2025 • 114
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed Paper • 2512.14067 • Published 27 days ago • 13
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 4 days ago • 152