AI Summary • Published on Nov 18, 2025
While post-training quantization (PTQ) and quantization-aware training (QAT) are well studied for fine-tuning, their impact on reinforcement learning (RL) in large reasoning models (LRMs) remains unclear.
The authors conducted systematic experiments, using the GRPO and drGRPO algorithms to fine-tune Qwen3 base models on a range of math benchmarks and evaluating quantization strategies including PTQ, QAT/QAFT (quantization-aware fine-tuning), and QLoRA.
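The paper's training code is not reproduced here, but a minimal sketch of what a QLoRA-style GRPO run could look like is shown below, assuming the Hugging Face transformers/peft/bitsandbytes/trl stack; the model name, LoRA hyperparameters, dataset, and reward function are illustrative placeholders, not the authors' configuration.

```python
# Minimal sketch of a QLoRA-style GRPO run (illustrative, not the paper's config).
# Assumes the Hugging Face stack: transformers, peft, bitsandbytes, trl, datasets.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",  # placeholder; the paper fine-tunes Qwen3 base models
    quantization_config=bnb_config,
    device_map="auto",
)

# Trainable low-rank adapters kept in higher precision on top of the 4-bit base
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Toy math prompts with gold answers; the real runs would use math datasets.
train_dataset = Dataset.from_list([
    {"prompt": "What is 12 * 7?", "answer": "84"},
    {"prompt": "What is 15 + 27?", "answer": "42"},
])

def reward_correct(completions, answer, **kwargs):
    # Placeholder verifiable reward: 1.0 if the gold answer appears, else 0.0.
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_correct,
    args=GRPOConfig(output_dir="qlora-grpo", num_generations=8,
                    max_completion_length=256),
    train_dataset=train_dataset,
    peft_config=lora_config,  # adapters are the only trainable parameters
)
trainer.train()
```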
The study found that quantization-aware RL training degraded learning, whereas PTQ and QLoRA preserved performance better, with QLoRA achieving the lowest quantization error in most cases, even at 4-bit precision.
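One common way to operationalize "quantization error" in this setting is the accuracy gap between the full-precision RL-tuned policy and its quantized counterpart on a held-out benchmark; the sketch below illustrates this for 4-bit PTQ via bitsandbytes, with the checkpoint path and toy benchmark as hypothetical stand-ins for the paper's actual metric and evaluation harness.

```python
# Sketch: quantization error as the accuracy drop from 4-bit PTQ of an
# RL-fine-tuned checkpoint (assumed metric; not the paper's exact harness).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "path/to/rl-finetuned-checkpoint"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Toy benchmark of (prompt, gold answer) pairs standing in for math eval sets.
benchmark = [
    ("What is 12 * 7? Answer with a number.", "84"),
    ("What is 15 + 27? Answer with a number.", "42"),
]

def eval_accuracy(model):
    """Fraction of prompts whose greedy completion contains the gold answer."""
    hits = 0
    for prompt, answer in benchmark:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
        completion = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                                      skip_special_tokens=True)
        hits += answer in completion
    return hits / len(benchmark)

full_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")
acc_full = eval_accuracy(full_model)

# 4-bit PTQ of the same checkpoint: quantize after training, no further updates.
ptq_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_quant_type="nf4"),
    device_map="auto",
)
acc_ptq = eval_accuracy(ptq_model)

print(f"quantization error (accuracy drop): {acc_full - acc_ptq:.3f}")
```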
The findings suggest that techniques like PTQ and QLoRA effectively preserve reasoning ability in LRMs even at low bit precision, whereas abruptly introducing quantization during RL training is detrimental to learning, underscoring the need for careful selection of quantization strategy.