News | Gate.com

02:36

Google TurboQuant: 3-bit KV Cache Quantization with No Accuracy Loss, Inference Up to 8x Faster

Google Research has released TurboQuant, a quantization algorithm that compresses the KV cache of large language models to 3 bits, reducing memory usage by 6x and speeding up computation by up to 8x. The algorithm performs well across multiple benchmarks and targets the KV cache bottleneck in model inference. It will be published at ICLR 2026.

02:32

Google Releases TurboQuant Algorithm: 3-bit KV Cache Quantization With No Accuracy Loss, Inference Speed Boosted Up to 8x

Industry Reports

Google Research has released the TurboQuant algorithm, which compresses the KV cache of large language models to 3 bits, cutting memory usage by at least 6x while preserving accuracy, with no training required. The algorithm improves on traditional quantization by combining two sub-algorithms, PolarQuant and QJL. In testing, it performed well across multiple long-context benchmarks.
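
To give a rough sense of scale for the memory figures, the sketch below estimates KV cache size at 16 bits and at 3 bits per stored value. It is simple arithmetic, not TurboQuant itself. The function name `kv_cache_bytes` and the model shape (32 layers, 8 KV heads, head dimension 128, 131,072 tokens) are hypothetical, chosen only for illustration. The report does not state the baseline precision or any per-block overhead, so the ratio printed here is only the raw bit-width ratio and need not match the reported figure.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value, batch_size=1):
    """Approximate KV cache size in bytes.

    Counts one key and one value vector per layer, per KV head, per token.
    Ignores any metadata (scales, codebooks) a real quantizer might store.
    """
    # The factor of 2 accounts for keys plus values.
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value / 8


# Hypothetical model shape, used only for illustration.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=131072)

fp16 = kv_cache_bytes(**shape, bits_per_value=16)  # assumed 16-bit baseline
q3 = kv_cache_bytes(**shape, bits_per_value=3)     # 3 bits per value

print(f"16-bit cache: {fp16 / 2**30:.1f} GiB")  # 16.0 GiB
print(f"3-bit cache:  {q3 / 2**30:.1f} GiB")    # 3.0 GiB
print(f"ratio: {fp16 / q3:.2f}x")               # 5.33x
```

Under these assumptions, a single 131,072-token sequence drops from about 16 GiB of cache to about 3 GiB. Against a 32-bit baseline the same arithmetic would give roughly 10.7x, which is why the baseline matters when reading compression ratios.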