2026-03-25
02:32

Google Releases TurboQuant Algorithm: 3-bit KV Cache Quantization With No Accuracy Loss, Inference Speed Boosted Up to 8x

Google Research has released TurboQuant, an algorithm that compresses the KV cache of large language models to 3 bits, cutting memory usage by at least 6x while preserving accuracy and requiring no training. It improves on traditional quantization by combining two sub-algorithms, PolarQuant and QJL. In testing, it performed well across multiple long-context benchmarks.
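For a rough sense of scale, the sketch below estimates KV cache size at two bit widths. It is back-of-the-envelope arithmetic only, not TurboQuant itself: the function name `kv_cache_bytes` and the model shape are hypothetical, and the estimate ignores any per-block metadata a real quantizer stores. The resulting ratio depends on the baseline precision assumed, so it is not a reproduction of the reported figure.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value, batch=1):
    """Approximate KV cache size in bytes, ignoring quantization metadata."""
    # Factor of 2: one key vector and one value vector per token, head, and layer.
    values = 2 * layers * kv_heads * head_dim * seq_len * batch
    return values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=131072)

fp16 = kv_cache_bytes(bits_per_value=16, **shape)  # assumed 16-bit baseline
q3 = kv_cache_bytes(bits_per_value=3, **shape)     # 3 bits per cached value

print(f"16-bit cache: {fp16 / 2**30:.1f} GiB")
print(f"3-bit cache:  {q3 / 2**30:.1f} GiB")
print(f"ratio:        {fp16 / q3:.2f}x")
```

Swapping in a 32-bit baseline or a different context length shows how strongly the savings depend on the starting point.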