In the era of large foundation models, the quality of embeddings is a central determinant of downstream performance, yet widely used dense embeddings incur substantial costs in storage and latency. Recent works have proposed Contrastive Sparse Representation (CSR) to address this; the idea is to map dense embeddings into high-dimensional but sparse vectors to improve efficiency.
However, we find that CSR degrades severely in the ultra-sparse regime (k = 2 or 4). Ultra-sparse embeddings in this regime could in principle deliver over 100× efficiency gains in large-scale retrieval, yet existing methods incur 20–40% accuracy losses there, rendering such embeddings impractical in real-world scenarios. This raises a central question:
Are ultra-sparse embeddings inherently limited, or can proper training close this gap?
To answer this, we introduce CSRv2, a principled training approach that makes ultra-sparse embeddings viable. CSRv2 stabilizes sparsity learning through progressive k-annealing and enhances representational quality via supervised contrastive objectives. Together, these ensure that the few active dimensions capture the most discriminative semantic features, reducing dead neurons from over 80% to roughly 20%.
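The two ingredients above can be sketched compactly. The snippet below is a minimal illustration, not the paper's implementation: it assumes CSR-style top-k activation over the sparse code and a linear annealing schedule for the budget k (the schedule shape, `k_start`, and `k_final` are hypothetical placeholders).

```python
import numpy as np

def topk_activation(z, k):
    """Keep only the k largest activations in each row; zero the rest.

    z: (batch, dim) array of pre-activations from the sparse encoder.
    """
    idx = np.argpartition(z, -k, axis=-1)[:, -k:]          # indices of top-k per row
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=-1), axis=-1)
    return out

def annealed_k(step, total_steps, k_start=64, k_final=2):
    """Linearly anneal the active-dimension budget from k_start down to k_final.

    Starting with a loose budget and tightening it gradually avoids the
    training instability of enforcing k=2 from the first step.
    """
    frac = min(step / total_steps, 1.0)
    return max(round(k_start + frac * (k_final - k_start)), k_final)

# Example: at step 0 the budget is loose; by the end it reaches k=2.
z = np.array([[3.0, 1.0, 2.0, 0.5]])
sparse_code = topk_activation(z, annealed_k(1000, 1000))   # k=2 at the end of training
```

In a full training loop, `sparse_code` would then feed the supervised contrastive loss, which pulls same-class codes together so the surviving dimensions stay discriminative.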
Notably, our method makes extreme sparsity practical without compromising performance, delivering a 14% accuracy gain at k=2 over prior methods. In terms of efficiency, CSRv2 yields up to 300× improvements in compute and memory efficiency relative to dense embeddings and a 7× speedup over Matryoshka Representation Learning (MRL).
We evaluate CSRv2 across comprehensive benchmarks for both text and vision.
Text Embedding: On the MTEB benchmark, CSRv2 achieves a 14% accuracy gain over the original CSR at k=2. Notably, CSRv2 with only 2 active dimensions matches Matryoshka Representation Learning (MRL) at 16 dimensions, offering comparable quality at significantly higher compression.
Visual Embedding: On ImageNet-1k, CSRv2 demonstrates superior robustness in the ultra-sparse regime (k=2), achieving a 20% improvement in 1-NN accuracy over MRL and a 6% improvement over CSR.
Performance on six text embedding tasks in MTEB with e5-Mistral-7b-instruct as backbone.
Performance on six text embedding tasks in MTEB with Qwen3-Embedding-4B as backbone.
Performance on ImageNet with FF2048 as backbone.
CSRv2 translates extreme sparsity into tangible computational efficiency. By exploiting sparse matrix operations native to modern hardware (e.g., Sparse Tensor Cores), CSRv2 delivers up to a 300× speedup in retrieval latency over full dense embeddings on a 1M-scale database. Even against MRL's truncated dense embeddings of comparable accuracy, CSRv2 maintains a 7× speed advantage, making it a strong choice for large-scale, real-time, and edge-deployable search systems.
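To see where the speedup comes from, consider that with k=2 active dimensions a similarity scan reduces to a sparse matrix-vector product that touches only the database rows sharing the query's active dimensions. The sketch below illustrates this with SciPy's CSR format on a toy database (the sizes and random data are placeholders, not the paper's setup).

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
N, D, k = 1000, 4096, 2          # toy database: N docs, D dims, k active dims each

# Build a sparse database matrix: each row has (at most) k nonzero entries.
rows = np.repeat(np.arange(N), k)
cols = rng.integers(0, D, size=N * k)
vals = rng.random(N * k).astype(np.float32)
db = csr_matrix((vals, (rows, cols)), shape=(N, D))

# An ultra-sparse query with the same budget k.
q = np.zeros(D, dtype=np.float32)
q[rng.choice(D, size=k, replace=False)] = rng.random(k)

# Scoring is a sparse mat-vec: only entries on the query's k active
# dimensions contribute, instead of N * D dense multiply-adds.
scores = db @ q
top5 = np.argsort(-scores)[:5]
```

Dense scoring would cost N·D multiply-adds per query; here the work scales with the number of stored values on the query's k active columns, which is the source of the orders-of-magnitude latency gap reported above.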
Efficiency analysis on 1M database size
Beyond standard retrieval, we evaluate CSRv2 on complex, knowledge-intensive tasks using GraphRAG-Bench. In a zero-shot setting (no training on the target data), CSRv2 generalizes better than MRL truncation: in the medical domain, it achieves over 15% improvement in retrieval relevance and 10% improvement in generation accuracy compared to MRL, showing that ultra-sparse embeddings can retain the rich semantic nuances required for advanced RAG applications.
Retrieval evaluation on GraphRAG-Bench with Qwen3-Embedding-4B as backbone.
Generation evaluation on GraphRAG-Bench with Qwen3-Embedding-4B as backbone.
@inproceedings{guo26csrv2,
title={{CSR}v2: Unlocking Ultra-sparse Embeddings},
author={Lixuan Guo and Yifei Wang and Tiansheng Wen and Yifan Wang and Aosong Feng and Bo Chen and Stefanie Jegelka and Chenyu You},
year={2026},
booktitle={International Conference on Learning Representations (ICLR)},
}