CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX
- Title
- CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX
- Issued Date
- 2025-07
- Citation
- IEEE Computer Architecture Letters, v.24, no.2, pp.289 - 292
- Type
- Article
- Author Keywords
- Accelerator ; Approximate Nearest Neighbor Search
- ISSN
- 1556-6056
- Abstract
- Retrieval-augmented generation (RAG) systems increasingly rely on Approximate Nearest Neighbor Search (ANNS) to efficiently retrieve relevant context from billion-scale vector databases. While IVF-based ANNS frameworks scale well overall, the fine search stage remains a bottleneck due to its compute-intensive GEMV operations, particularly under large query volumes. To address this, we propose CABANA, a cluster-aware query batching mechanism that accelerates ANNS with Intel Advanced Matrix Extensions (AMX) by reformulating these GEMV computations into high-throughput GEMM operations. By aggregating queries that target the same clusters, CABANA enables batched computation during fine search, significantly improving compute intensity and memory access regularity. Evaluations on billion-scale datasets show that CABANA outperforms traditional SIMD-based implementations, achieving up to 32.6× higher query throughput with minimal overhead while maintaining high recall.
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
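
The batching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes inner-product similarity, uses plain Python lists instead of AMX tiles, and all function and variable names (`fine_search_per_query`, `fine_search_cluster_batched`, `probes`, and so on) are hypothetical. It only shows how grouping queries by the cluster they probe turns many matrix-vector products into one matrix-matrix product per cluster with identical results.

```python
from collections import defaultdict


def dot(a, b):
    """Inner product of two equal-length vectors (assumed similarity metric)."""
    return sum(x * y for x, y in zip(a, b))


def fine_search_per_query(queries, probes, clusters):
    """Baseline: one matrix-vector product (GEMV) per (query, cluster) pair."""
    scores = {}
    for qid, q in enumerate(queries):
        for cid in probes[qid]:
            scores[(qid, cid)] = [dot(v, q) for v in clusters[cid]]
    return scores


def fine_search_cluster_batched(queries, probes, clusters):
    """Cluster-aware batching: one matrix-matrix product (GEMM) per cluster."""
    # Step 1: group query ids by the cluster they want to scan.
    by_cluster = defaultdict(list)
    for qid, cids in enumerate(probes):
        for cid in cids:
            by_cluster[cid].append(qid)

    scores = {}
    for cid, qids in by_cluster.items():
        # Step 2: stack the grouped queries into one batch.
        batch = [queries[qid] for qid in qids]
        # Step 3: product[i][j] = <cluster vector i, batched query j>.
        product = [[dot(v, q) for q in batch] for v in clusters[cid]]
        # Step 4: scatter each column back to the query that owns it.
        for j, qid in enumerate(qids):
            scores[(qid, cid)] = [row[j] for row in product]
    return scores


# Toy data: two clusters of 2-D vectors and three queries.
clusters = {
    0: [[1.0, 0.0], [0.0, 1.0]],
    1: [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]],
}
queries = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
probes = [[0, 1], [1], [0]]  # clusters selected for each query by coarse search

baseline = fine_search_per_query(queries, probes, clusters)
batched = fine_search_cluster_batched(queries, probes, clusters)
```

In this toy setup both functions return the same scores; the batched variant reads each cluster's vectors once for all queries that probe it, which is the source of the improved compute intensity the abstract refers to. A real system would hand the per-cluster product to a hardware-accelerated GEMM kernel rather than nested Python loops.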
