CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX

Title
CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX
Issued Date
2025-07
Citation
IEEE Computer Architecture Letters, v.24, no.2, pp.289 - 292
Type
Article
Author Keywords
Accelerator; Approximate Nearest Neighbor Search
ISSN
1556-6056
Abstract

Retrieval-augmented generation (RAG) systems increasingly rely on Approximate Nearest Neighbor Search (ANNS) to efficiently retrieve relevant context from billion-scale vector databases. While IVF-based ANNS frameworks scale well overall, the fine search stage remains a bottleneck due to its compute-intensive GEMV operations, particularly under large query volumes. To address this, we propose CABANA, a cluster-aware query batching mechanism for ANNS acceleration using Intel Advanced Matrix Extensions (AMX) that reformulates these GEMV computations into high-throughput GEMM operations. By aggregating queries targeting the same clusters, CABANA enables batched computation during fine search, significantly improving compute intensity and memory access regularity. Evaluations on billion-scale datasets show that CABANA outperforms traditional SIMD-based implementations, achieving up to 32.6× higher query throughput with minimal overhead, while maintaining high recall rates.
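The batching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use AMX; it only shows how grouping queries by the cluster they probe turns one matrix-vector product per (query, cluster) pair into one matrix-matrix product per cluster. The inner-product metric, the dictionary-based data layout, and all function names (`batch_by_cluster`, `gemm`, `fine_search`) are assumptions made for illustration.

```python
from collections import defaultdict


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gemm(query_rows, db_rows):
    """Score every query row against every database row (a plain GEMM)."""
    return [[dot(q, v) for v in db_rows] for q in query_rows]


def batch_by_cluster(probes):
    """Invert {query_id: [cluster_id, ...]} into {cluster_id: [query_id, ...]}."""
    groups = defaultdict(list)
    for qid, cluster_ids in probes.items():
        for cid in cluster_ids:
            groups[cid].append(qid)
    return dict(groups)


def fine_search(queries, clusters, probes, k):
    """Return the top-k (score, vector_id) pairs per query by inner product."""
    candidates = defaultdict(list)
    for cid, qids in batch_by_cluster(probes).items():
        ids, vecs = clusters[cid]
        # One batched product per cluster instead of one GEMV per query.
        scores = gemm([queries[q] for q in qids], vecs)
        for qid, row in zip(qids, scores):
            candidates[qid].extend(zip(row, ids))
    return {qid: sorted(c, reverse=True)[:k] for qid, c in candidates.items()}


# Hypothetical toy data: three 2-D queries, two clusters of two vectors each.
queries = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
clusters = {
    "a": ([10, 11], [[1.0, 0.0], [0.5, 0.5]]),
    "b": ([20, 21], [[0.0, 1.0], [2.0, 2.0]]),
}
# Cluster lists that a coarse search stage might have selected for each query.
probes = {0: ["a"], 1: ["a", "b"], 2: ["b"]}

result = fine_search(queries, clusters, probes, k=1)
```

In this toy setup each cluster's vectors are read once and scored against every query that probes it, which is the source of the improved compute intensity and memory access regularity the abstract describes. On real hardware the per-cluster product would be handed to an optimized GEMM kernel rather than Python loops.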

URI
https://scholar.dgist.ac.kr/handle/20.500.11750/60390
DOI
10.1109/LCA.2025.3596970
Publisher
Institute of Electrical and Electronics Engineers Inc.