CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX

DC Field Value Language
dc.contributor.author Kim, Minho -
dc.contributor.author Ji, Houxiang -
dc.contributor.author Kang, Jaeyoung -
dc.contributor.author Lee, Hwanjun -
dc.contributor.author Kim, Daehoon -
dc.contributor.author Kim, Nam Sung -
dc.date.accessioned 2026-06-01T16:40:12Z -
dc.date.available 2026-06-01T16:40:12Z -
dc.date.created 2025-08-22 -
dc.date.issued 2025-07 -
dc.identifier.issn 1556-6056 -
dc.identifier.uri https://scholar.dgist.ac.kr/handle/20.500.11750/60390 -
dc.description.abstract Retrieval-augmented generation (RAG) systems increasingly rely on Approximate Nearest Neighbor Search (ANNS) to efficiently retrieve relevant context from billion-scale vector databases. While IVF-based ANNS frameworks scale well overall, the fine search stage remains a bottleneck due to its compute-intensive GEMV operations, particularly under large query volumes. To address this, we propose CABANA, a cluster-aware query batching mechanism that accelerates ANNS with Intel Advanced Matrix Extensions (AMX) by reformulating these GEMV computations into high-throughput GEMM operations. By aggregating queries targeting the same clusters, CABANA enables batched computation during fine search, significantly improving compute intensity and memory access regularity. Evaluations on billion-scale datasets show that CABANA outperforms traditional SIMD-based implementations, achieving up to 32.6× higher query throughput with minimal overhead, while maintaining high recall rates. -
dc.language English -
dc.publisher Institute of Electrical and Electronics Engineers Inc. -
dc.title CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX -
dc.type Article -
dc.identifier.doi 10.1109/LCA.2025.3596970 -
dc.identifier.wosid 001569707200001 -
dc.identifier.scopusid 2-s2.0-105012958803 -
dc.identifier.bibliographicCitation IEEE Computer Architecture Letters, v.24, no.2, pp.289 - 292 -
dc.description.isOpenAccess TRUE -
dc.subject.keywordAuthor Accelerator -
dc.subject.keywordAuthor Approximate Nearest Neighbor Search -
dc.citation.endPage 292 -
dc.citation.number 2 -
dc.citation.startPage 289 -
dc.citation.title IEEE Computer Architecture Letters -
dc.citation.volume 24 -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.relation.journalResearchArea Computer Science -
dc.relation.journalWebOfScienceCategory Computer Science, Hardware & Architecture -
dc.type.docType Article -
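
The abstract's central idea, grouping queries that probe the same IVF cluster so that many matrix-vector products (GEMV) become one matrix-matrix product (GEMM), can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes inner-product similarity, uses plain Python lists instead of AMX tiles, and every name in it (`search_per_query`, `search_cluster_batched`, `probes`, `clusters`) is hypothetical. It only shows that regrouping the work by cluster yields the same scores while each cluster's vectors are visited once per batch rather than once per query.

```python
from collections import defaultdict


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_per_query(queries, probes, clusters):
    """Baseline: each query scans its probed clusters on its own.

    Every (query, cluster) pair is one GEMV: the cluster's
    (n_vectors x dim) matrix times a single query vector.
    """
    scores = {}
    for qid, q in enumerate(queries):
        for cid in probes[qid]:
            scores[(qid, cid)] = [dot(v, q) for v in clusters[cid]]
    return scores


def search_cluster_batched(queries, probes, clusters):
    """Cluster-aware batching: one GEMM per cluster instead of many GEMVs."""
    # Step 1: aggregate the queries that target the same cluster.
    by_cluster = defaultdict(list)
    for qid, cids in enumerate(probes):
        for cid in cids:
            by_cluster[cid].append(qid)

    scores = {}
    for cid, qids in by_cluster.items():
        # Step 2: (n_vectors x dim) times (dim x n_queries).
        # Each cluster vector is visited once for the whole batch.
        block = [[dot(v, queries[qid]) for qid in qids] for v in clusters[cid]]
        # Step 3: scatter each column of the result back to its query.
        for col, qid in enumerate(qids):
            scores[(qid, cid)] = [row[col] for row in block]
    return scores


# Toy data (hypothetical): two clusters of 2-D vectors, three queries.
clusters = {0: [[1, 0], [0, 1]], 1: [[1, 1], [2, -1], [0, 3]]}
queries = [[1, 2], [3, 4], [0, 1]]
probes = [[0, 1], [1], [0]]  # cluster ids chosen by coarse search, per query

# Both strategies produce identical scores; only the grouping differs.
assert search_cluster_batched(queries, probes, clusters) == search_per_query(
    queries, probes, clusters
)
```

In a real system the inner loops in step 2 would be a single GEMM call on a matrix-multiply unit; the sketch keeps them explicit so the regrouping is visible.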