CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Minho | - |
| dc.contributor.author | Ji, Houxiang | - |
| dc.contributor.author | Kang, Jaeyoung | - |
| dc.contributor.author | Lee, Hwanjun | - |
| dc.contributor.author | Kim, Daehoon | - |
| dc.contributor.author | Kim, Nam Sung | - |
| dc.date.accessioned | 2026-06-01T16:40:12Z | - |
| dc.date.available | 2026-06-01T16:40:12Z | - |
| dc.date.created | 2025-08-22 | - |
| dc.date.issued | 2025-07 | - |
| dc.identifier.issn | 1556-6056 | - |
| dc.identifier.uri | https://scholar.dgist.ac.kr/handle/20.500.11750/60390 | - |
| dc.description.abstract | Retrieval-augmented generation (RAG) systems increasingly rely on Approximate Nearest Neighbor Search (ANNS) to efficiently retrieve relevant context from billion-scale vector databases. While IVF-based ANNS frameworks scale well overall, the fine search stage remains a bottleneck due to its compute-intensive GEMV operations, particularly under large query volumes. To address this, we propose CABANA, a cluster-aware query batching mechanism for ANNS acceleration that uses Intel Advanced Matrix Extensions (AMX) to reformulate these GEMV computations into high-throughput GEMM operations. By aggregating queries targeting the same clusters, CABANA enables batched computation during fine search, significantly improving compute intensity and memory access regularity. Evaluations on billion-scale datasets show that CABANA outperforms traditional SIMD-based implementations, achieving up to 32.6× higher query throughput with minimal overhead, while maintaining high recall rates. | - |
| dc.language | English | - |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
| dc.title | CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS with Intel® AMX | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1109/LCA.2025.3596970 | - |
| dc.identifier.wosid | 001569707200001 | - |
| dc.identifier.scopusid | 2-s2.0-105012958803 | - |
| dc.identifier.bibliographicCitation | IEEE Computer Architecture Letters, v.24, no.2, pp.289 - 292 | - |
| dc.description.isOpenAccess | TRUE | - |
| dc.subject.keywordAuthor | Accelerator | - |
| dc.subject.keywordAuthor | Approximate Nearest Neighbor Search | - |
| dc.citation.endPage | 292 | - |
| dc.citation.number | 2 | - |
| dc.citation.startPage | 289 | - |
| dc.citation.title | IEEE Computer Architecture Letters | - |
| dc.citation.volume | 24 | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Hardware & Architecture | - |
| dc.type.docType | Article | - |
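
The GEMV-to-GEMM reformulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not use AMX; it only shows how grouping queries by the cluster they probe turns many matrix-vector products into one matrix-matrix product per cluster. The function names, the `clusters` and `probes` layouts, and the use of inner-product similarity are all assumptions made for illustration.

```python
import numpy as np
from collections import defaultdict


def _top_k(ids, scores, k):
    # Highest score first; ties broken by vector id so results are deterministic.
    order = np.lexsort((ids, -scores))[:k]
    return ids[order]


def fine_search_gemv(queries, clusters, probes, k):
    """Baseline (hypothetical): one GEMV per (query, probed cluster) pair."""
    results = []
    for q, probed in zip(queries, probes):
        ids, scores = [], []
        for c in probed:
            vec_ids, vecs = clusters[c]       # vecs has shape (n_c, d)
            ids.append(vec_ids)
            scores.append(vecs @ q)           # GEMV: (n_c, d) x (d,)
        results.append(_top_k(np.concatenate(ids), np.concatenate(scores), k))
    return results


def fine_search_batched(queries, clusters, probes, k):
    """Cluster-aware batching (hypothetical): one GEMM per probed cluster."""
    # Group query indices by the cluster they target.
    by_cluster = defaultdict(list)
    for qi, probed in enumerate(probes):
        for c in probed:
            by_cluster[c].append(qi)

    cand_ids = [[] for _ in range(len(queries))]
    cand_scores = [[] for _ in range(len(queries))]
    for c, qis in by_cluster.items():
        vec_ids, vecs = clusters[c]
        # GEMM: (n_c, d) x (d, b), where b queries share this cluster.
        scores = vecs @ queries[qis].T
        for col, qi in enumerate(qis):
            cand_ids[qi].append(vec_ids)
            cand_scores[qi].append(scores[:, col])

    return [
        _top_k(np.concatenate(ids), np.concatenate(scores), k)
        for ids, scores in zip(cand_ids, cand_scores)
    ]
```

Here `clusters` is assumed to map a cluster id to a pair `(vector_ids, vectors)`, and `probes[i]` lists the distinct clusters selected for query `i` by the coarse search. Both functions return the same neighbors; the batched version simply reads each cluster's vectors once for all queries that target it, which is the property the abstract credits for higher compute intensity and more regular memory access.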
