Para-ksm: Parallelized Memory Deduplication with Data Streaming Accelerator

DC Field Value Language
dc.contributor.author Ji, Houxiang -
dc.contributor.author Kim, Minho -
dc.contributor.author Oh, Seonmu -
dc.contributor.author Kim, Daehoon -
dc.contributor.author Kim, Nam Sung -
dc.date.accessioned 2026-02-04T21:40:12Z -
dc.date.available 2026-02-04T21:40:12Z -
dc.date.created 2025-12-04 -
dc.date.issued 2025-07-09 -
dc.identifier.isbn 9781939133489 -
dc.identifier.uri https://scholar.dgist.ac.kr/handle/20.500.11750/59907 -
dc.description.abstract To tame the rapidly rising cost of memory in servers, hyperscalers have begun deploying memory deduplication features, such as Kernel Same-page Merging (ksm), for some of their services. Nonetheless, ksm incurs a datacenter tax significant enough to notably degrade the performance of co-running applications, which hinders its wider and more aggressive deployment. Meanwhile, server-class CPUs have started to integrate various on-chip accelerators to reduce datacenter taxes. One such accelerator is the Data Streaming Accelerator (DSA), which can offload the two most taxing functions of ksm, page comparison and checksum computation, from the CPU. In this work, we demonstrate that offloading these two functions to DSA (DSA-ksm) can reduce the performance degradation that ksm causes in co-running applications from 1.6-5.8x to 1.0-1.6x. However, we find that DSA-ksm, which naively replaces CPU-based functions with their DSA-based counterparts, yields significantly lower rates of memory deduplication than ksm because of the long latency of offloading these functions through on-chip PCIe. To address this shortcoming, we redesign ksm to exploit DSA's batching capability (Para-ksm). Para-ksm lets a given function operate on multiple pages per offload, rather than on a single page as ksm does, thereby amortizing the long offloading latency. Compared to ksm, Para-ksm increases the amount of memory deduplication per CPU cycle used for ksm by 31-50% while decreasing the performance degradation to 1.3-2.7x. -
dc.language English -
dc.publisher USENIX Association -
dc.relation.ispartof Proceedings of the 2025 USENIX Annual Technical Conference -
dc.title Para-ksm: Parallelized Memory Deduplication with Data Streaming Accelerator -
dc.type Conference Paper -
dc.identifier.wosid 001575494400070 -
dc.identifier.scopusid 2-s2.0-105011629176 -
dc.identifier.bibliographicCitation USENIX Annual Technical Conference, pp.1197 - 1212 -
dc.identifier.url https://www.usenix.org/conference/atc25/presentation/ji -
dc.citation.conferenceDate 2025-07-07 -
dc.citation.conferencePlace US -
dc.citation.conferencePlace Boston -
dc.citation.endPage 1212 -
dc.citation.startPage 1197 -
dc.citation.title USENIX Annual Technical Conference -
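
The batching idea summarized in the abstract can be illustrated with a simple cost model. The sketch below is not taken from the paper: the function name `per_page_cost` and all latency figures are hypothetical, chosen only to show how a fixed per-offload latency shrinks per page when a single offload covers several pages.

```python
def per_page_cost(offload_latency_us: float,
                  work_per_page_us: float,
                  batch_size: int) -> float:
    """Average time per page when one offload processes batch_size pages.

    Assumed model (not from the paper): each offload pays a fixed
    latency once, plus a processing time proportional to the page count.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total_us = offload_latency_us + work_per_page_us * batch_size
    return total_us / batch_size


# Hypothetical figures, for illustration only.
FIXED_LATENCY_US = 8.0
WORK_PER_PAGE_US = 1.0

for batch in (1, 4, 16):
    cost = per_page_cost(FIXED_LATENCY_US, WORK_PER_PAGE_US, batch)
    print(f"batch={batch:2d}  per-page cost={cost:.2f} us")
```

Under these assumed numbers, the per-page cost falls from 9.00 us with one page per offload to 1.50 us with 16 pages per offload, because the fixed latency is paid once per batch rather than once per page.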