RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference

DC Field Value Language
dc.contributor.author Kang, Woosung -
dc.contributor.author Lee, Jinkyu -
dc.contributor.author Lee, Youngmoon -
dc.contributor.author Oh, Sangeun -
dc.contributor.author Lee, Kilho -
dc.contributor.author Chwa, Hoon Sung -
dc.date.accessioned 2025-01-20T19:40:14Z -
dc.date.available 2025-01-20T19:40:14Z -
dc.date.created 2024-06-27 -
dc.date.issued 2024-05-15 -
dc.identifier.isbn 9798350358414 -
dc.identifier.issn 1545-3421 -
dc.identifier.uri http://hdl.handle.net/20.500.11750/57553 -
dc.description.abstract The increasing complexity and memory demands of Deep Neural Networks (DNNs) for real-time systems pose significant new challenges, one of which is the GPU memory capacity bottleneck, where the limited physical memory inside GPUs impedes the deployment of sophisticated DNN models. This paper presents, to the best of our knowledge, the first study to address the GPU memory bottleneck while simultaneously ensuring the timely inference of multiple DNN tasks. We propose RT-Swap, a real-time memory management framework that enables transparent and efficient swap scheduling of memory objects, employing the relatively larger CPU memory to extend the available GPU memory capacity without compromising timing guarantees. We have implemented RT-Swap on top of representative machine-learning frameworks, demonstrating its effectiveness in making significantly more DNN task sets schedulable (at least 72% more than existing approaches), even when the task sets demand up to 96.2% more memory than the GPU's physical capacity. © 2024 IEEE. -
dc.language English -
dc.publisher IEEE Computer Society -
dc.relation.ispartof Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS -
dc.title RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference -
dc.type Conference Paper -
dc.identifier.doi 10.1109/RTAS61025.2024.00037 -
dc.identifier.wosid 001261354500034 -
dc.identifier.scopusid 2-s2.0-85197687077 -
dc.identifier.bibliographicCitation Kang, Woosung. (2024-05-15). RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference. IEEE Real-Time and Embedded Technology and Applications Symposium, 373–385. doi: 10.1109/RTAS61025.2024.00037 -
dc.identifier.url https://2024.rtas.org/program/ -
dc.citation.conferenceDate 2024-05-13 -
dc.citation.conferencePlace HK -
dc.citation.conferencePlace Hong Kong -
dc.citation.endPage 385 -
dc.citation.startPage 373 -
dc.citation.title IEEE Real-Time and Embedded Technology and Applications Symposium -