
LaLaRAND: Flexible Layer-by-Layer CPU/GPU Scheduling for Real-Time DNN Tasks

DC Field Value
dc.contributor.author Kang, Woosung
dc.contributor.author Lee, Kilho
dc.contributor.author Lee, Jinkyu
dc.contributor.author Shin, Insik
dc.contributor.author Chwa, Hoon Sung
dc.date.accessioned 2023-12-26T18:42:47Z
dc.date.available 2023-12-26T18:42:47Z
dc.date.created 2022-01-19
dc.date.issued 2021-12-09
dc.identifier.isbn 9781665428026
dc.identifier.issn 2576-3172
dc.identifier.uri http://hdl.handle.net/20.500.11750/46881
dc.description.abstract Deep neural networks (DNNs) have shown remarkable success in various machine-learning (ML) tasks useful for many safety-critical, real-time embedded systems. The foremost design goal for enabling DNN execution on real-time embedded systems is to provide worst-case timing guarantees with limited computing resources. Yet, state-of-the-art ML frameworks hardly leverage heterogeneous computing resources (i.e., CPU and GPU) to improve the schedulability of real-time DNN tasks, due to several factors: a coarse-grained resource allocation model (one resource per task), the asymmetric nature of DNN execution on CPU and GPU, and the lack of a schedulability-aware CPU/GPU allocation scheme. This paper presents, to the best of our knowledge, the first study to address these three major barriers and examine their cooperative effect on schedulability improvement. We propose LaLaRAND, a real-time layer-level DNN scheduling framework that enables flexible CPU/GPU scheduling of individual DNN layers by tightly coupling CPU-friendly quantization with a fine-grained CPU/GPU allocation scheme (one resource per layer), mitigating accuracy loss without compromising timing guarantees. We have implemented and evaluated LaLaRAND on top of a state-of-the-art ML framework, demonstrating that it makes 56% and 80% more DNN task sets schedulable than an existing approach and a baseline (vanilla PyTorch), respectively, with at most a 0.4% difference in performance (inference accuracy).
dc.language English
dc.publisher Institute of Electrical and Electronics Engineers Inc.
dc.title LaLaRAND: Flexible Layer-by-Layer CPU/GPU Scheduling for Real-Time DNN Tasks
dc.type Conference Paper
dc.identifier.doi 10.1109/RTSS52674.2021.00038
dc.identifier.scopusid 2-s2.0-85124555049
dc.identifier.bibliographicCitation Kang, Woosung. (2021-12-09). LaLaRAND: Flexible Layer-by-Layer CPU/GPU Scheduling for Real-Time DNN Tasks. IEEE Real-Time Systems Symposium, 329–341. doi: 10.1109/RTSS52674.2021.00038
dc.identifier.url http://2021.rtss.org/conferenceformat/
dc.citation.conferencePlace Dortmund, Germany
dc.citation.endPage 341
dc.citation.startPage 329
dc.citation.title IEEE Real-Time Systems Symposium
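The one-resource-per-layer idea described in the abstract can be illustrated with a small sketch: given per-layer worst-case execution times (WCETs) on GPU and on CPU (after quantization), choose a device for each layer so that the end-to-end WCET stays within the task's deadline while offloading as much as possible to the CPU. This is a hypothetical greedy heuristic for illustration only; the function name, inputs, and selection rule are assumptions, not the paper's actual allocation algorithm.

```python
# Hypothetical sketch of layer-level CPU/GPU assignment in the spirit of a
# one-resource-per-layer model. NOT the paper's algorithm; purely illustrative.

def assign_layers(cpu_wcet, gpu_wcet, deadline):
    """Assign each DNN layer to 'cpu' or 'gpu'.

    cpu_wcet[i]: WCET of layer i on CPU (with CPU-friendly quantization).
    gpu_wcet[i]: WCET of layer i on GPU.
    Start with every layer on the GPU, then offload to the CPU the layers
    whose CPU penalty (cpu_wcet - gpu_wcet) is smallest, as long as the
    end-to-end WCET still meets the deadline.
    """
    n = len(cpu_wcet)
    assignment = ['gpu'] * n
    total = sum(gpu_wcet)  # end-to-end WCET with everything on the GPU
    # Consider layers in order of smallest CPU penalty first.
    for i in sorted(range(n), key=lambda i: cpu_wcet[i] - gpu_wcet[i]):
        penalty = cpu_wcet[i] - gpu_wcet[i]
        if total + penalty <= deadline:   # deadline still met after offload
            assignment[i] = 'cpu'
            total += penalty
    return assignment, total
```

For example, with `cpu_wcet = [2, 5, 9]`, `gpu_wcet = [1, 2, 3]`, and `deadline = 10`, the sketch offloads the first two layers to the CPU and keeps the third on the GPU, for an end-to-end WCET of exactly 10.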

File Downloads

  • There are no files associated with this item.

Related Researcher

Chwa, Hoon Sung

Department of Electrical Engineering and Computer Science
