Real-Time Scheduling Framework for Multi-DNN Inference
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 좌훈승 | - |
| dc.contributor.author | Woosung Kang | - |
| dc.date.accessioned | 2026-01-23T10:54:18Z | - |
| dc.date.available | 2026-01-23T10:54:18Z | - |
| dc.date.issued | 2026 | - |
| dc.identifier.uri | https://scholar.dgist.ac.kr/handle/20.500.11750/59627 | - |
| dc.identifier.uri | http://dgist.dcollection.net/common/orgView/200000946116 | - |
| dc.description | Real-Time AI System, Multi-DNN Inference, Real-Time Scheduling | - |
| dc.description.abstract | Deep Neural Networks (DNNs) are increasingly central to real-time AI systems, particularly in safety-critical domains such as autonomous vehicles and medical diagnostics, where both timing guarantees and high inference accuracy are essential. However, DNN workloads present significant challenges: they involve a trade-off between execution time and accuracy, require intensive computation and memory resources, and must operate under strict constraints of processing capacity, memory, and energy. This dissertation proposes a real-time scheduling framework for multi-DNN inference that addresses these challenges by modeling DNN behavior across heterogeneous resources, integrating system-wide timing mechanisms to ensure predictability, and providing transparent management of both resources and DNN tasks. By unifying system-level allocation with task-level scheduling, the framework enables efficient resource sharing while maintaining timing guarantees and accuracy. The contributions of this dissertation advance the foundation of real-time AI systems, offering practical methods for deploying complex DNN workloads with predictable performance and reliability in high-stakes environments. | - |
| dc.description.tableofcontents | 1 Introduction 1 1.1 Requirements of Real-Time AI Systems 1 1.2 Characteristics of Real-Time AI Systems 2 1.2.1 Task Perspective 3 1.2.2 Resource Perspective 4 1.3 Challenges of Real-Time AI Systems 5 1.3.1 System-Wide Consideration of the Computation-Accuracy Trade-off 5 1.3.2 Effective Utilization of Heterogeneous Underlying Resources 6 1.3.3 The Emergence of Memory as a Primary Bottleneck 7 1.4 Thesis Statement and Contributions 7 1.4.1 Heterogeneous Resource Utilization for Real-Time DNN Inference 9 1.4.2 Real-Time GPU Memory Management for Multi-DNN Inference 10 1.4.3 Efficient Memory Management for Real-Time Multi-DNN Inference 12 1.5 Outline 13 2 Background 14 2.1 Utilizing Heterogeneous Resources for DNN Inference 14 2.2 GPU Memory Management for DNN Tasks 15 2.2.1 Discrete GPU Systems 15 2.2.2 Integrated GPU Systems 16 3 LaLaRAND: Flexible Layer-by-Layer CPU/GPU Scheduling for Real-Time DNN Tasks 18 3.1 Introduction 18 3.2 Motivation 22 3.2.1 CPU vs GPU Performance Imbalance in DNN Execution 22 3.2.2 CPU-friendly Quantization 23 3.3 LaLaRAND 27 3.3.1 System Goal and Overview 27 3.3.2 Design of LaLaRAND 29 3.4 Layer-by-Layer Resource Allocation 33 3.4.1 System Model and Notations 33 3.4.2 Schedulability Analysis and Allocation Algorithm 35 3.5 Runtime Layer Migration 42 3.6 Evaluation 48 3.6.1 Experimental Setup 48 3.6.2 Experimental Results 51 3.7 Summary 58 4 RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference 59 4.1 Introduction 59 4.2 Background 63 4.2.1 Target System 63 4.2.2 Unified Memory and On-demand Paging 65 4.2.3 GPU Virtual Memory Management 65 4.3 Design Principle 66 4.4 System Design 69 4.4.1 RT-Swap Library 69 4.4.2 RT-Swap Scheduler 76 4.5 Swap-Aware Real-Time Scheduling 77 4.5.1 Task Model 77 4.5.2 Target Scheduling Problems 79 4.5.3 Swap Schedule Generation 80 4.5.4 Swap Volume Assignment 84 4.6 Implementation 87 4.7 Evaluation 89 4.7.1 Experimental Setup 89 4.7.2 Extensive Simulations 90 4.7.3 Runtime Experiments 93 4.7.4 Runtime Overhead Analysis 96 4.8 Summary 96 5 ZeroSwap: Toward Swapless Real-Time Multi-DNN Inference via SSD-based GPU Memory Extension 98 5.1 Introduction 98 5.2 Background 102 5.2.1 Memory Swapping in Integrated GPU Systems 102 5.2.2 GPU Virtual Memory Management 103 5.3 Swap Overhead Analysis 104 5.4 Design Principles of ZeroSwap 105 5.4.1 Semantic-Aware Selective Swapping 106 5.4.2 Shared Pinned Allocation 108 5.4.3 Segment-Level Overlapping 110 5.5 System Design 111 5.5.1 Shared Pinned Allocator 112 5.5.2 Semantic-Aware Selective Swapper 116 5.5.3 Overlap-Aware Scheduler 118 5.6 Segment-level Overlapping Decision 119 5.6.1 System Model 120 5.6.2 Segment-Level Overlapping Model 121 5.6.3 Segment Boundary Decision 123 5.6.4 Sharing Volume Assignment 124 5.7 Evaluation 126 5.7.1 Experiment Setup 126 5.7.2 Simulation Results 128 5.7.3 Runtime Experiments 131 5.7.4 Overhead Analysis 134 5.8 Summary 135 6 Conclusion and Future Work 136 6.1 Limitations 136 6.2 Future Work 136 6.3 Summary 137 References 139 | - |
| dc.format.extent | 151 | - |
| dc.language | eng | - |
| dc.publisher | DGIST | - |
| dc.title | Real-Time Scheduling Framework for Multi-DNN Inference | - |
| dc.title.alternative | 다중 DNN 추론을 위한 실시간 스케줄링 프레임워크 | - |
| dc.type | Thesis | - |
| dc.identifier.doi | 10.22677/THESIS.200000946116 | - |
| dc.description.degree | Doctor | - |
| dc.contributor.department | Department of Electrical Engineering and Computer Science | - |
| dc.contributor.coadvisor | Yeseong Kim | - |
| dc.date.awarded | 2026-02-01 | - |
| dc.publisher.location | Daegu | - |
| dc.description.database | dCollection | - |
| dc.citation | XT.ID 강66 202602 | - |
| dc.date.accepted | 2026-01-19 | - |
| dc.contributor.alternativeDepartment | 전기전자컴퓨터공학과 | - |
| dc.subject.keyword | Real-Time AI System, Multi-DNN Inference, Real-Time Scheduling | - |
| dc.contributor.affiliatedAuthor | Woosung Kang | - |
| dc.contributor.affiliatedAuthor | Hoon Sung Chwa | - |
| dc.contributor.affiliatedAuthor | Yeseong Kim | - |
| dc.contributor.alternativeName | 강우성 | - |
| dc.contributor.alternativeName | Hoon Sung Chwa | - |
| dc.contributor.alternativeName | 김예성 | - |
| dc.rights.embargoReleaseDate | 2027-02-28 | - |
File Downloads
- There are no files associated with this item.
