Gyeongseo Park. (2024). Dynamic Core Allocation for Latency-Critical Workloads in Data-Center Servers. doi: 10.22677/THESIS.200000798061
Type
Thesis
Description
Data Centers, Network Packet Processing, Latency-Critical Application, Dynamic Core Management
Abstract
Modern data centers face a growing demand for energy efficiency while guaranteeing Service Level Objectives (SLOs) for Latency-Critical (LC) applications. Prior core allocation studies often focus solely on LC application threads and overlook the impact of network packet processing, resulting in energy and resource inefficiency as well as performance degradation. For instance, they neglect how packet processing influences core idle states (i.e., C-states), which profoundly impacts response latency and energy consumption. In virtualized environments, parallel packet processing techniques provided by the hypervisor (e.g., multi-queue virtio-net) increase preemption and migration across physical CPUs (pCPUs), severely degrading performance. Addressing this oversight of network packet processing, this thesis presents two dynamic core allocation strategies: CoreNap for native environments and vSPACE for I/O virtualized environments.
These strategies dynamically adjust the cores allocated to LC application threads and network packet processing, aiming to improve performance, energy efficiency, and resource efficiency while guaranteeing the target SLOs. CoreNap dynamically changes the number of cores allocated to LC application threads and network packet processing, aiming to extend the idle duration of cores for energy efficiency. Additionally, CoreNap adjusts the interval of network packet processing so that the cores involved in packet processing can remain in deeper idle states for extended periods. Based on a lightweight predictive model, CoreNap estimates energy consumption and tail latency for various core management configurations, then selects the most energy-efficient configuration that does not violate the SLOs. Our evaluation reveals that CoreNap surpasses existing core management approaches that focus solely on core allocation for LC application threads, achieving energy reduction across various load levels and applications without SLO violations. vSPACE is a dynamic core allocation strategy designed to enhance performance in virtualized environments that support parallel packet processing. To moderate the performance overheads induced by scheduling contention, vSPACE allocates cores separately to virtual CPUs (vCPUs) and network queues (NQs). Moreover, vSPACE dynamically adjusts the cores allocated to vCPUs and packet processing, employing a heuristic algorithm designed to prevent core utilization saturation. vSPACE identifies where core utilization saturation occurs through an online statistical analysis that examines the correlation between SLO violations and core utilization. vSPACE operates in three distinct modes: performance, energy efficiency, and resource efficiency. Our evaluations indicate that vSPACE significantly enhances throughput (i.e., maximum Queries Per Second) compared to existing approaches for enhancing network I/O virtualization performance.
Furthermore, vSPACE yields substantial improvements in both energy and resource efficiency compared to state-of-the-art dynamic core allocation approaches.
Keywords: Data Centers, Network Packet Processing, Latency-Critical Application, Dynamic Core Management
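CoreNap's selection step, as described in the abstract, can be sketched as a constrained search: enumerate core/interval configurations, predict energy and tail latency for each, and keep the cheapest configuration that meets the SLO. The sketch below is illustrative only; the function names, the linear toy predictor, and all constants are assumptions, not the thesis's actual model.

```python
# Hypothetical sketch of CoreNap's configuration search. The predictor
# below is a toy stand-in for the thesis's lightweight predictive model.
from itertools import product

SLO_US = 500.0  # illustrative target tail latency (microseconds)

def predict(app_cores, net_cores, interval_us, load):
    """Toy model: returns (predicted energy in watts, tail latency in us)."""
    capacity = app_cores * 10_000 + net_cores * 5_000  # queries/sec the config can absorb
    tail = 100.0 + 300.0 * load / capacity + 0.1 * interval_us
    # Longer packet-processing intervals let cores reach deeper C-states,
    # so they slightly reduce predicted energy in this toy model.
    energy = 5.0 * (app_cores + net_cores) - 0.002 * interval_us
    return energy, tail

def select_config(load, max_cores=8, intervals=(0, 50, 100)):
    """Pick the most energy-efficient (app, net, interval) meeting the SLO."""
    best = None
    for app, net, ivl in product(range(1, max_cores), range(1, max_cores), intervals):
        if app + net > max_cores:
            continue  # stay within the physical core budget
        energy, tail = predict(app, net, ivl, load)
        if tail > SLO_US:
            continue  # reject SLO-violating configurations
        if best is None or energy < best[0]:
            best = (energy, (app, net, ivl))
    return best  # (predicted energy, configuration) or None if infeasible
```

The key property the search preserves is that energy is minimized only over the SLO-feasible region, mirroring the abstract's "most energy-efficient configuration without violating SLOs."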
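vSPACE's online statistical analysis can likewise be sketched: correlate recent pCPU-utilization samples with SLO-violation counts, and treat a strong positive correlation at high utilization as saturation that warrants granting another pCPU to the affected pool (vCPUs or NQs). The thresholds, names, and pool representation below are assumptions for illustration, not the thesis's actual parameters.

```python
# Hypothetical sketch of vSPACE's saturation detection and core grant.
from statistics import correlation  # Pearson's r; Python 3.10+

CORR_THRESHOLD = 0.8   # illustrative correlation threshold
UTIL_THRESHOLD = 0.9   # illustrative utilization-saturation threshold

def is_saturated(util_samples, violation_samples):
    """Decide whether a core pool is saturated from paired samples."""
    if len(util_samples) < 3:
        return False  # too little data for a meaningful correlation
    r = correlation(util_samples, violation_samples)
    recent_util = sum(util_samples[-3:]) / 3
    return r >= CORR_THRESHOLD and recent_util >= UTIL_THRESHOLD

def adjust_cores(vcpu_pool, nq_pool, spare):
    """Grant spare pCPUs to pools whose utilization correlates with SLO violations."""
    for pool in (vcpu_pool, nq_pool):
        if spare > 0 and is_saturated(pool["util"], pool["violations"]):
            pool["cores"] += 1  # relieve the saturated pool
            spare -= 1
    return spare
```

Separating the vCPU and NQ pools in the decision mirrors vSPACE's separate core allocation, which avoids scheduling contention between guest vCPUs and hypervisor packet processing.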
Table Of Contents
I. Introduction 1
  1.1. Contributions 3
  1.2. Organization 4
II. Background 6
  2.1. Power Management with C-States 6
  2.2. Network Packet Processing Techniques Supported by NICs 6
  2.3. Parallel Network Packet Processing in Virtualization Environment 7
III. Related Work 9
  3.1. Dynamic Core Management for Energy Efficiency 9
  3.2. Idle Power Management in Processors 9
  3.3. Dynamic Core Management for Improving Resource Utilization 9
  3.4. Hardware-Assisted Approaches 10
  3.5. Userspace Network 10
  3.6. Scheduling Techniques 11
  3.7. I/O Handling Techniques 11
  3.8. Dynamic Resource Management 12
IV. CoreNap: Energy Efficient Core Management for Latency-Critical Applications 14
  4.1. Core Management for Energy Efficiency 14
  4.2. Impact of Core Management on Latency-Critical Applications 15
    4.2.1. Core Allocation for Application Threads and Network Packet Processing 16
    4.2.2. Network Packet Processing Interval Management with Core Allocation 17
  4.3. Energy Efficient Core Management for Latency-Critical Applications 21
    4.3.1. Challenges 21
    4.3.2. CoreNap Architecture 22
    4.3.3. Exploration Process 23
    4.3.4. Online Training 26
    4.3.5. Implementation 26
  4.4. Evaluation 27
    4.4.1. Experimental Methodology 27
    4.4.2. Energy Efficiency and Performance 28
    4.4.3. Scalability Evaluation 32
    4.4.4. Energy Efficiency and Performance with Dynamic Load 32
    4.4.5. Evaluation of Prediction Performance 33
    4.4.6. Online Training Efficacy under Spike Load 35
    4.4.7. Overhead Analysis of CoreNap 36
  4.5. Discussion About Core Placement 37
  4.6. Summary 38
V. vSPACE: Supporting Parallel Network Packet Processing in Virtualized Environments through Dynamic Core Management 39
  5.1. Parallel Network Packet Processing in Virtualized Environments 39
  5.2. Impact of Parallel Packet Processing with I/O Virtualization 40
    5.2.1. Impact of Parallel Packet Processing on LC Workload 41
    5.2.2. Impact of Core Allocation and Parallel Packet Processing on LC Workload 43
  5.3. Architecture 45
    5.3.1. Overview 45
    5.3.2. Dynamic Core Allocation 46
    5.3.3. Exploration of Thresholds 48
    5.3.4. Implementation 49
  5.4. Evaluation 50
    5.4.1. Experimental Methodology 50
    5.4.2. Comparison with Static Loads 52
    5.4.3. Comparison with Dynamic Loads 56
  5.5. Summary 57
VI. Conclusion 60
References 62
요약문 (Korean Summary) 71