
QuiltNet: Efficient Deep Learning Inference on Multi-Chip Accelerators Using Model Partitioning

Title
QuiltNet: Efficient Deep Learning Inference on Multi-Chip Accelerators Using Model Partitioning
Author(s)
Park, Jongho; Kwon, Hyuk-jun; Kim, Seowoo; Lee, Junyoung; Ha, Minho; Lim, Euicheol; Imani, Mohsen; Kim, Yeseong
Issued Date
2022-07-14
Citation
Design Automation Conference, pp. 1159-1164
Type
Conference Paper
ISBN
9781450391429
ISSN
0738-100X
Abstract
We have seen many successful deployments of deep learning accelerator designs on different platforms and technologies, e.g., FPGA, ASIC, and Processing In-Memory platforms. However, the size of deep learning models keeps increasing, making computation a burden on the accelerators. A naive approach to resolve this issue is to design larger accelerators; however, this is not scalable due to high resource requirements, e.g., power consumption and off-chip memory sizes. A promising alternative is to employ multiple accelerators as needed, similar to conventional multiprocessing. For example, a single accelerator may suffice for smaller networks, while larger networks can be partitioned across multiple accelerators. However, partitioning DNN models into multiple parts introduces large communication overhead due to inter-layer data transfers. In this paper, we propose a scalable solution to accelerate DNN models on multiple devices by devising a new model partitioning technique. Our technique transforms a DNN model into layer-wise partitioned models using an autoencoder. Because the autoencoder encodes each layer's output tensor into a smaller dimension, we can split the neural network model into multiple pieces and pipeline them with significantly reduced communication overhead. Our evaluation on state-of-the-art deep learning models shows that the proposed technique significantly improves performance and energy efficiency, by up to 30.5% and 28.4% respectively, with minimal accuracy loss compared to running the same model on pipelined multi-block accelerators without the autoencoder. © 2022 ACM.
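The core idea in the abstract, compressing the activation tensor at the partition point with an autoencoder before sending it to the next accelerator, can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal PyTorch illustration in which the backbone, the cut position, and the channel counts (128 compressed to 32) are all assumptions made for the example.

```python
# Minimal sketch (not the paper's code) of autoencoder-based model
# partitioning: split a DNN at a layer boundary and insert a small
# encoder/decoder pair so the tensor exchanged between the two
# accelerator parts is lower-dimensional. All sizes are illustrative.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Hypothetical 1x1-conv autoencoder compressing the inter-part tensor."""
    def __init__(self, channels: int, reduced: int):
        super().__init__()
        self.encoder = nn.Conv2d(channels, reduced, kernel_size=1)  # sender side
        self.decoder = nn.Conv2d(reduced, channels, kernel_size=1)  # receiver side

# Assumed backbone; we partition it after its 4th module.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
)
part_a = backbone[:4]                       # would run on accelerator 0
part_b = backbone[4:]                       # would run on accelerator 1
bn = Bottleneck(channels=128, reduced=32)   # 4x fewer channels to transmit

x = torch.randn(1, 3, 32, 32)
h = part_a(x)          # activation at the cut point: 128 channels
z = bn.encoder(h)      # compressed tensor: 32 channels -> less link traffic
h_hat = bn.decoder(z)  # reconstructed approximation on the receiving device
y = part_b(h_hat)
print(h.shape, z.shape, y.shape)
```

In an actual multi-chip deployment, part_a plus the encoder would execute on one device and the decoder plus part_b on another, so only the compressed tensor z crosses the inter-chip link; the encoder/decoder would be trained so that the reconstruction keeps accuracy loss small.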
URI
http://hdl.handle.net/20.500.11750/46820
DOI
10.1145/3489517.3530589
Publisher
Association for Computing Machinery
Related Researcher
  • 김예성 Kim, Yeseong
  • Research Interests Embedded Systems for Edge Intelligence; Brain-Inspired HD Computing for AI; In-Memory Computing
Files in This Item:

There are no files associated with this item.

Appears in Collections:
Department of Electrical Engineering and Computer Science > Computation Efficient Learning Lab > 2. Conference Papers

