
Full metadata record

DC Field Value Language
dc.contributor.author Park, Jongho -
dc.contributor.author Kwon, Hyuk-jun -
dc.contributor.author Kim, Seowoo -
dc.contributor.author Lee, Junyoung -
dc.contributor.author Ha, Minho -
dc.contributor.author Lim, Euicheol -
dc.contributor.author Imani, Mohsen -
dc.contributor.author Kim, Yeseong -
dc.date.accessioned 2023-12-26T18:13:00Z -
dc.date.available 2023-12-26T18:13:00Z -
dc.date.created 2022-09-23 -
dc.date.issued 2022-07-14 -
dc.identifier.isbn 9781450391429 -
dc.identifier.issn 0738-100X -
dc.identifier.uri http://hdl.handle.net/20.500.11750/46820 -
dc.description.abstract We have seen many successful deployments of deep learning accelerator designs on different platforms and technologies, e.g., FPGA, ASIC, and Processing In-Memory platforms. However, the size of the deep learning models keeps increasing, making computations a burden on the accelerators. A naive approach to resolve this issue is to design larger accelerators; however, it is not scalable due to high resource requirements, e.g., power consumption and off-chip memory sizes. A promising solution is to utilize multiple accelerators and use them as needed, similar to conventional multiprocessing. For example, for smaller networks, we may use a single accelerator, while we may use multiple accelerators with proper network partitioning for larger networks. However, partitioning DNN models into multiple parts leads to large communication overheads due to inter-layer communications. In this paper, we propose a scalable solution to accelerate DNN models on multiple devices by devising a new model partitioning technique. Our technique transforms a DNN model into layer-wise partitioned models using an autoencoder. Since the autoencoder encodes a tensor output into a smaller dimension, we can split the neural network model into multiple pieces while significantly reducing the communication overhead to pipeline them. Our evaluation results conducted on state-of-the-art deep learning models show that the proposed technique significantly improves performance and energy efficiency. Our solution increases performance and energy efficiency by up to 30.5% and 28.4% with minimal accuracy loss as compared to running the same model on pipelined multi-block accelerators without the autoencoder. © 2022 ACM. -
dc.language English -
dc.publisher Association for Computing Machinery -
dc.title QuiltNet: Efficient Deep Learning Inference on Multi-Chip Accelerators Using Model Partitioning -
dc.type Conference Paper -
dc.identifier.doi 10.1145/3489517.3530589 -
dc.identifier.scopusid 2-s2.0-85137456792 -
dc.identifier.bibliographicCitation Design Automation Conference, pp.1159 - 1164 -
dc.identifier.url https://59dac.conference-program.com/presentation/?id=RESEARCH505&sess=sess130 -
dc.citation.conferencePlace US -
dc.citation.conferencePlace San Francisco -
dc.citation.endPage 1164 -
dc.citation.startPage 1159 -
dc.citation.title Design Automation Conference -
Files in This Item:

There are no files associated with this item.

Appears in Collections:
Department of Electrical Engineering and Computer Science > Computation Efficient Learning Lab. > 2. Conference Papers


