| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Park, Jongho | - |
| dc.contributor.author | Kwon, Hyuk-jun | - |
| dc.contributor.author | Kim, Seowoo | - |
| dc.contributor.author | Lee, Junyoung | - |
| dc.contributor.author | Ha, Minho | - |
| dc.contributor.author | Lim, Euicheol | - |
| dc.contributor.author | Imani, Mohsen | - |
| dc.contributor.author | Kim, Yeseong | - |
| dc.date.accessioned | 2023-12-26T18:13:00Z | - |
| dc.date.available | 2023-12-26T18:13:00Z | - |
| dc.date.created | 2022-09-23 | - |
| dc.date.issued | 2022-07-14 | - |
| dc.identifier.isbn | 9781450391429 | - |
| dc.identifier.issn | 0738-100X | - |
| dc.identifier.uri | http://hdl.handle.net/20.500.11750/46820 | - |
| dc.description.abstract | We have seen many successful deployments of deep learning accelerator designs on different platforms and technologies, e.g., FPGA, ASIC, and Processing In-Memory platforms. However, the size of the deep learning models keeps increasing, making computations a burden on the accelerators. A naive approach to resolve this issue is to design larger accelerators; however, it is not scalable due to high resource requirements, e.g., power consumption and off-chip memory sizes. A promising solution is to utilize multiple accelerators and use them as needed, similar to conventional multiprocessing. For example, for smaller networks, we may use a single accelerator, while we may use multiple accelerators with proper network partitioning for larger networks. However, partitioning DNN models into multiple parts leads to large communication overheads due to inter-layer communications. In this paper, we propose a scalable solution to accelerate DNN models on multiple devices by devising a new model partitioning technique. Our technique transforms a DNN model into layer-wise partitioned models using an autoencoder. Since the autoencoder encodes a tensor output into a smaller dimension, we can split the neural network model into multiple pieces while significantly reducing the communication overhead to pipeline them. Our evaluation results conducted on state-of-the-art deep learning models show that the proposed technique significantly improves performance and energy efficiency. Our solution increases performance and energy efficiency by up to 30.5% and 28.4% with minimal accuracy loss as compared to running the same model on pipelined multi-block accelerators without the autoencoder. © 2022 ACM. | - |
| dc.language | English | - |
| dc.publisher | Association for Computing Machinery | - |
| dc.relation.ispartof | Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC ’22) | - |
| dc.title | QuiltNet: Efficient Deep Learning Inference on Multi-Chip Accelerators Using Model Partitioning | - |
| dc.type | Conference Paper | - |
| dc.identifier.doi | 10.1145/3489517.3530589 | - |
| dc.identifier.wosid | 001041471300193 | - |
| dc.identifier.scopusid | 2-s2.0-85137456792 | - |
| dc.identifier.bibliographicCitation | Park, Jongho. (2022-07-14). QuiltNet: Efficient Deep Learning Inference on Multi-Chip Accelerators Using Model Partitioning. Design Automation Conference, 1159–1164. doi: 10.1145/3489517.3530589 | - |
| dc.identifier.url | https://59dac.conference-program.com/presentation/?id=RESEARCH505&sess=sess130 | - |
| dc.citation.conferenceDate | 2022-07-10 | - |
| dc.citation.conferencePlace | San Francisco, US | - |
| dc.citation.endPage | 1164 | - |
| dc.citation.startPage | 1159 | - |
| dc.citation.title | Design Automation Conference | - |
Department of Electrical Engineering and Computer Science
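The abstract above describes QuiltNet's core idea: cut a DNN into layer-wise partitions and place an autoencoder at the cut so that only the encoded, lower-dimensional tensor crosses the chip-to-chip link before being decoded on the next accelerator. The following is a minimal PyTorch-style sketch of that idea; the backbone, the split point, the 1×1-convolution autoencoder, and all tensor sizes are illustrative assumptions, not the paper's actual models, partitioning, or training procedure.

```python
# Sketch of autoencoder-based model partitioning for pipelined
# multi-accelerator inference, following the idea in the abstract.
# All module names, sizes, and the split point are hypothetical.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses the inter-partition activation before transfer."""
    def __init__(self, channels: int, compressed: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, compressed, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

class Decoder(nn.Module):
    """Reconstructs the activation on the receiving accelerator."""
    def __init__(self, compressed: int, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(compressed, channels, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

# Hypothetical backbone split into two layer-wise partitions.
front = nn.Sequential(                      # would run on accelerator 0
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)
back = nn.Sequential(                       # would run on accelerator 1
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 10),
)

# Autoencoder at the cut point: 64 channels -> 8 channels -> 64 channels,
# so only the 8-channel tensor crosses the chip-to-chip link.
encoder = Encoder(64, 8)
decoder = Decoder(8, 64)

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    act = front(x)                    # produced on accelerator 0
    compressed = encoder(act)         # small tensor sent over the interconnect
    restored = decoder(compressed)    # reconstructed on accelerator 1
    logits = back(restored)

print(act.numel(), "->", compressed.numel(), "elements transferred")
```

In this toy setup the transferred tensor shrinks by the channel-reduction ratio (64 to 8 channels here); per the abstract, QuiltNet's autoencoder is designed so that this compression keeps accuracy loss minimal while enabling pipelined execution across multiple accelerators.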