
A Dual-Precision and Low-Power CNN Inference Engine Using a Heterogeneous Processing-in-Memory Architecture
Metadata Downloads

DC Field Value Language
dc.contributor.author Jung, Sangwoo -
dc.contributor.author Lee, Jaehyun -
dc.contributor.author Park, Dahoon -
dc.contributor.author Lee, Youngjoo -
dc.contributor.author Yoon, Jong-Hyeok -
dc.contributor.author Kung, Jaeha -
dc.date.accessioned 2024-11-01T18:10:17Z -
dc.date.available 2024-11-01T18:10:17Z -
dc.date.created 2024-05-27 -
dc.date.issued 2024-12 -
dc.identifier.issn 1549-8328 -
dc.identifier.uri http://hdl.handle.net/20.500.11750/57100 -
dc.description.abstract In this article, we present an energy-scalable CNN model that can adapt to different hardware resource constraints. Specifically, we propose a dual-precision network, named DualNet, that leverages two independent bit-precision paths (INT4 and ternary-binary). DualNet achieves both high accuracy and low complexity by balancing the ratio between the two paths. We also present an evolutionary algorithm that allows the automatic search of the optimal ratios. In addition to the novel CNN architecture design, we develop a heterogeneous processing-in-memory (PIM) hardware that integrates SRAM- and eDRAM-based PIMs to efficiently compute the two precision paths in parallel. To verify the energy efficiency of DualNet computed on the heterogeneous PIM, we prototyped a test chip in 28-nm CMOS technology. To maximize the hardware efficiency, we utilize an improved data mapping scheme achieving the most effective deployment of DualNets on multiple PIM arrays. With the proposed SW-HW co-optimization, we can obtain the most energy-efficient DualNet model operating on the actual PIM hardware. Compared to other quantized networks with a single bit-precision, DualNet reduces the energy consumption, memory footprint, and latency by 29.0%, 49.5%, and 47.3% on average, respectively, for the CIFAR-10/100 and ImageNet datasets. -
dc.language English -
dc.publisher Institute of Electrical and Electronics Engineers -
dc.title A Dual-Precision and Low-Power CNN Inference Engine Using a Heterogeneous Processing-in-Memory Architecture -
dc.type Article -
dc.identifier.doi 10.1109/TCSI.2024.3395842 -
dc.identifier.wosid 001226144000001 -
dc.identifier.scopusid 2-s2.0-85193248367 -
dc.identifier.bibliographicCitation Jung, Sangwoo. (2024-12). A Dual-Precision and Low-Power CNN Inference Engine Using a Heterogeneous Processing-in-Memory Architecture. IEEE Transactions on Circuits and Systems I: Regular Papers, 71(12), 5546–5559. doi: 10.1109/TCSI.2024.3395842 -
dc.description.isOpenAccess FALSE -
dc.subject.keywordAuthor Convolutional neural networks -
dc.subject.keywordAuthor deep learning -
dc.subject.keywordAuthor Hardware -
dc.subject.keywordAuthor Memory management -
dc.subject.keywordAuthor mixed-precision quantization -
dc.subject.keywordAuthor processing-in-memory -
dc.subject.keywordAuthor Quantization (signal) -
dc.subject.keywordAuthor Random access memory -
dc.subject.keywordAuthor SW-HW co-optimization -
dc.subject.keywordAuthor Computational modeling -
dc.subject.keywordAuthor Convolution -
dc.citation.endPage 5559 -
dc.citation.number 12 -
dc.citation.startPage 5546 -
dc.citation.title IEEE Transactions on Circuits and Systems I: Regular Papers -
dc.citation.volume 71 -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.relation.journalResearchArea Engineering -
dc.relation.journalWebOfScienceCategory Engineering, Electrical & Electronic -
dc.type.docType Article -

File Downloads

  • There are no files associated with this item.

Related Researcher

Yoon, Jong-Hyeok (윤종혁)

Department of Electrical Engineering and Computer Science
