
Full metadata record

DC Field Value Language
dc.contributor.author Baek, Daehyeon -
dc.contributor.author Hwang, Soojin -
dc.contributor.author Heo, Taekyung -
dc.contributor.author Kim, Daehoon -
dc.contributor.author Huh, Jaehyuk -
dc.date.accessioned 2023-12-26T18:43:32Z -
dc.date.available 2023-12-26T18:43:32Z -
dc.date.created 2022-01-24 -
dc.date.issued 2021-09-28 -
dc.identifier.isbn 9781665442787 -
dc.identifier.issn 1089-795X -
dc.identifier.uri http://hdl.handle.net/20.500.11750/46905 -
dc.description.abstract Sparse matrix multiplication is one of the key computational kernels in large-scale data analytics. However, a naive implementation suffers from the overheads of irregular memory accesses due to the representation of sparsity. To mitigate the memory access overheads, recent accelerator designs have advocated outer product processing, which minimizes input accesses but generates intermediate products that must be merged into the final output matrix. Using real-world sparse matrices, this study first identifies the memory bloating problem of outer product designs, caused by the unpredictable volume of intermediate products. Such an unpredictable increase in memory requirement during computation can limit the applicability of accelerators. To address the memory bloating problem, this study revisits an alternative inner product approach and proposes a new accelerator design called InnerSP. This study shows that nonzero element distributions in real-world sparse matrices exhibit a certain level of locality. Using a smart caching scheme designed for inner product, this locality is effectively exploited with a modest on-chip cache. However, the row-wise inner product relies on on-chip aggregation of intermediate products. Due to uneven sparsity per row, overflows or underflows of the on-chip aggregation storage can occur. To maximize parallelism while avoiding costly overflows, the proposed accelerator uses pre-scanning for row splitting and merging. The simulation results show that the performance of InnerSP matches or exceeds that of prior outer product approaches, without any memory bloating problem. © 2021 IEEE -
dc.language English -
dc.publisher IEEE Computer Society -
dc.title InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-Aware Inner Product Processing -
dc.type Conference Paper -
dc.identifier.doi 10.1109/pact52795.2021.00016 -
dc.identifier.scopusid 2-s2.0-85124190809 -
dc.identifier.bibliographicCitation International Conference on Parallel Architectures and Compilation Techniques, pp.116 - 128 -
dc.identifier.url http://pact21.snu.ac.kr/index.php/program-kst/ -
dc.citation.conferencePlace Atlanta, US -
dc.citation.endPage 128 -
dc.citation.startPage 116 -
dc.citation.title International Conference on Parallel Architectures and Compilation Techniques -
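The abstract contrasts outer product SpGEMM, which emits intermediate products that must later be merged, with row-wise inner product processing, where each output row is aggregated before the next begins. A minimal software sketch of the latter idea (not the InnerSP hardware design itself; the dict-of-dicts sparse format and toy matrices are illustrative assumptions) is:

```python
def inner_product_spgemm(A_rows, B_cols):
    """Row-wise inner product SpGEMM sketch: C[i][j] = dot(A row i, B col j).

    A_rows maps row index -> {col: value}; B_cols maps col index -> {row: value}.
    Each output row is finalized before moving on, so only one row's partial
    sums are live at a time -- the software analogue of on-chip aggregation,
    avoiding the unbounded intermediate products of the outer product approach.
    """
    C = {}
    for i, a_row in A_rows.items():
        out_row = {}
        for j, b_col in B_cols.items():
            # Inner product of a sparse row and a sparse column:
            # only indices present in both contribute.
            s = sum(v * b_col[k] for k, v in a_row.items() if k in b_col)
            if s:
                out_row[j] = s
        if out_row:
            C[i] = out_row
    return C

# Toy inputs (hypothetical): A is 2x3, B is 3x2, both stored sparsely.
A_rows = {0: {0: 1, 2: 2}, 1: {1: 3}}
B_cols = {0: {1: 5}, 1: {0: 4, 2: 6}}
print(inner_product_spgemm(A_rows, B_cols))  # {0: {1: 16}, 1: {0: 15}}
```

Note that naive inner product recomputes column accesses for every row; the paper's contribution, per the abstract, is exploiting nonzero locality with a cache and using pre-scanning to split or merge rows so the per-row accumulator neither overflows nor sits underused.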
Files in This Item:

There are no files associated with this item.

Appears in Collections:
Department of Electrical Engineering and Computer Science > Computer Architecture and Systems Lab > 2. Conference Papers


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
