From simple image classifiers to large language models, general matrix multiplication (GEMM) is the fundamental and most time-consuming operation in deep learning workloads. To accelerate it, many off-the-shelf neural processors use systolic arrays as dedicated GEMM hardware. Recently, a more general form of the systolic array has been proposed: the systolic tensor array (STA), which places vectorized MAC units within each processing unit. However, selecting the optimal STA configuration for a given deep learning model is difficult because the configuration search space is large. To help select the optimal STA configuration across many deep learning models, we present a ready-to-use, open-source RTL generator for various STA configurations. The power consumption and post-layout area of several STAs are analyzed using open-source EDA tools. © 2024 IEEE.
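To give a rough feel for why the configuration search space grows quickly, the following is a minimal sketch and not the generator described in this work. It assumes a hypothetical parameterization in which each processing unit holds a × b dot-product units of c MACs each, arranged in an m × n array; the function name, the parameter names, and the fixed MAC budget are all illustrative assumptions.

```python
from itertools import product


def enumerate_sta_configs(total_macs, max_dim=8):
    """Enumerate hypothetical (a, b, c, m, n) tuples with a * b * c * m * n == total_macs.

    a, b, c describe the assumed per-PE vectorization (each limited to max_dim);
    m, n describe the assumed PE array shape. These names are illustrative only.
    """
    configs = []
    for a, b, c in product(range(1, max_dim + 1), repeat=3):
        macs_per_pe = a * b * c
        if total_macs % macs_per_pe != 0:
            continue  # this PE shape cannot tile the MAC budget exactly
        pes = total_macs // macs_per_pe
        # Every factorization pes = m * n gives a distinct array shape.
        for m in range(1, pes + 1):
            if pes % m == 0:
                configs.append((a, b, c, m, pes // m))
    return configs


if __name__ == "__main__":
    for budget in (64, 256, 1024):
        print(budget, len(enumerate_sta_configs(budget)))
```

Under these assumptions, a = b = c = 1 corresponds to a conventional systolic array with one MAC per processing unit. Even with a fixed MAC budget, the number of candidate shapes grows with the budget, before accounting for data types or dataflow choices.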