Repository Collection: null

Feed: https://scholar.dgist.ac.kr/handle/20.500.11750/12958 (updated 2026-05-15T04:22:44Z)

Handle: https://scholar.dgist.ac.kr/handle/20.500.11750/60122
Updated: 2026-02-25T08:40:11Z; Date: 2025-08-07T15:00:00Z

Title: CAM-CIM: A Hybrid Compute-in-Memory Using Content-Addressable Memory with Subword Split Mapping for Reduced ADC Resolution
Author(s): Jung, Sangwoo; Lee, Hojin; Lee, Yejin; Park, Jiyong; Park, Dahoon; Shin, Hyunseob; Yoon, Jong-Hyeok; Kung, Jaeha
Abstract: Recently, compute-in-memory (CIM) has emerged as a promising architecture for data-intensive applications such as deep learning. However, both analog and digital CIM (ACIM and DCIM) face design challenges. ACIMs inherently suffer from non-idealities, which lead to significant accuracy degradation, and a substantial amount of their power is consumed by analog-to-digital converters (ADCs). DCIMs, on the other hand, show an exponential increase in power consumption and computing cycles as the operand bit-width increases, particularly due to the accumulation stage. To overcome these challenges, this paper proposes CAM-CIM, a hybrid DCIM-ACIM architecture that combines a content-addressable memory (CAM) serving as the DCIM with a cluster-based multi-cycle ACIM. As a weight mapping strategy, we present subword split mapping, which assigns some MSBs to the DCIM for improved accuracy and the remaining LSBs to the ACIM for reduced ADC resolution. The accuracy of the proposed CAM-CIM array is evaluated on various deep learning benchmarks, from CNNs to Swin-Tiny. A 65nm CAM-CIM macro with either 3-bit or 4-bit ADCs shows, on average, 10.3x and 5.4x higher energy efficiency than CAM-only and CIM-only architectures, respectively. Compared to recent CIM architectures, CAM-CIM demonstrates 1.4x higher energy efficiency.
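The idea behind subword split mapping can be illustrated with a small integer model. This is a hypothetical sketch, not the authors' implementation: it assumes unsigned weights, an 8-bit weight split into 4 MSBs and 4 LSBs, and ideal (error-free) partial sums, and the function names split_weight, split_mac and adc_bits_for are invented for this example. It shows that the full dot product is recovered as (MSB partial sum shifted left by the LSB width) plus the LSB partial sum, and that the worst-case partial sum the analog side has to digitize shrinks when only the LSBs are mapped to it.

```python
def split_weight(w, total_bits=8, msb_bits=4):
    """Split an unsigned weight into (MSB subword, LSB subword).

    Assumption: unsigned weights; the split point is a free parameter here,
    not the value used in the paper.
    """
    if not 0 <= w < (1 << total_bits):
        raise ValueError("weight outside the unsigned range")
    lsb_bits = total_bits - msb_bits
    return w >> lsb_bits, w & ((1 << lsb_bits) - 1)


def split_mac(xs, ws, total_bits=8, msb_bits=4):
    """Dot product rebuilt from an MSB partial sum and an LSB partial sum."""
    lsb_bits = total_bits - msb_bits
    psum_msb = 0  # in the hybrid scheme, the digital (CAM) side
    psum_lsb = 0  # in the hybrid scheme, the analog side read out by an ADC
    for x, w in zip(xs, ws):
        w_msb, w_lsb = split_weight(w, total_bits, msb_bits)
        psum_msb += x * w_msb
        psum_lsb += x * w_lsb
    return (psum_msb << lsb_bits) + psum_lsb


def adc_bits_for(rows, in_bits, w_bits):
    """Bits needed to represent the worst-case partial sum of `rows` products."""
    max_psum = rows * ((1 << in_bits) - 1) * ((1 << w_bits) - 1)
    return max_psum.bit_length()


xs = [1, 0, 1, 1]         # one bit-serial input slice (hypothetical data)
ws = [200, 17, 255, 3]    # 8-bit unsigned weights (hypothetical data)
print(split_mac(xs, ws))  # 458, identical to sum(x * w)
print(adc_bits_for(rows=4, in_bits=1, w_bits=8))  # 10: all 8 weight bits analog
print(adc_bits_for(rows=4, in_bits=1, w_bits=4))  # 6: only the 4 LSBs analog
```

The resolution figures apply only to this toy configuration; the abstract reports 3-bit or 4-bit ADCs for its own array organization, whose details are not given here.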

Handle: https://scholar.dgist.ac.kr/handle/20.500.11750/57946
Updated: 2025-07-25T04:25:40Z; Date: 2024-11-18T15:00:00Z

Title: A 65nm 687.5-TOPS/W Drive Strength-based SRAM Compute-In-Memory Macro with Adaptive Dynamic Range for Edge AI applications
Author(s): Choi, Dong-Gu; Lee, Jaehyun; Koo, Jahyun; Han, Woo Kyoung; Park, Dahoon; Kung, Jaeha; Lee, Junghyup; Yoon, Jong-Hyeok
Abstract: Analog compute-in-memory (ACIM) has been intensively investigated in pursuit of better energy efficiency, network accuracy, and compatibility with various AI models [1-5]. In particular, SRAM-based ACIM macros achieve flexible input/weight (IN/W) allocation by incorporating bit-serial inputs, bitwise weight loading across multiple bitlines (BL), and digital shift-and-add multiplication of partial sums (PSUM) in the output line (OL). However, shift-and-add multiplication inevitably exacerbates the PSUM errors that arise in the computing/readout process under device mismatches and a limited sensing margin (SM) in ACIM (Fig. 1). This leads to severely erroneous MAC outputs and substantial accuracy loss, impeding the practical utilization of ACIM. To mitigate PSUM errors, an ACIM macro with high-precision IN/W and truncation at the MAC output was proposed [4]. Truncation filters out quantization noise to an extent, thereby reducing the accuracy loss. Nevertheless, this prior work still suffers from PSUM errors due to the limited VLSB of high-resolution ADCs. Furthermore, the truncated MAC outputs undermine the advantages of high-precision IN/W, which undergo frequent weight updates in ACIM macros. An alternative approach uses a low-resolution ADC that quantizes the PSUM to secure a higher VLSB and suppress the resulting PSUM error [5]. However, under high macro utilization, it eventually suffers from accuracy loss due to quantization error, which is amplified by the shift-and-add stage.
To address these challenges, a drive strength-based SRAM compute-in-memory (DS-CIM) macro is proposed, featuring: 1) 6b drive strength-mode sensing with an adaptive dynamic range that secures up to a 39.2x boost in sensing margin and 97% error-free PSUM readout on 2's-complement 4b-IN/W ResNet-20 benchmarks, 2) row-wise adaptive dynamic range SAR (ADR-SAR) logic enabling concurrent ADC readout at every OL with an area efficiency of 15.83 TOPS/mm², 3) input-aware binary search (IABS) that reduces the average number of ADC conversion cycles by 64% on the ResNet-20 benchmark, and 4) a heterogeneous logic unit (HLU) for column-wise logic reconfigurability. © 2024 IEEE.
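Why shift-and-add amplifies PSUM errors can be seen in a minimal software model. This is an illustrative sketch, not the DS-CIM macro's logic: it assumes unsigned 4-bit inputs and weights (the benchmarks above use 2's-complement values, whose sign handling is omitted here), and the helper names and the psum_error argument are hypothetical. Each (input bit i, weight bit j) partial sum is shifted left by i + j before accumulation, so a one-LSB readout error on the most significant bit pair moves the final MAC result by 2^(i+j), while the same error on the least significant pair moves it by only 1.

```python
def bit(v, i):
    """Return bit i of a non-negative integer v."""
    return (v >> i) & 1


def shift_and_add_mac(xs, ws, in_bits=4, w_bits=4, psum_error=None):
    """Bit-serial-input, bitwise-weight MAC reconstructed by shift-and-add.

    psum_error: optional {(i, j): e} mapping that adds an integer error e to
    the partial sum of input bit i and weight bit j, standing in for a
    readout error (hypothetical model).
    """
    psum_error = psum_error or {}
    total = 0
    for i in range(in_bits):        # input bit applied in cycle i
        for j in range(w_bits):     # weight bit j stored on its own bitline
            psum = sum(bit(x, i) * bit(w, j) for x, w in zip(xs, ws))
            psum += psum_error.get((i, j), 0)
            total += psum << (i + j)  # digital shift-and-add
    return total


xs = [3, 5, 7, 1]   # hypothetical 4-bit inputs
ws = [2, 9, 4, 15]  # hypothetical 4-bit weights
exact = shift_and_add_mac(xs, ws)                             # equals sum(x * w)
lsb_err = shift_and_add_mac(xs, ws, psum_error={(0, 0): 1})   # error on bit pair (0, 0)
msb_err = shift_and_add_mac(xs, ws, psum_error={(3, 3): 1})   # error on bit pair (3, 3)
print(exact, lsb_err - exact, msb_err - exact)  # 94 1 64
```

The same model makes the trade-off in the abstract concrete: a coarser PSUM quantization step lowers the chance of a readout error, but any quantization error that remains is scaled by the same 2^(i+j) factor.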

Handle: https://scholar.dgist.ac.kr/handle/20.500.11750/57859
Updated: 2025-07-25T03:29:30Z; Date: 2024-06-17T15:00:00Z

Title: A 97dB-PSRR 178.4dB-FOMDR Calibration-Free VCO−ΔΣ ADC Using a PVT-Insensitive Frequency-Locked Differential Regulation Scheme for Multi-Channel ExG Acquisition
Author(s): Lee, Sehwan; Seol, Taeryoung; Kim, Geunha; Song, Minyoung; Kim, Gain; Yoon, Jong-Hyeok; George, Arup K.; Lee, Junghyup
Abstract: This paper proposes a 97dB-PSRR, 178.4dB-FOMDR calibration-free 16-channel VCO-ΔΣ ADC system using a PVT-insensitive frequency-locked differential regulation (FLDR) scheme suitable for wireless ExG acquisition. Thanks to the FLDR, the SNDR degradation in all 16 channels is less than 1dB over 1.4-2V supply and 20-60°C temperature ranges. Implemented in a 0.18μm standard CMOS process, the proposed system consumes 172μW from a 1.4V supply and occupies a 2.7mm² active area, while a single channel consumes 4.2μW and occupies 0.12mm². © 2024 IEEE.
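The abstract reports FOMDR without defining it or stating the signal bandwidth. Assuming it follows the widely used Schreier dynamic-range figure of merit (an assumption, not confirmed by the abstract), it combines dynamic range, bandwidth and power as in the first line below. The second line is plain arithmetic on the reported power figures: the 16 channels at 4.2μW each account for 67.2μW of the 172μW system total; the abstract does not itemize the remainder.

```latex
\begin{aligned}
\mathrm{FoM}_{\mathrm{DR}} &= \mathrm{DR}\,[\mathrm{dB}] + 10\log_{10}\!\left(\frac{\mathrm{BW}\,[\mathrm{Hz}]}{P\,[\mathrm{W}]}\right) \\
16 \times 4.2\,\mu\mathrm{W} &= 67.2\,\mu\mathrm{W} < 172\,\mu\mathrm{W}
\end{aligned}
```

Under this definition, halving the power at constant dynamic range and bandwidth raises the figure of merit by about 3dB.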

Handle: https://scholar.dgist.ac.kr/handle/20.500.11750/57830
Updated: 2025-07-25T03:31:38Z; Date: 2024-02-20T15:00:00Z

Title: 30.1 A 40nm VLIW Edge Accelerator with 5MB of 0.256pJ/b RRAM and a Localization Solver for Bristle Robot Surveillance
Author(s): Spetalnick, Samuel D.; Lele, Ashwin Sanjay; Crafton, Brian; Chang, Muya; Ryu, Sigang; Yoon, Jong-Hyeok; Hao, Zhijian; Ansari, Azadeh; Khwa, Win-San; Chih, Yu-Der; Chang, Meng-Fan; Raychowdhury, Arijit
Abstract: Tiny surveillance robots need to efficiently compute a perception front-end workload, consisting of a neural network inference stack, and a localization back-end workload implementing a set of state-space equations. Miniaturization and low-power actuation make bristle robots [1] attractive locomotion platforms, but their small size leads to stringent energy constraints. The edge accelerator needs low leakage for long retentive stretches and efficient matrix compute for active bursts. We present a 0.84TOPS/W, 110μW retentive-sleep-capable resistive random-access memory (RRAM)-based accelerator in 40nm with 10 very long instruction word (VLIW)-controlled nonvolatile memory (NVM) matrix units (NMUs) holding a total of 5MB of RRAM, combined with a 10T SRAM-based state-update accelerator enabled by in-place memory updates. At VMIN, the design improves NVM access energy to 0.256pJ/b and peak NVM bandwidth to 12.8GB/s. © 2024 IEEE.
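As a rough, illustrative back-of-envelope check (not a figure reported in the abstract), multiplying the reported access energy by the peak bandwidth gives the NVM access power implied if both values held at the same operating point. This assumes 1 GB = 10^9 bytes and 8 bits per byte.

```latex
P_{\mathrm{NVM}} \approx 12.8\times10^{9}\,\tfrac{\mathrm{B}}{\mathrm{s}} \times 8\,\tfrac{\mathrm{b}}{\mathrm{B}} \times 0.256\times10^{-12}\,\tfrac{\mathrm{J}}{\mathrm{b}} \approx 26.2\,\mathrm{mW}
```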
