| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jeon, Yunhyeong | - |
| dc.contributor.author | Jang, Minwoo | - |
| dc.contributor.author | Lee, Hwanjun | - |
| dc.contributor.author | Jung, Yeji | - |
| dc.contributor.author | Jung, Jin | - |
| dc.contributor.author | Lee, Jonggeon | - |
| dc.contributor.author | So, Jinin | - |
| dc.contributor.author | Kim, Daehoon | - |
| dc.date.accessioned | 2025-03-06T17:10:18Z | - |
| dc.date.available | 2025-03-06T17:10:18Z | - |
| dc.date.created | 2025-02-14 | - |
| dc.date.issued | 2025-01 | - |
| dc.identifier.issn | 1556-6056 | - |
| dc.identifier.uri | http://hdl.handle.net/20.500.11750/58122 | - |
| dc.description.abstract | The emergence of attention-based Transformer models, such as GPT, BERT, and LLaMA, has revolutionized Natural Language Processing (NLP) by significantly improving performance across a wide range of applications. A critical factor driving these improvements is the use of positional embeddings, which are crucial for capturing the contextual relationships between tokens in a sequence. However, current positional embedding methods face challenges, particularly in managing performance overhead for long sequences and effectively capturing relationships between adjacent tokens. In response, Rotary Positional Embedding (RoPE) has emerged as a method that effectively embeds positional information with high accuracy and without necessitating model retraining even with long sequences. Despite its effectiveness, RoPE introduces a considerable performance bottleneck during inference. We observe that RoPE accounts for 61% of GPU execution time due to extensive data movement and execution dependencies. In this paper, we introduce RoPIM, a Processing-In-Memory (PIM) architecture designed to efficiently accelerate RoPE operations in Transformer models. RoPIM achieves this by utilizing a bank-level accelerator that reduces off-chip data movement through in-accelerator support for multiply-addition operations and minimizes operational dependencies via parallel data rearrangement. Additionally, RoPIM proposes an optimized data mapping strategy that leverages both bank-level and row-level mappings to enable parallel execution, eliminate bank-to-bank communication, and reduce DRAM activations. Our experimental results show that RoPIM achieves up to a 307.9× performance improvement and 914.1× energy savings compared to conventional systems. © IEEE. | - |
| dc.language | English | - |
| dc.publisher | Institute of Electrical and Electronics Engineers | - |
| dc.title | RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1109/LCA.2025.3535470 | - |
| dc.identifier.wosid | 001428031000001 | - |
| dc.identifier.scopusid | 2-s2.0-85216973093 | - |
| dc.identifier.bibliographicCitation | Jeon, Yunhyeong, et al. (2025-01). RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models. IEEE Computer Architecture Letters, 24(1), 41–44. doi: 10.1109/LCA.2025.3535470 | - |
| dc.description.isOpenAccess | FALSE | - |
| dc.subject.keywordAuthor | Processing-in-memory | - |
| dc.subject.keywordAuthor | transformer model | - |
| dc.subject.keywordAuthor | rotary positional embedding | - |
| dc.citation.endPage | 44 | - |
| dc.citation.number | 1 | - |
| dc.citation.startPage | 41 | - |
| dc.citation.title | IEEE Computer Architecture Letters | - |
| dc.citation.volume | 24 | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Hardware & Architecture | - |
| dc.type.docType | Article | - |
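
The abstract above centers on Rotary Positional Embedding (RoPE), the operation that RoPIM offloads to bank-level accelerators in DRAM. For orientation, the following is a minimal NumPy sketch of the standard RoPE rotation (the general RoFormer-style formulation), not the paper's RoPIM kernel or data mapping; the function name `apply_rope`, its arguments, and the example shapes are illustrative assumptions.

```python
# Minimal sketch of Rotary Positional Embedding (RoPE) in NumPy.
# This illustrates the general RoPE formulation only; it is NOT the RoPIM
# implementation described in the paper, and apply_rope / its arguments
# are hypothetical names chosen for this example.

import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate each (even, odd) dimension pair of x by a position-dependent angle.

    x: array of shape (seq_len, head_dim), head_dim must be even.
    Returns an array of the same shape with positional information embedded.
    """
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even"

    # Per-pair rotation frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)   # (head_dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)              # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    # Split into even/odd dimension pairs and apply a 2x2 rotation to each pair.
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Example: rotate a random query matrix for a 16-token sequence with head_dim 64.
q = np.random.randn(16, 64).astype(np.float32)
q_rot = apply_rope(q)
print(q_rot.shape)  # (16, 64)
```

The per-pair rotation consists only of element-wise multiplies and additions plus an even/odd data rearrangement, which matches the two costs the abstract attributes to RoPE on GPUs (data movement and execution dependencies) and the multiply-addition support and parallel rearrangement that RoPIM provides in-memory.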