Detail View

RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models

DC Field  Value
dc.contributor.author  Jeon, Yunhyeong
dc.contributor.author  Jang, Minwoo
dc.contributor.author  Lee, Hwanjun
dc.contributor.author  Jung, Yeji
dc.contributor.author  Jung, Jin
dc.contributor.author  Lee, Jonggeon
dc.contributor.author  So, Jinin
dc.contributor.author  Kim, Daehoon
dc.date.accessioned  2025-03-06T17:10:18Z
dc.date.available  2025-03-06T17:10:18Z
dc.date.created  2025-02-14
dc.date.issued  2025-01
dc.identifier.issn  1556-6056
dc.identifier.uri  http://hdl.handle.net/20.500.11750/58122
dc.description.abstract  The emergence of attention-based Transformer models, such as GPT, BERT, and LLaMA, has revolutionized Natural Language Processing (NLP) by significantly improving performance across a wide range of applications. A critical factor driving these improvements is the use of positional embeddings, which are crucial for capturing the contextual relationships between tokens in a sequence. However, current positional embedding methods face challenges, particularly in managing performance overhead for long sequences and effectively capturing relationships between adjacent tokens. In response, Rotary Positional Embedding (RoPE) has emerged as a method that effectively embeds positional information with high accuracy and without necessitating model retraining even with long sequences. Despite its effectiveness, RoPE introduces a considerable performance bottleneck during inference. We observe that RoPE accounts for 61% of GPU execution time due to extensive data movement and execution dependencies. In this paper, we introduce RoPIM, a Processing-In-Memory (PIM) architecture designed to efficiently accelerate RoPE operations in Transformer models. RoPIM achieves this by utilizing a bank-level accelerator that reduces off-chip data movement through in-accelerator support for multiply-addition operations and minimizes operational dependencies via parallel data rearrangement. Additionally, RoPIM proposes an optimized data mapping strategy that leverages both bank-level and row-level mappings to enable parallel execution, eliminate bank-to-bank communication, and reduce DRAM activations. Our experimental results show that RoPIM achieves up to a 307.9× performance improvement and 914.1× energy savings compared to conventional systems. © IEEE.
dc.language  English
dc.publisher  Institute of Electrical and Electronics Engineers
dc.title  RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models
dc.type  Article
dc.identifier.doi  10.1109/LCA.2025.3535470
dc.identifier.wosid  001428031000001
dc.identifier.scopusid  2-s2.0-85216973093
dc.identifier.bibliographicCitation  Jeon, Yunhyeong. (2025-01). RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models. IEEE Computer Architecture Letters, 24(1), 41–44. doi: 10.1109/LCA.2025.3535470
dc.description.isOpenAccess  FALSE
dc.subject.keywordAuthor  Processing-in-memory
dc.subject.keywordAuthor  transformer model
dc.subject.keywordAuthor  rotary positional embedding
dc.citation.endPage  44
dc.citation.number  1
dc.citation.startPage  41
dc.citation.title  IEEE Computer Architecture Letters
dc.citation.volume  24
dc.description.journalRegisteredClass  scie
dc.description.journalRegisteredClass  scopus
dc.relation.journalResearchArea  Computer Science
dc.relation.journalWebOfScienceCategory  Computer Science, Hardware & Architecture
dc.type.docType  Article
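
As background for the abstract above, the following is a minimal sketch of the RoPE operation itself, following the standard RoFormer formulation, not RoPIM's in-memory implementation. The function name apply_rope, the NumPy implementation, and the base of 10000 are illustrative assumptions rather than details taken from this article.

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Apply rotary positional embedding to x of shape (seq_len, dim).

    Each even/odd feature pair (2i, 2i+1) at position m is rotated by
    the angle m * theta_i, where theta_i = base ** (-2i / dim).
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"

    # Per-pair rotation frequencies theta_i.
    theta = base ** (-np.arange(0, dim, 2) / dim)           # (dim/2,)
    # Rotation angles m * theta_i for every token position m.
    angles = np.arange(seq_len)[:, None] * theta[None, :]   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    # Rotate each (even, odd) coordinate pair.
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Example: rotate a query matrix for a 128-token sequence with 64 features.
q = np.random.randn(128, 64).astype(np.float32)
q_rot = apply_rope(q)
```

The per-pair multiply-add and the even/odd data rearrangement visible here correspond to the operation pattern the abstract identifies as the source of RoPE's data movement and dependency overhead, which RoPIM addresses with bank-level in-memory multiply-addition and parallel data rearrangement.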

File Downloads

  • There are no files associated with this item.

