Detail View

Title
Late Breaking Results: Dynamically Scalable Pruning for Transformer-Based Large Language Models
Issued Date
2025-04-02
Citation
Lee, Junyoung. (2025-04-02). Late Breaking Results: Dynamically Scalable Pruning for Transformer-Based Large Language Models. Design, Automation and Test in Europe Conference (DATE), 1–2. doi: 10.23919/DATE64628.2025.10992978
Type
Conference Paper
ISBN
9783982674100
ISSN
1558-1101
Abstract
We propose Matryoshka, a novel framework for transformer model pruning that enables dynamic runtime control while maintaining accuracy competitive with modern large language models (LLMs). Matryoshka incrementally constructs submodels of varying complexity, allowing runtime adaptation without maintaining separate models. Our evaluations on LLaMA-7B demonstrate that Matryoshka achieves up to a 34% speedup and surpasses the quality of state-of-the-art pruning methods, providing a flexible solution for deploying LLMs. © 2025 EDAA.
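
The record contains no implementation details, but the nested-submodel idea the abstract describes can be illustrated with a toy sketch. The PyTorch snippet below is a hypothetical illustration, not the paper's method: it assumes a layer's output units are ordered by importance, so that any prefix of the weight rows forms a self-contained smaller submodel that can be selected at runtime without keeping separate models. The names MatryoshkaLinear and set_width are invented for this example.

    # Hypothetical sketch of a runtime-scalable ("Matryoshka"-style) layer.
    # Assumption: output units are pre-sorted by importance, so a prefix
    # slice of the weights is itself a usable pruned submodel.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MatryoshkaLinear(nn.Module):
        """Linear layer whose active output width is adjustable at runtime."""

        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            self.active_out = out_features  # runtime-adjustable width

        def set_width(self, fraction: float) -> None:
            # Pick how much of the layer to keep at inference time.
            self.active_out = max(1, int(self.linear.out_features * fraction))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Use only the leading (most important) rows of the weight matrix.
            w = self.linear.weight[: self.active_out]
            b = self.linear.bias[: self.active_out]
            return F.linear(x, w, b)

    layer = MatryoshkaLinear(16, 64)
    x = torch.randn(2, 16)
    layer.set_width(1.0)
    print(layer(x).shape)  # torch.Size([2, 64]) -- full model
    layer.set_width(0.5)
    print(layer(x).shape)  # torch.Size([2, 32]) -- pruned submodel, same weights

The point of the nesting is that one set of weights serves every complexity level: scaling down is a slice, not a model swap, which is what enables the runtime adaptation the abstract claims.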
URI
https://scholar.dgist.ac.kr/handle/20.500.11750/58487
DOI
10.23919/DATE64628.2025.10992978
Publisher
Institute of Electrical and Electronics Engineers Inc.
File Downloads

  • There are no files associated with this item.

Related Researcher

Chwa, Hoonsung (좌훈승)

Department of Electrical Engineering and Computer Science
