Detail View
Late Breaking Results: Dynamically Scalable Pruning for Transformer-Based Large Language Models
- Title: Late Breaking Results: Dynamically Scalable Pruning for Transformer-Based Large Language Models
- Issued Date: 2025-04-02
- Citation: Lee, Junyoung. (2025-04-02). Late Breaking Results: Dynamically Scalable Pruning for Transformer-Based Large Language Models. Design Automation and Test in Europe Conference, 1–2. doi: 10.23919/DATE64628.2025.10992978
- Type: Conference Paper
- ISBN: 9783982674100
- ISSN: 1558-1101
- Abstract: We propose Matryoshka, a novel framework for transformer model pruning that enables dynamic runtime control while maintaining accuracy competitive with modern large language models (LLMs). Matryoshka incrementally constructs submodels of varying complexity, allowing runtime adaptation without maintaining separate models. Our evaluations on LLaMA-7B demonstrate that Matryoshka achieves up to a 34% speedup and outperforms state-of-the-art pruning methods in output quality, providing a flexible solution for deploying LLMs. © 2025 EDAA. (An illustrative sketch of the nested-submodel idea follows this list.)
- Publisher: Institute of Electrical and Electronics Engineers Inc.
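
The abstract only summarizes the mechanism, so the following is a minimal, hypothetical sketch of the nested-submodel idea it describes: one shared weight tensor with masks whose kept-weight sets are nested across sparsity levels, so a runtime knob selects submodel complexity without loading separate models. All names (`NestedPrunedLinear`, `set_level`) and the magnitude-ranking heuristic are assumptions for illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn

class NestedPrunedLinear(nn.Module):
    """One weight matrix shared by several nested submodels.

    Masks are built from a single magnitude ranking, so the weights kept
    at a higher sparsity level are always a subset of those kept at a
    lower one (assumed heuristic; the paper's criterion may differ).
    """

    def __init__(self, in_features, out_features, sparsities=(0.0, 0.25, 0.5)):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.sparsities = sorted(sparsities)
        self.level = 0  # runtime knob: index into self.sparsities
        self._build_masks()

    @torch.no_grad()
    def _build_masks(self):
        w = self.linear.weight.abs().flatten()
        order = torch.argsort(w, descending=True)  # most important first
        masks = []
        for s in self.sparsities:
            keep = int(round((1.0 - s) * w.numel()))
            m = torch.zeros_like(w)
            m[order[:keep]] = 1.0  # prefixes of one ranking => nested masks
            masks.append(m.view_as(self.linear.weight))
        self.register_buffer("masks", torch.stack(masks))

    def set_level(self, level):
        """Switch submodel complexity at runtime; no reload needed."""
        self.level = level

    def forward(self, x):
        w = self.linear.weight * self.masks[self.level]
        return nn.functional.linear(x, w, self.linear.bias)

layer = NestedPrunedLinear(16, 16)
x = torch.randn(2, 16)
for lvl, s in enumerate(layer.sparsities):
    layer.set_level(lvl)
    y = layer(x)  # same parameters, different effective sparsity
    print(f"sparsity={s:.2f} output_norm={y.norm().item():.3f}")
```

Because every mask is a prefix of the same importance ranking, switching levels touches no weights; only the active mask changes, which is what makes runtime adaptation cheap compared with maintaining separate pruned models.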
File Downloads
- There are no files associated with this item.
Related Researcher
- Chwa, Hoonsung (좌훈승)
- Department of Electrical Engineering and Computer Science
