Detail View

Skipformer: Evolving Beyond Blocks for Extensively Searching On-Device Language Models With Learnable Attention Window

DC Field Value Language
dc.contributor.author Bodenham, Matthew -
dc.contributor.author Kung, Jaeha -
dc.date.accessioned 2024-12-31T18:10:21Z -
dc.date.available 2024-12-31T18:10:21Z -
dc.date.created 2024-09-20 -
dc.date.issued 2024-09 -
dc.identifier.issn 2169-3536 -
dc.identifier.uri http://hdl.handle.net/20.500.11750/57487 -
dc.description.abstract Deployment of language models to resource-constrained edge devices is an uphill battle against their ever-increasing size. The task transferability of language models makes deployment to the edge an attractive application. Prior neural architecture search (NAS) works have produced hardware-efficient transformers but often overlook some architectural features in favor of search efficiency. We propose a novel evolutionary NAS with a large and flexible search space to encourage the exploration of previously unexplored transformer architectures. Our search space allows architectures to vary in depth and to use skip connections that transfer information anywhere inside the architecture; Skipformer, the top searched model, displays these novel architectural features. To further increase Skipformer's efficiency, we learn a CUDA-accelerated attention window size at each self-attention layer during training. Skipformer achieves a 23.3% speedup and requires 19.2% less memory on the NVIDIA Jetson Nano, with negligible accuracy loss on the GLUE benchmark compared to GPT-2 Small. -
dc.language English -
dc.publisher Institute of Electrical and Electronics Engineers Inc. -
dc.title Skipformer: Evolving Beyond Blocks for Extensively Searching On-Device Language Models With Learnable Attention Window -
dc.type Article -
dc.identifier.doi 10.1109/ACCESS.2024.3420232 -
dc.identifier.wosid 001311199700001 -
dc.identifier.scopusid 2-s2.0-85203505008 -
dc.identifier.bibliographicCitation Bodenham, Matthew; Kung, Jaeha. (2024-09). Skipformer: Evolving Beyond Blocks for Extensively Searching On-Device Language Models With Learnable Attention Window. IEEE Access, 12, 124428–124439. doi: 10.1109/ACCESS.2024.3420232 -
dc.description.isOpenAccess TRUE -
dc.subject.keywordAuthor Computational modeling -
dc.subject.keywordAuthor Transformers -
dc.subject.keywordAuthor Natural language processing -
dc.subject.keywordAuthor Computer architecture -
dc.subject.keywordAuthor Context modeling -
dc.subject.keywordAuthor Training -
dc.subject.keywordAuthor Language models -
dc.subject.keywordAuthor neural architecture search -
dc.subject.keywordAuthor on-device inference -
dc.subject.keywordAuthor transformers -
dc.citation.endPage 124439 -
dc.citation.startPage 124428 -
dc.citation.title IEEE Access -
dc.citation.volume 12 -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.relation.journalResearchArea Computer Science; Engineering; Telecommunications -
dc.relation.journalWebOfScienceCategory Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications -
dc.type.docType Article -
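
Note on the learnable attention window described in the abstract: the paper learns an attention window size at each self-attention layer during training. As a purely illustrative companion, the sketch below shows one common way such a window can be made learnable in PyTorch, by relaxing the hard window into a differentiable soft mask so gradients can flow into the window-size parameter. This is an assumption-based sketch, not the paper's CUDA-accelerated implementation; the class name SoftWindowSelfAttention, its parameters, and the sigmoid relaxation are all hypothetical.

# Hypothetical sketch of a learnable per-layer attention window.
# Assumption: the window size is relaxed into a differentiable soft mask so it can be
# trained jointly with the model weights; Skipformer's exact formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftWindowSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, max_len: int, init_window: float = 64.0):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One learnable window size per attention layer (log-parameterized to stay positive).
        self.log_window = nn.Parameter(torch.tensor(float(init_window)).log())
        # Pairwise distances i - j, reused for both the causal and the window mask.
        dist = torch.arange(max_len)[:, None] - torch.arange(max_len)[None, :]
        self.register_buffer("dist", dist.float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        dist = self.dist[:T, :T]
        causal = (dist >= 0).float()                 # no attention to future tokens
        window = self.log_window.exp()
        # Soft window: positions farther back than `window` are smoothly down-weighted,
        # keeping the window size differentiable and learnable by gradient descent.
        soft_mask = torch.sigmoid(window - dist)
        attn = F.softmax(scores.masked_fill(causal == 0, float("-inf")), dim=-1)
        attn = attn * soft_mask * causal
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)

# Example usage with GPT-2 Small-like dimensions (illustrative values only):
# layer = SoftWindowSelfAttention(dim=768, num_heads=12, max_len=1024)
# y = layer(torch.randn(2, 128, 768))   # (batch, seq_len, dim)

At inference time, the learned value could be rounded to a hard window and served by a banded (sliding-window) attention kernel; that is one plausible reading of how a learned window yields the reported on-device speedup and memory savings, not a description of the authors' kernel.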

File Downloads

  • There are no files associated with this item.
