1. Introduction
   1.1 Motivation
   1.2 Contributions
2. Background
   2.1 Language Modeling
   2.2 Context-independent and Context-sensitive Text Representation
   2.3 The Pre-train and Fine-tune Paradigm in Natural Language Processing
   2.4 Semantic Role Labeling
   2.5 Citation Intent Classification
3. Related Work
   3.1 BERT (Bidirectional Encoder Representations from Transformers)
   3.2 BERT Variant Models
      3.2.1 BERT Models Pre-trained with a Domain-specific Corpus
      3.2.2 BERT Models Pre-trained with New Tasks
      3.2.3 BERT Models Pre-trained with Additional Features
4. Our Methods for OLAP-BERT
   4.1 Method 1: Additional Features Affect Text Tokens Differently
   4.2 Method 2: Additional Features Affect Text Tokens Equally
5. Our Datasets for OLAP-BERT
   5.1 DBLP-RC: A Record-based Corpus
   5.2 Record-based Labeled Datasets
      5.2.1 DBLP-RDfSRL: A Record-based Dataset for Semantic Role Labeling
      5.2.2 DBLP-RDfCIC: A Record-based Dataset for Citation Intent Classification
6. Experiments
   6.1 Datasets
      6.1.1 Record-based Corpus for Pre-training
      6.1.2 Record-based Datasets for Fine-tuning
   6.2 Experimental Setup
      6.2.1 Pre-training for Natural Language Understanding Models
      6.2.2 Fine-tuning for Task-specific Models
   6.3 Experimental Results
      6.3.1 Results of the Pre-training
      6.3.2 Results of the Fine-tuning
7. Discussion
8. Conclusions
9. References