LSM-tree based key-value stores show high performance in write-intensive workloads due to the out-of-place update structure. However, this structure requires additional storage space to store a lot of redundant data in database and results in high space amplification. In order to remove redundant data, most modern LSM-tree stores use a compaction triggered by the capacity in each level. But the size-based compaction trigger often occurs inadequate compaction frequency according to the workload, causing high write amplification and reduces overall performance. To address this, we introduce delete-aware compaction trigger which responds to the current workload’s deletion rates and reduces ineffi-cient compactions consisting of most valid records. We implemented Delete-Aware RocksDB on top of RocksDB, and we show that it outperforms by keeping low space amplification without write amplifica-tion cost compared to RocksDB. Furthermore, we show our system cooperated with Monkey, one of the state-of-the-art LSM-tree based key-value store, outperforms existing systems in terms of throughput.
Table Of Contents
Ⅰ. Introduction 1 1.1 Motivation 2 1.1.1 Lifecycle of a deletion record and inefficiency in a compaction 2 1.1.2 Maintaining low space amplification decreases overall performance 3 1.2 Research Goals and Thesis Structure 5 1.2.1 Goal 1: To make a compaction efficient according to the workload 5 1.2.2 Goal 2: To trigger a compaction at the proper time 5 1.2.3 Thesis Structure 5 ⅠⅠ. Background 6 2.1 LSM-Tree Basics 6 2.1.1 Memtable and SSTable 6 2.1.2 Compaction 8 2.1.3 Bloom Filter 9 2.2 Amplification Factors 10 2.2.1 Read Amplification 10 2.2.2 Write Amplification 10 2.2.3 Space Amplification 11 ⅠⅠⅠ. Implementation 12 3.1 Implementing Delete-Aware RocksDB 13 3.1.1 A delete-aware compaction trigger 13 3.1.2 Pre-remove strategy for redundant tombstones 16 3.1.3 Dynamic runtime adaptation 17 3.2 Extending db_bench 20 3.2.1 Flags 20 3.2.2 Configuration of Delete-Aware RocksDB 20 3.3 Cooperation with Monkey Method 21 ⅠV. Benchmarks 23 4.1 Experimental Setup 23 4.2 Throughput 24 4.3 The Number of Compactions and Space Amplification 25 V. Discussion 27 VI. Conclusion 28 References 29