Primer design is a fundamental technique that is widely used for polymerase chain reaction (PCR). Although many methods have been proposed for primer design, they require a great deal of manual effort to generate feasible and valid primers, including homology tests on off-target sequences using BLAST-like tools. That approach is inconvenient when many target sequences in a quantitative PCR (qPCR) experiment must satisfy the same stringent and allele-invariant constraints. In this dissertation, we propose an entirely new method that overcomes these drawbacks.
In the first part of this dissertation, we propose a method called MRPrimer that can design all feasible and valid primer pairs existing in a DNA database at once, while simultaneously checking a multitude of filtering constraints and validating primer specificity. Furthermore, MRPrimer suggests the best primer pair for each target sequence, based on a ranking method. Through qPCR analysis using 343 primer pairs and the corresponding sequencing and comparative analyses, we showed that the primer pairs designed by MRPrimer are very stable and effective for qPCR. In addition, MRPrimer is computationally efficient and scalable, and therefore useful for quickly constructing an entire collection of feasible and valid primers for frequently updated databases like RefSeq. Furthermore, we suggest that MRPrimer can be utilized conveniently for experiments requiring primer design, especially real-time qPCR.
Existing web servers for primer design have major drawbacks: they require the use of BLAST-like tools for homology tests, and they lack support for ranking of primers, TaqMan probes, and simultaneous design of primers against multiple targets. Due to the large-scale computational overhead, the few web servers supporting homology tests use heuristic approaches or perform homology tests within a limited scope. The primer pairs designed by MRPrimer are very stable and effective in qPCR experiments. However, although MRPrimer can design very high-quality primers, routine use is inconvenient because it runs on a cluster of computers and requires several hours of runtime whenever the filtering constraints are adjusted. In the second part of this dissertation, we propose MRPrimerW, the online version of MRPrimer, which allows users to design the best primers quickly in a web interface, without requiring a MapReduce cluster or a long computation, much like Google’s search system. It performs complete homology testing, supports batch design of primers for multi-target qPCR experiments, supports design of TaqMan probes, and ranks the resulting primers to return the best-ranked (top-1) primers to the user. To ensure high accuracy, we adopted the core algorithm of MRPrimer but completely redesigned it so that query results are returned quickly. MRPrimerW provides primer design services and a complete set of 341,963,135 in-silico validated primers covering 99% of human and mouse genes.
In summary, we have proposed a new method for primer design that overcomes most of the drawbacks of existing methods. For an entire DNA database, we have proposed MRPrimer, which can design all feasible and valid primer pairs by simultaneously checking a multitude of filtering constraints and validating primer specificity. For user queries from a web interface, we have proposed MRPrimerW, which performs complete homology tests, supports batch design for qPCR, supports TaqMan probe design, and supports ranking of primers. We believe that the proposed methods will contribute to increasing the efficiency and specificity of experiments involving PCR.
ⓒ 2016 DGIST
Table Of Contents
Ⅰ. INTRODUCTION
  1.1 Background
  1.2 Motivation and Objectives
  1.3 Structure of thesis
Ⅱ. RELATED WORK
  2.1 Batch-style primer design method
  2.2 Web-based primer design method
Ⅲ. MRPRIMER: Batch-style primer design method
  3.1 Overview
  3.2 MRPrimer algorithm
    3.2.1 Step 1: Candidate primer generation round
    3.2.2 Step 2: Single filtering round
    3.2.3 Step 3: 5’ cross-hybridization filtering round
    3.2.4 Step 4: General cross-hybridization filtering round
    3.2.5 Step 5: Duplicate removing round
    3.2.6 Step 6: Pair filtering round
    3.2.7 Step 7: Ranking round
  3.3 Experiments for biological validation
    3.3.1 Data and methods
    3.3.2 qPCR analysis
    3.3.3 Comparative analysis
  3.4 Experiments for computational performance
    3.4.1 Data and setup
    3.4.2 Results of the completeness and effective ranking system
    3.4.3 Results of the coverage and specificity
    3.4.4 Results of the computational efficiency and scalability
Ⅳ. MRPRIMERW: Web-based primer design method
  4.1 Overview
  4.2 Offline processing part
  4.3 Index building part
  4.4 Online processing part
  4.5 Web interface
Ⅴ. CONCLUSIONS