Postdoctoral position at Brown University and INRIA (French National Institute for Research in Computer Science and Automation)

Despite the advent of a number of new methods during the last few years, the assembly of full-length eukaryotic genomes remains a challenging problem. It is known to be difficult because of its intrinsic computational complexity and because of the nature of genomic sequences, which may contain a high fraction of repetitive elements, low-complexity regions, rearrangements, large insertions or deletions, etc. The advent of high-throughput sequencing technologies has made the challenge even harder: the sequence reads are much shorter and much more numerous. For example, a vertebrate genome puzzle contains several billion pieces, with multiple overlapping copies of each piece. De novo assembly is still largely unsolved for large eukaryotic genomes.

Generally, two approaches are considered. The first is used mainly in resequencing projects. It is based on read mapping and assumes the availability of a reference sequence from the same species, against which the reads can be aligned. It thus allows the identification of small local variations (substitutions or indels). The second, called de novo sequence assembly, does not make use of any prior assembly. Both can take advantage of paired-end reads to detect structural variations such as large insertions or rearrangements.

Some works also explore a new path for the assembly problem, inspired both by the de novo sequencing problem [1,2] and by methods that have been developed for resequencing [3,4,5]. The key idea is that even when a reference sequence is not available, representative genome sequences now exist for most of the major phylogenetic clades, so a set of closely related sequences can be used to guide the assembly process. In [6], some of these approaches were explored for conventional Sanger sequencing. More recently, in [7], a non-automated process was used on four genomes at the intraspecific level.
In [8, 9], the authors took into account both assembly and mapping, but considered only one reference genome. In [10, 11], this approach was used for scaffolding. In [12], knowledge of predicted breakpoints from a set of variants was used to improve assembly. We thus propose the exciting project of designing a tool that aims to remove these barriers by using a more sensitive read mapping process with a multiple-reference set, combined with an assembly approach that takes into account paired-end information and structural variation through a guided phylogenetic approach.

Description

Benefits