Next-Generation Sequencing (NGS) gives many laboratories access to high-throughput bacterial genomic research, and the number of sequenced bacterial genomes is rising steeply. Yet the fraction of completely closed, single-contig assemblies is extremely small compared to the number of draft genomes generated. The explanation is simple: generating a draft genome with short-read technologies (e.g. Illumina sequencing) costs many times less than using long-read, third-generation sequencing technologies. Costs should be seen here in a broad sense: from chemicals and laboratory personnel to the sequencing platform and the data analysis. But change is coming! The latest long-read sequencing technologies, in particular the nanopore technology offered on the platforms of Oxford Nanopore Technologies, have the potential to simplify the process further and bring down the analysis costs.
The PacBio platforms have been available for several years now and use SMRT technology (Single Molecule Real Time sequencing) to generate long sequence reads of up to 30 kbp. SMRT sequencing offers an advantage over other technologies in that it can sequence individual DNA molecules by recording the incorporation of fluorescently labelled nucleotides in real time. The error rate of the PacBio platform is relatively high (approximately 10%), but to improve sequence quality the PacBio system can ligate adapters to both ends of each DNA molecule to form a continuous DNA circle. This allows each DNA molecule to be sequenced in multiple passes, from which a more accurate consensus read is built. The technology does not yet match second-generation platforms like Illumina in terms of sequence quality, but it has proven useful in various applications, including contig scaffolding.
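To give a feel for why multiple passes help, the short Python sketch below computes how often a simple majority vote over several passes would still call a base wrong. It is an illustration under strong assumptions, not PacBio's actual consensus method: errors are treated as independent between passes, every pass has the same 10% error rate, and any position where more than half of the passes are wrong counts as a consensus error (an upper bound, since wrong calls need not agree with each other). Real long-read errors are largely insertions and deletions, which this model ignores. The function name `majority_error` is made up for this example.

```python
from math import comb


def majority_error(p, passes):
    """Probability that more than half of `passes` independent reads are wrong at one base.

    Assumes (for illustration only) independent errors with the same rate `p` per pass.
    """
    if passes % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    need = passes // 2 + 1  # smallest number of wrong passes that forms a majority
    return sum(
        comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(need, passes + 1)
    )


# Assumed per-pass error rate of 10%, as quoted in the text.
for n in (1, 3, 5, 9):
    print(f"{n} passes: per-base consensus error <= {majority_error(0.10, n):.5f}")
```

Under these assumptions, three passes bring the per-base error bound from 10% down to 2.8%, and five passes to below 1%.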
In the past years we have put much effort into creating workflows based on a hybrid analysis approach, in which Illumina short reads and PacBio long reads are combined to create closed genomes of extremely high quality. Not only have the lab protocols been optimized, but a full data analysis pipeline, called SSPACE-LongRead (Boetzer and Pirovano, 2014), has also been developed. Illumina data alone typically yields genome assemblies comprising 20–500 contigs, mainly because of repetitive elements that are longer than the short reads. The addition of PacBio long reads in many cases leads to a fully closed bacterial genome of one contig per chromosome, plus any plasmids.
Nanopore technology promises to have an enormous impact on genomic research, as the library preparation and sequencing steps are expected to cost only a fraction of what current long-read sequencing technologies cost. Oxford Nanopore Technologies started in 2014 with an early access program for their first robust sequencer (the MinION Access Program, MAP). Like the PacBio system, the MinION delivers long-read, real-time sequencing of individual molecules. It is the first commercial nanopore-based (rather than synthesis-based) sequencer. The MinION is small and portable, about the size of a USB stick. The current throughput is still limited but is quickly increasing to several gigabases per run. In 2015 the PromethION was offered in an early access program: a small benchtop system for high-throughput, real-time biological analyses that allows larger sample numbers. This will give a further boost to the technology and make it suited for sequencing 96-well plates in no time!
Meanwhile, different solutions are being developed for easy, on-site library preparation. Is it just a matter of time before this becomes routine work? And will we be able to easily close bacterial genomes in a high-throughput manner when read lengths go beyond 100,000 base pairs, albeit with an error rate of approximately 10%? It seems that the revolution has started, and it is no wonder that the participants of the London Calling meeting, where different partners of the MinION advanced access program presented their findings, returned home with huge smiles on their faces. But there are still some hurdles to clear, also on the bioinformatics side…
So what about the development of appropriate analysis tools for long-read sequencing? BaseClear bioinformaticians are at the forefront of data analysis. In particular, the published genome finishing tools SSPACE Standard and GapFiller are widely recognized in the field, as underscored by their large number of citations. The latest assembly plugin developed internally by the BaseClear Bioinformatics team, SSPACE-LongRead (Boetzer and Pirovano, 2014, http://www.biomedcentral.com/1471-2105/15/211), is well capable of merging short-read Illumina data with long-read PacBio or Nanopore data into one hybrid assembly to completely close genomes. This tool, too, is becoming more and more recognized by the community. Recently, for example, SSPACE-LongRead was used by a group of researchers who evaluated the utility of the Oxford Nanopore MinION sequencing system for scaffolding bacterial genomes by comparing MinION sequencing data for two Francisella genomes. The findings of Karlsson and co-workers were published in Scientific Reports (http://www.nature.com/srep/2015/150707/srep11996/full/srep11996.html). SSPACE-LongRead was used together with BLASR (Chaisson and Tesler, 2012) to scaffold the contigs using the MinION reads. The MinION reads could correctly join all of the contigs into a complete chromosome.
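The basic idea behind scaffolding with long reads can be sketched in a few lines of Python: a long read that aligns to two different contigs is evidence that those contigs are neighbours in the genome, and a link is trusted only when several reads support it. This is a toy illustration only and not the SSPACE-LongRead algorithm. The function name `count_contig_links`, the `min_links` threshold and the input format are assumptions made up for the example, and the read-to-contig hits are invented data standing in for what an aligner such as BLASR would report.

```python
from collections import Counter
from itertools import combinations


def count_contig_links(read_hits, min_links=2):
    """Count how many long reads connect each pair of contigs.

    `read_hits` maps a read name to the contigs it aligns to (hypothetical
    input format). Pairs supported by fewer than `min_links` reads are dropped.
    """
    links = Counter()
    for contigs in read_hits.values():
        # Each unordered pair of distinct contigs hit by one read is one link.
        for pair in combinations(sorted(set(contigs)), 2):
            links[pair] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


# Invented example data: which contigs each long read aligns to.
read_hits = {
    "read1": ["contigA", "contigB"],
    "read2": ["contigB", "contigA"],
    "read3": ["contigB", "contigC"],
    "read4": ["contigC"],
    "read5": ["contigB", "contigC"],
    "read6": ["contigA", "contigC"],
}

supported = count_contig_links(read_hits)
for (left, right), n in sorted(supported.items()):
    print(f"{left} <-> {right}: {n} supporting reads")
```

In this example the A-B and B-C links are each supported by two reads and are kept, while the single read joining A and C is discarded as possible noise. A real scaffolder also has to work out contig order, orientation and gap sizes, which this sketch leaves out.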
Although the first successful results using Oxford Nanopore technology have now been published, making reliable assemblies of bacterial (or other) genomes remains challenging. The right preparation, coverage, software, parameters, etc. need to be selected in order to obtain the best quality de novo assembly results. This requires the right bioinformatics expertise and experience, as there are many pitfalls. Nanopore technology will be a good alternative to PacBio sequencing and in time might remove the need for Illumina sequencing in de novo genome sequencing projects. But it will take some more time before de novo assembly of bacterial genomes is a real piece of cake that any researcher in any laboratory can apply. Meanwhile, you can be sure that the BaseClear bioinformaticians are taking up the challenge of creating analysis pipelines and assembly tools that stay at the forefront as new sequencing technologies enter the scene!
Walter Pirovano – Director bioinformatics BaseClear B.V.