How to get single contig high-quality microbial genomes

Microbial genomics is an essential instrument for understanding microorganisms. Whole genome sequencing delivers a comprehensive view of the entire genome and is often a good starting point when working with a new strain or new type of microorganism. BaseClear offers different sequencing technologies in order to produce high-quality microbial genomes, and our aim is to deliver completely closed, single contig genomes. With the addition of the latest platform from Pacific Biosciences, the Sequel system, we can now offer a new and improved approach for your microbial genome sequencing project.

LONG READ SEQUENCING ON THE PACBIO PLATFORMS

The Sequel system is the latest sequencing platform developed by Pacific Biosciences. Just like the RS-II, the new Sequel system is based on the proven SMRT sequencing technology. However, the Sequel system has a 5-7 times higher output than the RS-II thanks to the development of larger SMRT cells (the disposable units in which the sequencing takes place). The RS-II generally produces 0.5-1 Gb of data per SMRT cell, while our first runs with the latest Sequel chemistry generated between 3.3 and 6 Gb per SMRT cell. PacBio's specifications state that up to 500 k polymerase reads can be obtained, and our first runs reached 250-490 k polymerase reads (compared to 75-90 k for the RS-II). PacBio is working hard on further improvements to the Sequel reagents, which are expected to increase the output further, in particular the read length.
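To put these yields in perspective, the short Python sketch below converts the per-SMRT-cell output ranges quoted above into approximate fold coverage. The 5 Mb genome size is purely an illustrative assumption (it is not a figure from our runs), the function name is hypothetical, and the calculation uses raw yield, so coverage after read filtering would be lower.

```python
# Rough coverage arithmetic: fold coverage = sequenced bases / genome size.
# GENOME_SIZE_MB is an illustrative assumption, not a measured value.

def coverage_range(yield_gb_low, yield_gb_high, genome_size_mb):
    """Return (low, high) raw fold coverage obtained from one SMRT cell."""
    # Convert Gb to Mb so that both quantities share the same unit.
    low = yield_gb_low * 1000.0 / genome_size_mb
    high = yield_gb_high * 1000.0 / genome_size_mb
    return low, high

GENOME_SIZE_MB = 5.0  # assumed size of a typical bacterial genome

# Yield ranges per SMRT cell as quoted in the text above.
rs2 = coverage_range(0.5, 1.0, GENOME_SIZE_MB)
sequel = coverage_range(3.3, 6.0, GENOME_SIZE_MB)

print(f"RS-II:  {rs2[0]:.0f}x to {rs2[1]:.0f}x raw coverage per SMRT cell")
print(f"Sequel: {sequel[0]:.0f}x to {sequel[1]:.0f}x raw coverage per SMRT cell")
```

Under this assumption a single Sequel SMRT cell yields far more raw coverage than one genome of this size needs, which is why the higher output matters for microbial projects.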

A small disadvantage of the Sequel compared with the RS-II at this moment is that its reads are slightly shorter. This is related to the stability of the polymerase: even when we apply a size selection to the DNA, the majority of the reads are shorter than 15 kb. The most recent chemistry showed an improvement in this regard, and we reached median polymerase read lengths of 13-15 kb. The figure below gives a typical overview of the read-length distribution after a 20 kb size selection.

BEST-IN-PRACTICE STRATEGY

Our aim is to deliver completely closed, single contig microbial genomes. In our experience, the results for specific samples may vary due to genome complexity, GC content and the presence of complex repetitive elements in the genome. In recent years, our best-in-practice strategy for complete genome sequencing was a hybrid approach, in which we combined PacBio and Illumina data using our in-house developed bioinformatics pipeline and our SSPACE software. Now, thanks to the increased output of the PacBio Sequel and the development of improved assembly algorithms, we have developed a PacBio-only de novo sequencing service built on our in-house optimized bioinformatics pipeline (which includes the HINGE or HGAP assemblers). Our internal tests showed that this new approach generates assemblies with an even better architecture than the hybrid approach. In addition, we developed an assembly polishing pipeline that uses high-quality short-read Illumina data to improve the assembly sequence at the single-base level.

ALTERNATIVE LONG READ TECHNOLOGY: NANOPORE SEQUENCING

Given the rapid development of next-generation sequencing technologies, a frequent question is how long PacBio will remain the preferred platform for obtaining complete single contig microbial genomes. In particular, the Oxford Nanopore platforms (MinION, GridION and PromethION) have the potential to compete with PacBio and possibly become the preferred platform for these applications. As discussed in other blog posts, we at BaseClear have invested considerable effort in the Oxford Nanopore technology and have worked with it since 2015, with very positive results.

If you would like to discuss your microbial genome sequencing project, please contact us via our contact form.

 
