Variant detection is an important application of next-generation sequencing (NGS). It involves aligning sequence reads to a reference and detecting mutations and other minor variations. While this process may sound simple, in practice many steps and parameters are involved in making sure the variants are predicted correctly. In recent years, BaseClear has used a sophisticated variant detection pipeline built primarily on algorithms included in the CLC Genomics Workbench (CGW) offered by Qiagen (formerly CLC bio). While many projects have been delivered successfully to our clients using the current framework, the variant analysis field has not stopped innovating and has produced several novel algorithms.
At BaseClear, we are committed to offering results that meet the highest standards in the field. Therefore, based on scientific reviews by independent research groups, our experts recently selected several state-of-the-art variant detection pipelines and compared them to our current framework. The results supported the quality of our current framework but also highlighted room for improvement. We are proud to present our renewed variant calling service, which is based on the BBMap [1] aligner and the FreeBayes [2] variant caller. The pipeline will be effective as of June 1st, 2019 and will become our new standard for genome analyses, including low-frequency variant detection.
Variant detection at a glance
Variant detection using next-generation sequencing generally includes the following steps:
- Alignment of NGS reads to one or more references
- Detection of single or multiple nucleotide variants (SNPs or MNVs)
- Detection of short insertions and deletions (InDels)
- Annotation of variants based on the reference (e.g. gene location, amino acid change, etc.)
Figure 1. Graphical representation of a variant calling experiment.
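To make the alignment and detection steps above more concrete, here is a deliberately simplified sketch in Python. It assumes the reads have already been placed on the reference without gaps, and it reports a single nucleotide variant wherever a non-reference base dominates the reads covering a position. The function name `call_snvs` and the thresholds `min_depth` and `min_fraction` are illustrative assumptions; this is not how BBMap or FreeBayes work internally, as production tools rely on far more elaborate alignment and statistical models.

```python
from collections import Counter, defaultdict


def call_snvs(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Toy SNV caller (illustrative only).

    aligned_reads is a list of (start, sequence) tuples, where start is the
    0-based reference position of the first base of the read.
    Returns (1-based position, reference base, variant base, depth) tuples.
    """
    # Build a pileup: for every reference position, count the observed bases.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call (hypothetical threshold)
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            variants.append((pos + 1, reference[pos], base, depth))
    return variants


# Made-up example data: three reads supporting an A>T change at position 5.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC")]
print(call_snvs(reference, reads))  # → [(5, 'A', 'T', 3)]
```

Lowering `min_fraction` in such a sketch is the simplest way to think about low-frequency variant detection, although real callers also have to separate true low-frequency variants from sequencing errors.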
Major changes in BaseClear’s new variant detection analysis
The main differences compared to the current analysis are the following:
- Change of the reference alignment algorithm (from CGW to BBMap)
- Change of the variant calling algorithm (from CGW to FreeBayes)
- Inclusion of a novel variant annotation tool (SnpEff [3])
- Emphasis on more comprehensive reports and output tables
Validation of the results
The new pipeline was tested on a large number of microbial genomic datasets, including both in silico simulated data and real-life sequencing data from well-characterized microbial strains with references available. Amongst the species included were Escherichia coli, Streptococcus mutans, Staphylococcus aureus, Lactobacillus casei, Bordetella pertussis, Bacillus cereus, and Kluyveromyces lactis. The overlap between the variants called by the CGW-based workflow and those called by the new pipeline based on BBMap and FreeBayes is generally more than 90%, as shown in the Venn diagrams in Figure 2.
Figure 2. Results of the comparative analysis between the CGW- and FreeBayes-based workflows. Unique variants are shown in red for FreeBayes and in green for CGW.
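As a rough illustration of how such an overlap between two call sets can be quantified, the sketch below compares two lists of variants with Python sets. It assumes each variant is identified by a (chromosome, position, reference allele, alternative allele) tuple and that the overlap percentage is the shared fraction of the union; both are assumptions made for this example, since the exact definition used for Figure 2 is not stated here. The function name `overlap_summary` and the example variants are hypothetical.

```python
def overlap_summary(calls_a, calls_b):
    """Compare two variant call sets.

    Each variant is a (chrom, pos, ref, alt) tuple. The percentage is
    computed as shared / union, which is an assumption for this sketch.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "percent_shared": 100.0 * len(shared) / len(union) if union else 0.0,
    }


# Made-up call sets for illustration only.
workflow_a = [("chr1", 101, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr1", 777, "G", "A"), ("chr1", 900, "T", "C")]
workflow_b = [("chr1", 101, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr1", 777, "G", "A"), ("chr1", 1500, "GA", "G")]
print(overlap_summary(workflow_a, workflow_b))
```

The `only_a` and `only_b` counts correspond to the variants that are unique to one workflow, which are the ones worth inspecting manually.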
In-depth investigation of the variants that are called by only one of the two methods revealed that the FreeBayes-based workflow has more stringent requirements for the quality of the alignment of the sequence reads to the reference. For instance, variants are not called at genomic positions that are supported only by ambiguous mappings (reads that can align to multiple genomic loci) or by alignments with low local alignment scores. This higher stringency results in a higher predictive value of the variants called. Several evaluations of variant calling algorithms are available in the scientific literature, such as Sandmann et al. (2017) [4].
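The idea of excluding ambiguous or poorly aligned reads before calling variants can be sketched as a simple filter. The example below is a minimal illustration under stated assumptions: the `Alignment` record, the `keep_confident` function, and the thresholds `min_mapq` and `min_score` are all hypothetical, and they do not reflect the actual settings of FreeBayes or of the BaseClear pipeline. It relies only on the common convention that a low mapping quality indicates a read that could be placed at more than one locus.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    read_id: str
    position: int
    mapq: int               # mapping quality; low values suggest ambiguous placement
    alignment_score: float  # hypothetical normalized score between 0 and 1


def keep_confident(alignments, min_mapq=20, min_score=0.9):
    """Keep only alignments that are both unambiguous and well aligned.

    The thresholds are illustrative assumptions, not recommended values.
    """
    return [
        a for a in alignments
        if a.mapq >= min_mapq and a.alignment_score >= min_score
    ]


# Made-up alignments for illustration only.
alignments = [
    Alignment("read1", 1042, 60, 0.99),  # unique, high-scoring: kept
    Alignment("read2", 1040, 0, 0.98),   # multi-mapping read: dropped
    Alignment("read3", 1045, 55, 0.70),  # poor local alignment: dropped
]
print([a.read_id for a in keep_confident(alignments)])  # → ['read1']
```

A position that is covered only by reads like `read2` or `read3` would end up with no supporting evidence after filtering, so no variant would be called there.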
Deliverables of the renewed variant detection service
- Alignment files and indices (in BAM and BAI formats)
- List of variants (in VCF format)
- Consensus sequence (in FASTA format)
- Variant effects table (in CSV format, which can be opened in Excel)
- Reports and statistics (in PDF format)
Figure 3. An example of the variant effects table, which is an excellent basis for functional investigation of the predicted variants.
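For readers who want to process the delivered list of variants programmatically, the following is a minimal sketch of reading VCF records in plain Python. It relies only on the standard VCF layout (header lines starting with `#`, followed by tab-separated records whose first eight columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO). The function name `parse_vcf` and the example records are hypothetical, and the sketch ignores per-sample columns; dedicated VCF libraries are a better choice for real analyses.

```python
def parse_vcf(text):
    """Parse the eight fixed VCF columns of every record in a VCF string."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the column header
        chrom, pos, _id, ref, alt, qual, _filter, info = line.split("\t")[:8]

        # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
        info_fields = {}
        for item in info.split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                info_fields[key] = value
            elif item and item != ".":
                info_fields[item] = True

        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt.split(","),  # several alternative alleles are comma-separated
            "qual": None if qual == "." else float(qual),
            "info": info_fields,
        })
    return records


# Made-up VCF content for illustration only.
example = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1042\t.\tA\tG\t812.5\t.\tDP=57;AF=1",
    "chr1\t2310\t.\tCT\tC\t95.0\t.\tDP=40;AF=0.5",
])
for record in parse_vcf(example):
    print(record["chrom"], record["pos"], record["ref"], record["alt"])
```

In this made-up example the second record is a short deletion, which shows that SNVs and InDels share the same record layout in a VCF file.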
[1] B. Bushnell (2014). BBMap: A Fast, Accurate, Splice-Aware Aligner.
[2] E. Garrison, G. Marth (2012). Haplotype-based variant detection from short-read sequencing.
[3] P. Cingolani et al. (2011). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff.
[4] S. Sandmann et al. (2017). Evaluating variant calling tools for non-matched next-generation sequencing data.
For more information and questions please contact our sales support via our contact form.