GENOME ASSEMBLY AND ANNOTATION
BaseClear offers a wide range of genome analysis services. Our specialists are happy to discuss state-of-the-art bioinformatics solutions that best fit your needs and budget. Standard bioinformatics services include microbial genome assembly, functional annotation, microbial profiling, metagenomics analysis and genome comparison. Additionally, custom analyses are routinely carried out by our highly skilled bioinformatics team. Analyses are commonly offered in combination with our next-generation sequencing services, although we are also happy to assist you with bioinformatics-only projects. Our main drive is to make sure your research questions are answered in the best possible manner. We aim to present the results in a transparent way and are always open to discussing the outcomes in depth with our clients!
A schematic overview of a genome analysis workflow: short and/or long sequence reads (e.g. Illumina, PacBio or ONT) are assembled into a finished genome followed by in-depth annotation. The annotated genomes are imported into our cloud-based Genome Explorer app which allows the customer to interactively mine and compare genomes.
Our bioinformatics department has developed state-of-the-art assembly pipelines which follow either a reference-based or a de novo approach (and in some cases a combination of both). The ultimate goal is to provide our customers with a high-quality finished genome sequence. To accomplish this, we offer different sequencing approaches including Illumina HiSeq/MiSeq, PacBio RS II/Sequel and ONT MinION platforms. The resulting reads are merged into finished genomes using tools that are recognized in the field for their quality. Among these are highly cited packages such as SPAdes (Bankevich et al., 2012) for draft assembly of short reads and HINGE (Kamath et al., 2017) for long-read assembly. Our internally developed packages SSPACE (Boetzer et al., 2011) and GapFiller (Boetzer and Pirovano, 2012) are key tools for finishing genomes accurately: SSPACE links contigs into scaffolds and GapFiller closes the gaps that remain within those scaffolds. The resulting assemblies are subject to an extensive quality assessment procedure, which includes error correction and structural improvement of the contigs. Manual interpretation and genome closure are also offered. In this way we can guarantee that our customers receive exceptionally accurate and complete assemblies at a very sharp price!
BaseClear uses a mix of different state-of-the-art software packages to assemble short and/or long reads into high-quality (and finished) genomes. The left panel displays a schematic overview of our short-read assembly approach, whereas the right panel shows an example of the tools used to simplify and resolve assembly graphs (Wick et al., 2015).
- Assembled contig and scaffold sequences in FastA format.
- A Golden Path (AGP) file describing the linkage between contigs in the scaffolds.
- De novo assembly report containing a summary of the assembly results and quality statistics.
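As an illustration of how these deliverables can be inspected, the sketch below reads contig lengths from a FastA file, computes the N50 statistic commonly found in assembly reports, and lists the contig order per scaffold from an AGP file. It is a minimal sketch, not BaseClear's reporting pipeline: the function names and the example records are hypothetical, and the AGP parsing assumes the standard tab-separated layout in which column 5 holds the component type (`N` or `U` for gaps) and columns 6 and 9 hold the component ID and orientation.

```python
from io import StringIO


def read_fasta_lengths(handle):
    """Return {sequence_name: length} for every record in a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Largest length L such that sequences >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def read_agp_layout(handle):
    """Return {scaffold: [(contig, orientation), ...]} in file order."""
    layout = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        if len(cols) < 5 or cols[4] in ("N", "U"):
            continue  # malformed line or gap line
        if len(cols) < 9:
            continue  # component lines need all nine columns
        layout.setdefault(cols[0], []).append((cols[5], cols[8]))
    return layout


# Hypothetical example data, for illustration only.
EXAMPLE_FASTA = ">contig_1\nACGTACGT\n>contig_2\nACGTAC\n>contig_3\nACGT\n>contig_4\nAC\n"
EXAMPLE_AGP = "\n".join(
    "\t".join(row)
    for row in [
        ("scaffold_1", "1", "8", "1", "W", "contig_1", "1", "8", "+"),
        ("scaffold_1", "9", "18", "2", "N", "10", "scaffold", "yes", "paired-ends"),
        ("scaffold_1", "19", "24", "3", "W", "contig_2", "1", "6", "-"),
    ]
)

contig_lengths = read_fasta_lengths(StringIO(EXAMPLE_FASTA))
scaffold_layout = read_agp_layout(StringIO(EXAMPLE_AGP))

print("contigs:", len(contig_lengths))
print("total length:", sum(contig_lengths.values()))
print("N50:", n50(contig_lengths.values()))
print("layout:", scaffold_layout)
```

For real files, replace the `StringIO` objects with `open("assembly.fasta")` and `open("assembly.agp")` (both file names are placeholders).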
Although an assembly can provide insight into the general architecture of a genome, gene annotations are the key to elucidating the functionalities of microbial organisms. Annotation at BaseClear is performed in two steps: structural annotation and functional annotation. Structural annotation involves the prediction of open reading frames (ORFs) and, in the case of eukaryotes, the prediction of the correct intron-exon structure. For prokaryotes we use the Prodigal software (Hyatt et al., 2010) to find bacterial and archaeal genes, whereas for eukaryotes (mainly yeast and fungi) we use the Augustus software (Stanke et al., 2003) to predict genes as well as the correct gene model. In the latter case RNA-Seq expression data can be added to improve the determination of alternative transcripts/splicing.
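To make the notion of an open reading frame concrete, the following sketch scans all six reading frames of a DNA sequence for stretches that begin with an ATG start codon and end at the first in-frame stop codon. This is a deliberately naive illustration and not the method used by Prodigal or Augustus, which rely on far more elaborate statistical gene models; the function names, the `min_codons` threshold and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Yield (strand, frame, start, end, orf) for naive ATG-to-stop ORFs.

    Coordinates are 0-based and end-exclusive, relative to the strand that
    was scanned (the reverse complement for strand "-"). The codon count
    includes both the start and the stop codon.
    """
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        yield strand, frame, start, end, strand_seq[start:end]
                    start = None  # look for the next ORF in this frame


# Hypothetical example sequence, for illustration only.
example = "CCATGAAATTTTAGGG"
orfs = list(find_orfs(example))
for orf in orfs:
    print(orf)
```

A real gene finder additionally weighs signals such as ribosome binding sites, codon usage and alternative start codons, which is why dedicated tools are used in practice.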
Subsequently we assign functional annotations to genes, tRNAs and rRNAs, and also predict features such as signal peptides. Our service includes an extensive search of the most commonly used functional databases, including SwissProt/UniProt, EC-enzyme, Gene Ontology, CAZy and Pfam. KEGG identifiers are also returned for each predicted gene. The standard output includes GenBank and GFF files which are compliant with the NCBI genome submission standards. Other formats can be provided upon request. The output files can be easily imported into any third-party genome browser, but we highly recommend that our customers apply for a trial license for the Genome Explorer (https://genome-explorer.com), which is our tool of choice for (advanced) genome mining and comparative genomics.
The BaseClear Genome Explorer is an online portal that offers interactive genome viewing and mining features. Gene structures are displayed in a comprehensive (graphical) manner and can be easily compared. The software is implemented in Microsoft Azure and available through https://genome-explorer.com.
- Table containing full annotation for predicted coding sequence regions.
- GenBank and GFF annotation formats.
- Extended annotation report containing a summary of the assembly results and quality measures.
- Access to the BaseClear Genome Explorer, which allows interactive analyses of the results.
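The GFF annotation file can also be processed with a few lines of code outside a genome browser. The sketch below assumes the file follows the GFF3 layout of nine tab-separated columns with `key=value` attributes separated by semicolons; it counts features per type and collects one attribute per coding sequence. The attribute key `product`, the function name and the example records are assumptions for illustration and may not match the attribute names in a delivered file.

```python
from collections import Counter
from urllib.parse import unquote


def parse_gff3(lines):
    """Yield one dict per feature line of a GFF3 stream."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section ends the feature table
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attributes = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key] = unquote(value)  # GFF3 values are URL-escaped
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),  # 1-based, inclusive
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": attributes,
        }


# Hypothetical example records, for illustration only.
EXAMPLE_GFF = [
    "##gff-version 3",
    "contig_1\texample\tgene\t100\t400\t.\t+\t.\tID=gene0001",
    "contig_1\texample\tCDS\t100\t400\t.\t+\t0\tID=cds0001;Parent=gene0001;product=hypothetical protein",
    "contig_1\texample\ttRNA\t900\t975\t.\t-\t.\tID=trna0001;product=tRNA-Ala",
]

features = list(parse_gff3(EXAMPLE_GFF))
type_counts = Counter(feature["type"] for feature in features)
cds_products = {
    feature["attributes"]["ID"]: feature["attributes"].get("product", "")
    for feature in features
    if feature["type"] == "CDS"
}

print(type_counts)
print(cds_products)
```

To read a delivered file, pass an open file handle instead of the example list, for instance `parse_gff3(open("annotation.gff"))` (the file name is a placeholder).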