Today, host- and environment-associated microbial communities, such as those living in the human gut, on skin, and in soil and water, are increasingly studied in terms of their functional capacity and activity. For the latter, metatranscriptomics is the method of choice for profiling active microbial functions. As in metagenomics, metatranscriptome data can be analysed using database alignment-based or assembly-based methods, yet both approaches have disadvantages. Database-dependent alignment-based methods leave many NGS reads uncharacterised and cannot detect novel microbial functions, whereas assembly-based methods suffer from long runtimes, assembly errors, and difficulties in reconstructing genes with low expression.
Figure 1: The two algorithmic approaches used in metatranscriptomics pipelines with their advantages and disadvantages.
To address these disadvantages, we developed a hybrid metatranscriptomics workflow that combines the best of both worlds: an alignment-based and an assembly-based method are incorporated into a novel, comprehensive, and generalisable pipeline. First, the reads are quality-checked and aligned to nucleotide and protein databases of coding sequences. Reads that do not align to these databases are assembled into transcripts, which are used to predict and annotate microbial genes. These genes are then used to profile microbial functions using the KEGG and COG databases. Finally, the results from the alignment-based and assembly-based steps are merged into tables that give the user a complete picture of the microbial functions and their expression levels in the sample.
Figure 2: BaseClear’s hybrid metatranscriptomics pipeline.
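The final merging step described above can be illustrated with a small sketch. This is not BaseClear's implementation: the function name `merge_function_counts`, the input layout (a read ID mapped to a function label), the toy read IDs and labels, and the rule that an alignment-based annotation takes priority when a read appears in both branches are all assumptions made for illustration.

```python
from collections import Counter


def merge_function_counts(alignment_hits, assembly_hits):
    """Merge per-function read counts from the two branches into one table.

    Both arguments map a read ID to a function label (for example a KEGG or
    COG identifier). Assumption: a read annotated by both branches is counted
    once, using its alignment-based annotation.
    """
    # Reads that were only characterised through assembly-derived genes.
    assembly_only = {
        read_id: function
        for read_id, function in assembly_hits.items()
        if read_id not in alignment_hits
    }
    align_counts = Counter(alignment_hits.values())
    assembly_counts = Counter(assembly_only.values())

    table = {}
    for function in sorted(set(align_counts) | set(assembly_counts)):
        from_alignment = align_counts.get(function, 0)
        from_assembly = assembly_counts.get(function, 0)
        table[function] = {
            "alignment": from_alignment,
            "assembly": from_assembly,
            "total": from_alignment + from_assembly,
        }
    return table


# Toy input: hypothetical read IDs and labels, not real results.
alignment_hits = {"read1": "K00001", "read2": "K00002"}
assembly_hits = {"read3": "K00001", "read4": "COG0001"}

merged = merge_function_counts(alignment_hits, assembly_hits)
for function, counts in merged.items():
    print(function, counts)
```

Keeping the two sources as separate columns next to the total makes it easy to see which functions would have been missed by a mapping-only analysis.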
We applied and evaluated our workflow on a diverse set of publicly available datasets and compared the results with those from a mapping-based pipeline. We show that our approach functionally characterises a significantly larger portion of reads. In addition, thanks to the assembly-based gene prediction, the results can be used to study novel functions that would otherwise go undetected.
Figure 3: Metatranscriptome read alignment rates obtained using a mapping-only method (A) and the hybrid (mapping- and assembly-based) BaseClear workflow (B) across samples from multiple domains. Blue and orange represent the portions of reads characterised using nucleotide and protein databases, respectively. Green shows the fraction of unaligned reads. The BaseClear pipeline characterises a large additional portion of reads using the genes (in red and purple) derived from its assembly step.
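For readers who want to compute this kind of per-sample breakdown themselves, the following is a minimal sketch. The category names and the read counts are invented for illustration and are not taken from Figure 3 or from any BaseClear dataset; `read_fractions` is a hypothetical helper.

```python
def read_fractions(counts):
    """Convert per-category read counts into fractions of all reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to summarise")
    return {category: n / total for category, n in counts.items()}


# Invented counts for one sample (illustration only).
example_counts = {
    "nucleotide_db": 400,
    "protein_db": 200,
    "assembly_genes": 300,
    "unaligned": 100,
}

fractions = read_fractions(example_counts)
# Everything that is not left unaligned counts as characterised.
characterised = 1.0 - fractions["unaligned"]

for category, fraction in fractions.items():
    print(f"{category}: {fraction:.1%}")
print(f"characterised in total: {characterised:.1%}")
```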
Get in touch with our specialists and get the most out of your microbiome analysis now!
For more information on our metatranscriptomics service, click here.