The first one thousand days in an infant’s life set the stage for lifelong health and development. Colonisation of the infant digestive system with microbes is crucial for normal immune system development, and may affect health later in life. BaseClear has recently developed a new platform specialised in analysing the infant gut microbiome. This targeted analysis pipeline uses shallow shotgun sequencing, a customised infant microbiome database, the powerful taxonomic classification tool Kraken 2, and the Bracken statistical method to create abundance profiles down to the strain level, and across microbial kingdoms. Dr. Eline Klaassens (Product Manager Microbiome & Human Health) and Nienke Mekkes (Bioinformatician) at BaseClear discuss the pipeline and how it can help researchers and product developers in infant health and nutrition.


How is the infant microbiome pipeline different to others available to researchers?

“The new platform is a unique combination of technologies that will improve insights into the infant microbiome,” says Dr. Klaassens. The use of an infant-specific database improves both performance and the quality of the results. Shotgun metagenomics allows taxonomic profiling at a higher resolution than the more commonly used 16S technology, whilst expanding insights across kingdoms. Shallow shotgun sequencing requires only 0.5-1.0 GB of data per sample and is more budget-friendly than deep whole-metagenome shotgun sequencing.

Key advantages of the new infant microbiome pipeline:

  • Strain-level identification
  • High proportion of classified reads
  • Fast
  • Multi-kingdom identification
  • Cost-effective
  • Abundance estimates for different taxonomic levels
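
As a rough illustration of how the classification and abundance-estimation steps fit together, the sketch below assembles a Kraken 2 command followed by a Bracken command. It is not BaseClear's actual pipeline: the database directory, file names, read length and taxonomic level are all assumed placeholders, and the function only builds the argument lists without running anything.

```python
from typing import List, Tuple


def build_commands(
    db_dir: str,
    reads_1: str,
    reads_2: str,
    sample: str,
    read_length: int = 150,  # assumed read length; must match the Bracken database build
    level: str = "S",        # Bracken taxonomic level (S = species); assumed default
    threads: int = 8,
) -> Tuple[List[str], List[str]]:
    """Return (kraken2_cmd, bracken_cmd) as argument lists; nothing is executed."""
    report = f"{sample}.kreport"
    kraken2_cmd = [
        "kraken2",
        "--db", db_dir,
        "--threads", str(threads),
        "--paired",
        "--report", report,              # per-taxon summary, later read by Bracken
        "--output", f"{sample}.kraken",  # per-read classifications
        reads_1, reads_2,
    ]
    bracken_cmd = [
        "bracken",
        "-d", db_dir,
        "-i", report,
        "-o", f"{sample}.bracken",
        "-r", str(read_length),
        "-l", level,
    ]
    return kraken2_cmd, bracken_cmd


# Hypothetical sample and database names, for illustration only.
k2_cmd, bracken_cmd = build_commands(
    "infant_db", "sample1_R1.fastq", "sample1_R2.fastq", "sample1"
)
print(" ".join(k2_cmd))
print(" ".join(bracken_cmd))
```

The two lists could be passed to `subprocess.run` on a machine where both tools and a suitable database are installed.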

Who can benefit most from using this pipeline?

The infant microbiome pipeline is tailored to research and development in early life nutrition and health, and is therefore especially useful for the food and pharmaceutical industries. The ability to distinguish between strains is useful for the development of probiotics for infants, as researchers will be able to track proprietary microbes in the infant gastrointestinal tract. Measuring and tracking changes in fungi, archaea and viruses, in addition to bacteria, opens up a new area of research in infant health. Clinical research applications such as the strain-specific metabolism of drugs by the gut microbiome or the effect of antibiotics on the infant gut microbiome can also be tested using the new pipeline.

Why is it important to differentiate between strains?

We’ve known for many years that differences between bacterial strains affect their potential to influence human health. For example, particular probiotic strains have been associated with protection from antibiotic-associated diarrhoea in children(1), while E. coli strains can be either normal residents of the large intestine or pathogens(2). The subtle genetic differences between strains can affect how well the bacteria survive in the digestive tract, adhere to cells lining the intestines, or produce toxins. Strain-level identification allows researchers to “follow” particular strains, for example to ensure that a probiotic has colonised the digestive tract, or to track the transfer of strains from mother to infant. The ability to identify microorganisms down to the strain level increases the applicability of research, especially when working with proprietary strains.
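
To make the idea of “following” a strain concrete, here is a minimal sketch that works on strain-level abundance profiles represented as plain dictionaries. The profile format, strain names and sample labels are hypothetical and do not reflect the pipeline's real output. Note also that finding the same strain in mother and infant is consistent with transfer but does not by itself prove it.

```python
from typing import Dict, List

# Strain name -> relative abundance (fraction of 1.0). Assumed format.
Profile = Dict[str, float]


def follow_strain(strain: str, timeline: Dict[str, Profile]) -> Dict[str, float]:
    """Abundance of one strain at every time point (0.0 where it is absent)."""
    return {label: profile.get(strain, 0.0) for label, profile in timeline.items()}


def shared_strains(mother: Profile, infant: Profile, min_abundance: float = 0.0) -> List[str]:
    """Strains found above min_abundance in both profiles, sorted by name."""
    return sorted(
        s for s in mother
        if mother[s] > min_abundance and infant.get(s, 0.0) > min_abundance
    )


# Hypothetical data, for illustration only.
infant_timeline = {
    "day_7": {"strain_A": 0.90, "strain_B": 0.10},
    "day_30": {"strain_A": 0.60, "strain_B": 0.30, "probiotic_X": 0.10},
}
mother_profile = {"strain_A": 0.20, "strain_C": 0.80}

print(follow_strain("probiotic_X", infant_timeline))
print(shared_strains(mother_profile, infant_timeline["day_30"]))
```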

Tell me more about the custom infant microbiome database and why it is an important part of the pipeline

A crucial part of the pipeline is the infant-specific database. It was built from the most up-to-date human gut genome catalogue(3) and augmented with an additional 50,000 strains of bacteria that have previously been found in the infant gut. “Customers can add their own genomes to the database – in fact, they are encouraged to do so to make the best possible use of the tool!” advises bioinformatician Nienke Mekkes. As the field of metagenomics is rapidly evolving, future additions will further improve the database and increase its performance.

An infant-specific database improves both the speed and quality of processing the data. First of all, the database size is considerably smaller, which means that the time needed to classify reads is substantially reduced. Secondly, by excluding genomes that will not be found in infant microbiome samples, the proportion of false positive or partial matches is reduced, thus increasing the percentage of classified reads.

“Our new infant microbiome pipeline uses a dedicated infant microbiome database,” explains Dr Klaassens. “Our customers will get a higher proportion of classified reads, which improves the quality of the analysis. Results are more robust and reliable, and you can do more with them.”
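
The proportion of classified reads can be read off a Kraken 2-style report. The sketch below assumes the standard six-column, tab-separated report layout (percentage, clade reads, directly assigned reads, rank code, taxonomy ID, name), in which unclassified reads carry rank code `U` and the root of the taxonomy carries rank code `R`. The example lines are invented for illustration and are not real results.

```python
from typing import Iterable


def classified_fraction(report_lines: Iterable[str]) -> float:
    """Fraction of reads that were classified, from a Kraken 2-style report."""
    unclassified = 0
    classified = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank = fields[3].strip()
        if rank == "U":
            unclassified = clade_reads
        elif rank == "R":
            # The root clade count covers every classified read.
            classified = clade_reads
    total = classified + unclassified
    return classified / total if total else 0.0


# Invented report lines, for illustration only.
example_report = [
    " 12.00\t120\t120\tU\t0\tunclassified",
    " 88.00\t880\t5\tR\t1\troot",
    " 87.50\t875\t0\tD\t2\t    Bacteria",
]
print(f"{classified_fraction(example_report):.1%} of reads classified")
```

Running the same function on reports produced with a generic database and with a targeted one would give a simple side-by-side comparison.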

How have you validated the pipeline?

The infant microbiome pipeline has been validated in three ways. First, simulated communities generated with CAMISIM(4) were used to compare the estimated strain abundances with the known input and with the results obtained using a standard database. Using the infant-specific database, all simulated strains were identified and the abundances found were much closer to the input than when using a standard database. Second, the pipeline has been tested using a human gut mock community sequenced at different sequencing depths at BaseClear. Compared to a standard database, the infant gut database identified more strains, and the relative abundances more closely matched the reference microbiome. As a third validation step, actual infant faecal microbiome sequencing data was used to test the system with real-world, highly complex data. As bioinformatician Nienke Mekkes says: “Data validation is very important to us. We want to ensure our customers have results that they can rely on.”
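
A comparison of this kind, estimated profile against known input, can be expressed with two simple measures: the share of input strains that were detected, and a dissimilarity between the abundance profiles. The sketch below uses Bray-Curtis dissimilarity as one possible choice; the article does not state which metrics BaseClear used, and all profiles shown are made-up numbers rather than validation results.

```python
from typing import Dict

# Strain name -> relative abundance (fraction of 1.0). Assumed format.
Profile = Dict[str, float]


def detected_fraction(truth: Profile, estimate: Profile) -> float:
    """Share of strains present in the input that also appear in the estimate."""
    present = {s for s, a in truth.items() if a > 0}
    if not present:
        return 0.0
    found = {s for s in present if estimate.get(s, 0.0) > 0}
    return len(found) / len(present)


def bray_curtis(truth: Profile, estimate: Profile) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no overlap."""
    strains = set(truth) | set(estimate)
    num = sum(abs(truth.get(s, 0.0) - estimate.get(s, 0.0)) for s in strains)
    den = sum(truth.get(s, 0.0) + estimate.get(s, 0.0) for s in strains)
    return num / den if den else 0.0


# Made-up profiles, for illustration only.
simulated_input = {"strain_A": 0.50, "strain_B": 0.30, "strain_C": 0.20}
targeted_db_result = {"strain_A": 0.48, "strain_B": 0.32, "strain_C": 0.20}
standard_db_result = {"strain_A": 0.60, "strain_B": 0.40}

for name, result in [("targeted", targeted_db_result), ("standard", standard_db_result)]:
    print(
        name,
        f"detected={detected_fraction(simulated_input, result):.2f}",
        f"bray_curtis={bray_curtis(simulated_input, result):.2f}",
    )
```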

What are some examples of applications for the infant microbiome pipeline?

  • New-born infants acquire microbes from their environment. It would be useful to track how colonisation progresses, and classification down to strain level could improve our understanding of how microbes are transferred from the environment to infants.
  • Strain-level classification will allow confirmation that a probiotic bacterial strain added to infant formula reaches the large intestine.
  • Some diseases found in infants, such as necrotising enterocolitis in preterm infants, have microbial causes(5). Pathogenesis is linked with strain-level differences in bacteria and viruses, both of which can be analysed using the new infant pipeline.

What are the key benefits of the infant microbiome pipeline?

A major benefit of the targeted infant microbiome pipeline is that the proportion of unclassified reads is low. This improves the quality of the results. The pipeline builds on existing tools and platforms, and results offer extra information while still being comparable to existing work. The new pipeline can detect low-abundance strains: while strains can be detected down to 0.1% abundance, more reliable results are found when the abundance is above 1%.
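
The two abundance thresholds mentioned above could be applied when reading a strain-level result table, for example to flag which detections deserve extra caution. The sketch below is only an illustration: the tier labels and the exact handling of the boundaries (detected at 0.1% and above, reliable above 1%) are assumptions, and the strain names are hypothetical.

```python
from typing import Dict


def confidence_tier(
    abundance_pct: float,
    detect_limit: float = 0.1,    # detection limit in percent, from the text
    reliable_limit: float = 1.0,  # reliability threshold in percent, from the text
) -> str:
    """Label a relative abundance (in percent) with an assumed confidence tier."""
    if abundance_pct > reliable_limit:
        return "reliable"
    if abundance_pct >= detect_limit:
        return "detected, low confidence"
    return "below detection limit"


def tier_profile(profile_pct: Dict[str, float]) -> Dict[str, str]:
    """Apply confidence_tier to every strain in a profile."""
    return {strain: confidence_tier(pct) for strain, pct in profile_pct.items()}


# Hypothetical strain abundances in percent, for illustration only.
print(tier_profile({"strain_A": 42.0, "strain_B": 0.4, "strain_C": 0.02}))
```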

The strain-level classification of microbes offers unique opportunities to read more into microbiome changes that occur due to interventions, or to follow specific strains. “Our customers involved in infant nutrition and health will gain much better results using the infant-specific pipeline compared to a generic analysis. The strain-level insights give a much more complete picture of what is in each sample,” says Dr Klaassens.



  1. McFarland LV, Evans CT, Goldstein EJC (2018) Strain-Specificity and Disease-Specificity of Probiotic Efficacy: A Systematic Review and Meta-Analysis. Frontiers in Medicine 5.
  2. Donnenberg MS, Whittam TS (2001) Pathogenesis and evolution of virulence in enteropathogenic and enterohemorrhagic Escherichia coli. J Clin Invest 107, 539-548.
  3. Almeida A, Nayfach S, Boland M et al. (2021) A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol 39, 105-114.
  4. Fritz A, Hofmann P, Majda S et al. (2019) CAMISIM: simulating metagenomes and microbial communities. Microbiome 7, 17.
  5. Coggins SA, Wynn JL, Weitkamp JH (2015) Infectious causes of necrotizing enterocolitis. Clin Perinatol 42, 133-154, ix.

