May 6, 2022

Collaborative Development of Microbiome Apps from ENIGMA and KBase

KBase has been working with the Ecosystems and Networks Integrated with Genes and Molecular Assemblies (ENIGMA) project to develop apps that support the missions of both projects and our user community. The ENIGMA project is managed by Paul D. Adams and led by Adam P. Arkin at Lawrence Berkeley National Laboratory (LBNL). It aims to advance knowledge of microbial biology and to better understand, across multiple scales, how microbial communities affect their constituent organisms and the abiotic factors around them. ENIGMA studies everything from single molecules to whole ecosystems to uncover how they are interconnected. Here we highlight a few of the apps created by the ENIGMA SFA team that enable KBase users to perform taxonomic and functional profiling of microbiome data within KBase.

Functional Profiling with Fama

KBase has three apps for performing functional profiling using the Fama computational toolkit: 1) Run Fama Reads Profiling, 2) Run Fama Genome Profiling, and 3) View Fama Functional Profile. The first two apps append functional information from curated protein databases to your reads, assemblies, or genomes in KBase. View Fama Functional Profile then creates an interactive display of the functional characteristics of your genomes or microbial communities. Fama runs a similarity search for all predicted proteins in a genome using the fast aligner DIAMOND and customized databases of reference proteins. The reference data comprise three datasets:

Nitrogen cycle enzymes, for functional and taxonomic profiling of nitrate/nitrite/ammonia metabolic genes.
30 families of universal single-copy marker proteins from complete bacterial and archaeal genomes, for taxonomic profiling and contamination checks.
Ribosomal protein L6 sequences from genomes of cultivated bacteria and metagenome-assembled genomes, for fast taxonomic profiling.
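
To give a feel for how a similarity search can be turned into a functional profile, here is a minimal Python sketch. It is not Fama's code. It assumes the aligner output is in the standard 12-column tabular layout (BLAST output format 6, which DIAMOND can also write), keeps the top-scoring hit per query protein, and counts functions through a lookup table. The function names, the e-value cutoff, the toy alignment rows, and the reference-to-function mapping are all hypothetical.

```python
import csv
import io
from collections import Counter

# Standard 12-column tabular alignment layout (BLAST output format 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query_id: (reference_id, bitscore)}, keeping the top-scoring hit.

    The e-value cutoff is an illustrative assumption, not a Fama default.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # discard weak hits
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return best


def functional_profile(hits, ref_to_function):
    """Count how many query proteins were assigned to each function."""
    counts = Counter()
    for ref_id, _score in hits.values():
        counts[ref_to_function.get(ref_id, "unassigned")] += 1
    return dict(counts)


# Toy alignment rows (made up for illustration).
EXAMPLE = "\n".join([
    "prot1\trefA\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t350.0",
    "prot1\trefB\t60.0\t180\t72\t0\t1\t180\t1\t180\t1e-20\t120.0",
    "prot2\trefB\t85.0\t150\t22\t0\t1\t150\t1\t150\t1e-40\t280.0",
    "prot3\trefC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25.0",
])

# Hypothetical mapping from reference protein ID to a function label.
REF_TO_FUNCTION = {"refA": "nitrate reductase", "refB": "nitrite reductase"}

hits = best_hits(EXAMPLE)
profile = functional_profile(hits, REF_TO_FUNCTION)
print(profile)
```

In this toy input, prot1 has two hits and only the higher-scoring one is kept, while prot3 is dropped because its e-value exceeds the cutoff.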

View the Fama Demonstration Narrative to learn more about these apps.

StrainFinder

Find Strain Genomes with StrainFinder v1 allows KBase users to identify haplotypes associated with particular strains within their microbial genomes. The app takes a reads library and a reference genome, then aligns the reads to the reference to find single nucleotide polymorphisms (SNPs). Next, it computes maximum likelihood estimates of the strain genotypes and the corresponding strain frequencies from the frequency of each SNP, which yields haplotypes from the sequencing data. The app produces a GenomeSet containing the strain genomes. The inferred relative abundance of each strain can be found in the Summary of the output report. In addition to the summary report, BAM and VCF files are available for download alongside the sequences in the Genome object.
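
The maximum likelihood step can be illustrated with a deliberately simplified Python sketch. It is not StrainFinder's implementation: it assumes exactly two strains whose genotypes are already known (the app estimates genotypes and frequencies together), models the alternate-allele read count at each SNP as binomial, and finds the best frequency with a plain grid search. The function names, the error floor, and the toy counts are hypothetical.

```python
import math


def log_likelihood(freq, genotypes, alt_counts, depths, error=0.01):
    """Binomial log-likelihood of the observed alternate-allele counts.

    freq       : relative abundance of strain 1 (strain 2 has 1 - freq)
    genotypes  : per-SNP pairs (g1, g2); 1 if the strain carries the
                 alternate allele at that site, else 0
    alt_counts : reads supporting the alternate allele at each SNP
    depths     : total reads covering each SNP
    error      : assumed floor that keeps probabilities away from 0 and 1
    """
    total = 0.0
    for (g1, g2), k, n in zip(genotypes, alt_counts, depths):
        # Expected alternate-allele frequency is the abundance-weighted genotype.
        p = freq * g1 + (1.0 - freq) * g2
        p = min(max(p, error), 1.0 - error)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


def estimate_frequency(genotypes, alt_counts, depths, steps=1000):
    """Grid-search the strain 1 frequency that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda f: log_likelihood(f, genotypes, alt_counts, depths))


# Toy data: three SNPs, 100 reads each (made up for illustration).
genotypes = [(1, 0), (0, 1), (1, 0)]
alt_counts = [70, 30, 70]
depths = [100, 100, 100]

estimate = estimate_frequency(genotypes, alt_counts, depths)
print(f"Estimated frequency of strain 1: {estimate:.3f}")
```

With these toy counts, every site is consistent with strain 1 making up about 70% of the sample, so the search settles near 0.7. Real data would involve more strains, unknown genotypes, and a more efficient optimizer than a grid.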

StrainFinder Bitbucket Repository

MetaDecoder Apps

Our team has also ported two apps from the MetaDecoder program, which identifies polymorphisms in microbiome read sequences by comparing them against reference genomes. Use the Map Reads to Reference Sequence app to map sequencing reads to a reference genome. You can then use Call Microbial SNPs to identify SNP variants from your mapped reads. The output heatmap displays the distribution of genotypes/strains across the input reads libraries, assuming each reads library represents a single metagenome sample.

Webinar Recording: Functional and Taxonomic Profiling of MAGs

This recent webinar demonstrates using these tools for functional and taxonomic profiling of metagenome-assembled genomes (MAGs). The presenters, John-Marc Chandonia and Alexey Kazakov from Lawrence Berkeley National Lab and An-Ni Zhang from the Massachusetts Institute of Technology, helped develop these programs and implement them as KBase apps.