May 6, 2022

Collaborative Development of Microbiome Apps from ENIGMA and KBase

KBase has been working with the Ecosystems and Networks Integrated with Genes and Molecular Assemblies (ENIGMA) project to develop apps that support both projects' missions and our user community. ENIGMA, managed by Paul D. Adams and led by Adam P. Arkin at Lawrence Berkeley National Laboratory (LBNL), aims to advance knowledge of microbial biology and to better understand how microbial communities affect their constituent organisms and the abiotic factors around them across multiple scales. ENIGMA studies everything from single molecules to whole ecosystems to uncover how they are interconnected. Here we highlight a few of the apps created by the ENIGMA Scientific Focus Area (SFA) team that let KBase users perform taxonomic and functional profiling of microbiome data directly in KBase.

Functional Profiling with Fama

KBase has three apps for functional profiling with the Fama computational toolkit: 1) Run Fama Reads Profiling, 2) Run Fama Genome Profiling, and 3) View Fama Functional Profile. The first two apps annotate your reads, assemblies, or genomes in KBase with functional information from curated protein databases. View Fama Functional Profile then creates an interactive display of the functional characteristics of your genomes or microbial communities. Fama runs a similarity search of all predicted proteins in a genome against customized databases of reference proteins, using the fast aligner DIAMOND. The reference data comprise three datasets:

  1. Nitrogen cycle enzymes, for functional and taxonomic profiling of nitrate, nitrite, and ammonia metabolism genes
  2. 30 families of universal single-copy marker proteins from complete bacterial and archaeal genomes, for taxonomic profiling and contamination checks
  3. Ribosomal protein L6 sequences from genomes of cultivated bacteria and from metagenome-assembled genomes, for fast taxonomic profiling
View the Fama Demonstration Narrative to learn more about these apps.
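To make the similarity-search step more concrete, here is a simplified best-hit sketch of turning DIAMOND output into a function count. It assumes DIAMOND tabular output with the default --outfmt 6 columns (query ID first, subject ID second, bitscore twelfth) and a hypothetical ref_to_function lookup from reference protein IDs to function labels. Fama's actual scoring, filtering, and database formats are more involved and are not reproduced here; the function names and reference IDs below are illustrative only.

```python
import csv
import io
from collections import Counter


def best_hits(diamond_tsv):
    """Return {query: (subject, bitscore)}, keeping the highest-bitscore hit per query.

    Assumes DIAMOND tabular output with the default --outfmt 6 columns:
    column 1 = query ID, column 2 = subject ID, column 12 = bitscore.
    """
    best = {}
    for row in csv.reader(diamond_tsv, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def functional_profile(diamond_tsv, ref_to_function):
    """Count predicted proteins per function label, using each protein's best hit."""
    hits = best_hits(diamond_tsv)
    return Counter(ref_to_function.get(subject, "unassigned") for subject, _ in hits.values())


# Hypothetical DIAMOND output for three predicted proteins (gene1..gene3).
diamond_output = io.StringIO(
    "gene1\trefA\t82.1\t310\t55\t1\t1\t310\t1\t310\t1e-90\t250.0\n"
    "gene1\trefB\t40.3\t290\t170\t4\t5\t295\t3\t290\t1e-30\t100.0\n"
    "gene2\trefB\t91.0\t320\t29\t0\t1\t320\t1\t320\t1e-120\t300.0\n"
    "gene3\trefC\t70.5\t280\t82\t1\t1\t280\t2\t281\t1e-60\t180.0\n"
    "gene3\trefX\t35.0\t250\t160\t6\t10\t260\t8\t255\t1e-10\t90.0\n"
)
# Hypothetical mapping from reference protein IDs to nitrogen-cycle functions.
ref_to_function = {"refA": "NarG", "refB": "NirK", "refC": "NirK"}
profile = functional_profile(diamond_output, ref_to_function)
print(profile.most_common())
```

Keeping only the top-scoring hit per protein is the simplest assignment rule; it ignores ties and near-equal hits, which a real profiler has to handle more carefully.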

StrainFinder

The Find Strain Genomes with StrainFinder v1 app allows KBase users to identify haplotypes of individual strains in their microbial sequencing data. The app takes a reads library and a reference genome and aligns the reads to the reference to find single nucleotide polymorphisms (SNPs). Next, it computes maximum likelihood estimates of the strain genotypes and the corresponding strain frequencies from the observed frequency of each SNP, recovering strain haplotypes from the sequencing data. The output is a GenomeSet containing the strain genomes, and the inferred relative abundance of each strain is reported in the Summary section of the output report. In addition to the summary report, BAM and VCF files are available for download alongside the sequences in the Genome objects.
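The maximum likelihood idea can be illustrated with a deliberately reduced version of the problem. StrainFinder estimates genotypes and frequencies jointly; this sketch instead holds the strain genotypes fixed and uses expectation-maximization (EM) to estimate only the strain frequencies from per-site alternate-allele counts. The function name, the 0/1 genotype encoding, and the uniform sequencing-error rate are assumptions for illustration, not StrainFinder's actual model or code.

```python
def estimate_strain_frequencies(genotypes, alt_counts, depths, error=0.01, n_iter=1000, tol=1e-10):
    """EM estimate of strain mixing proportions with strain genotypes held fixed.

    Hypothetical simplification of strain deconvolution:
    genotypes[k][l] is 1 if strain k carries the alternate allele at SNP site l, else 0;
    alt_counts[l] and depths[l] are the alternate-allele and total read counts at site l;
    error is an assumed per-read probability of observing the other allele.
    """
    n_strains = len(genotypes)
    total_reads = sum(depths)
    freqs = [1.0 / n_strains] * n_strains  # start from equal abundances
    for _ in range(n_iter):
        expected = [0.0] * n_strains  # expected number of reads originating from each strain
        for site, depth in enumerate(depths):
            # Probability that a read from strain k shows the alternate allele at this site.
            p_alt = [(1.0 - error) if genotypes[k][site] else error for k in range(n_strains)]
            alt_weights = [freqs[k] * p_alt[k] for k in range(n_strains)]
            ref_weights = [freqs[k] * (1.0 - p_alt[k]) for k in range(n_strains)]
            alt_norm, ref_norm = sum(alt_weights), sum(ref_weights)
            ref_reads = depth - alt_counts[site]
            # E-step: split alt and ref reads among strains by posterior responsibility.
            for k in range(n_strains):
                expected[k] += alt_counts[site] * alt_weights[k] / alt_norm
                expected[k] += ref_reads * ref_weights[k] / ref_norm
        # M-step: new frequencies are the expected read shares.
        new_freqs = [e / total_reads for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Two hypothetical strains that differ at four SNP sites, each covered by 100 reads.
genotypes = [[1, 0, 1, 0], [0, 1, 0, 1]]
freqs = estimate_strain_frequencies(genotypes, [70, 30, 70, 30], [100, 100, 100, 100])
print([round(f, 3) for f in freqs])
```

Because each EM update never decreases the likelihood, the frequencies settle near the mix implied by the SNP allele frequencies (roughly 70/30 here, shifted slightly by the assumed error rate).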

MetaDecoder Apps

Our team has also ported two apps from the MetaDecoder program, which identifies polymorphisms in microbiome reads relative to reference genomes. Use the Map Reads to Reference Sequence app to map sequencing reads to a reference genome, then use Call Microbial SNPs to identify SNP variants from the mapped reads. The output heatmap displays the distribution of genotypes/strains across the input reads libraries, assuming each reads library represents a single metagenome sample.
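As a rough picture of how per-library SNP calls can feed a heatmap, the sketch below applies simple depth and allele-fraction filters to pileup counts and builds one row per reads library. The pileup dictionary layout, the thresholds (min_depth, min_alt_fraction), and both function names are hypothetical; MetaDecoder's actual variant caller and the values plotted in the app's heatmap may differ.

```python
def call_snps(pileups, min_depth=10, min_alt_fraction=0.2):
    """Return sorted sites that pass the (hypothetical) filters in at least one library.

    pileups maps library name -> {site: (ref_count, alt_count)} for candidate positions,
    where a site is a (contig, position) tuple.
    """
    called = set()
    for counts in pileups.values():
        for site, (ref, alt) in counts.items():
            depth = ref + alt
            if depth >= min_depth and alt / depth >= min_alt_fraction:
                called.add(site)
    return sorted(called)


def allele_fraction_matrix(pileups, sites):
    """Rows = reads libraries, columns = called sites, values = alt-allele fraction (None if uncovered)."""
    matrix = {}
    for library, counts in pileups.items():
        row = []
        for site in sites:
            ref, alt = counts.get(site, (0, 0))
            depth = ref + alt
            row.append(alt / depth if depth else None)
        matrix[library] = row
    return matrix


# Hypothetical pileup counts for two metagenome samples.
pileups = {
    "sample1": {("contig_1", 120): (90, 10), ("contig_1", 455): (5, 45), ("contig_2", 7): (3, 1)},
    "sample2": {("contig_1", 120): (20, 30), ("contig_1", 455): (50, 0)},
}
sites = call_snps(pileups)
matrix = allele_fraction_matrix(pileups, sites)
for library, row in matrix.items():
    print(library, row)
```

Calling a site if it passes in any library, then reporting its allele fraction in every library, keeps the heatmap columns identical across samples so that differences in strain composition show up as row-to-row contrasts.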

Webinar Recording: Functional and Taxonomic Profiling of MAGs

This recent webinar demonstrates using these tools for functional and taxonomic profiling of metagenome-assembled genomes (MAGs). The presenters, John-Marc Chandonia and Alexey Kazakov from Lawrence Berkeley National Lab and An-Ni Zhang from the Massachusetts Institute of Technology, helped develop these programs and implement them as KBase apps.

An-Ni Zhang (MIT)

Alexey Kazakov (LBNL)

Ben Allen

Ben Allen coordinates outreach and user development activities to build the KBase user community while engaging in scientific collaborations to advance the use of the platform. His background in biochemistry and science education helps him develop protocols and training materials that provide depth while remaining accessible to a wide audience. His research interests include systems biology, microbial ecology, bioremediation, and biology education.

John-Marc Chandonia

John-Marc Chandonia is a computational biologist at Lawrence Berkeley National Laboratory. He co-leads data management for the ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies) SFA, a multi-lab collaboration to study microbial ecology. Chandonia is the creator and curator of the SCOPe (Structural Classification of Proteins, extended) database, which seeks to annotate the structural and evolutionary relationships among all proteins of known structure. He is also a developer on the DOE Systems Biology Knowledgebase (KBase) project. Chandonia is currently developing an efficient, scalable framework for organizing heterogeneous datasets in a way that maximizes adherence to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability).

Dylan Chivian
Lawrence Berkeley National Laboratory

Research Interests

* Computational infrastructure development for analysis of microbial community functional structure, primarily using sequencing data.

* Phylogenomic approaches for functional dissection.

* Microbial community substructure such as interaction networks of minimal viable functional cohorts.

* Principles of functional guild dynamics and the determination of whether rare species contribute to community phenotype, stability, and efficiency.

* Development of lab consortia model systems.

* Modeling of microbial community population, physiological state, and genetic adaptation in response to physical, chemical, genetic, and species perturbation.

* Manipulation of natural enzymes by structural design to engender new behaviors.

* Much of this work involves developing infrastructure to support such investigations, including the Robetta protein structure prediction server (www.robetta.org), the Genome-Linked Application for Metabolic Maps (GLAMM) metabolic network viewer (glamm.lbl.gov), the MicrobesOnline (www.microbesonline.org) and metaMicrobesOnline (meta.microbesonline.org) phylogenomic analysis platforms for microbes and microbial communities, and most recently the DOE Systems Biology Knowledgebase (kbase.us) project.