Sep 20, 2021

New suite of features enables analysis of amplicon and geochemistry data

Samples Service Overview

We’re releasing a suite of features that allow researchers to integrate their biological and geochemistry data and gain new insights into the relationship between microorganisms and their environment. You can now import environmental sampling data into KBase in ways that support analysis of biogeochemistry and diversity. These improvements cover three kinds of data: sample metadata, biogeochemistry derived from those samples, and amplicon-based sequence data. With these new data types and enhanced functionality, you can profile taxonomy from amplicon data and correlate geochemistry with OTU abundance across all of your sampling conditions.

Import Samples into KBase

New importers allow you to add amplicon sequencing data as an AmpliconMatrix, environmental chemistry tables as a Chemical Abundance Matrix, and tables of conditions from your experimental samples as a SampleSet into your KBase Narratives. Think of a SampleSet as the foundation for your experimental data, used for organizing and contextualizing your experiments, and establishing relationships to additional layers of biological data. For this initial release, the KBase team designed the foundation to be compatible with public data infrastructure projects serving environmental sciences research communities. If you have data in the System for Earth Sample Registration (SESAR) or Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), you should be able to upload your experimental data seamlessly with our importers tailored for these table formats. Once imported, you can view geolocation, collection information, and experimental metadata, as well as linkages to data in KBase. 

SampleSet Data Viewer

We are also releasing new apps that use AmpliconMatrix, Chemical Abundance Matrix, and SampleSet data to infer relationships and draw conclusions about your environmental samples. Generate Taxonomy Abundance Barplot computes and visualizes the taxonomic abundances of OTUs from your amplicon data. Use the Compare Correlation Matrices app to correlate the chemistry of your environmental samples with your OTU abundance profile.
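To make the idea of correlating chemistry with OTU abundance concrete, here is a minimal sketch in plain Python. It is not the implementation of the Compare Correlation Matrices app. It assumes a simple Pearson correlation, and the function names (`pearson`, `correlate`), the OTU and chemical names, and the values are all hypothetical. Each list holds one value per sample, in the same sample order.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def correlate(otu_abundance, chemistry):
    """Return {(otu, chemical): r} for every OTU/chemical pair."""
    return {
        (otu, chem): pearson(counts, conc)
        for otu, counts in otu_abundance.items()
        for chem, conc in chemistry.items()
    }


# Hypothetical data: one value per sample, same four samples in the same order.
otu_abundance = {"OTU_1": [10, 20, 30, 40], "OTU_2": [40, 30, 20, 10]}
chemistry = {"nitrate": [1.0, 2.0, 3.0, 4.0]}

result = correlate(otu_abundance, chemistry)
for pair, r in sorted(result.items()):
    print(pair, round(r, 3))
```

In this toy data, OTU_1 rises with nitrate across samples and OTU_2 falls, so the two pairs land at opposite ends of the correlation range. Real analyses typically need normalization of the abundance counts and a significance test, which this sketch leaves out.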

Correlating OTU abundance to environmental chemistry

These new features and apps are a big first step towards the full breadth of analytical capabilities we are developing for exploring biological and environmental data in KBase. Together, these advancements form a strong foundation for robust systems biology analysis across heterogeneous data using the KBase infrastructure and platform. We’ll be hosting webinars over the coming weeks to introduce and demonstrate these new features to the KBase community. Sign up using the links below:

Interested in helping us design and test new features? Sign up to become a KBase tester! If you have questions or requests about any of these new features, please join our Help Board and post a ticket: http://kbase.us/support

Added features:

  • New Data Type: SampleSet
  • New Data Type: AmpliconMatrix
  • New Importer: Import Samples
  • New Importer: Import Amplicon Matrix from TSV/FASTA File in Staging Area
  • New Importer: Import Chemical Abundance Matrix from CSV/Excel/TSV File in Staging Area
  • New App: Generate Taxonomy Abundance Barplot
  • New App: Compare Correlation Matrices

 

Ben Allen

Ben Allen coordinates outreach and user development activities to build the KBase user community while engaging in scientific collaborations to advance the use of the platform. His background in biochemistry and science education helps him develop protocols and training materials that provide depth while being accessible to a wide audience. Research interests include systems biology, microbial ecology, bioremediation studies, and biology education.

Paramvir Dehal | Science Lead
Lawrence Berkeley National Laboratory

Shane Canon | Architect Lead
Lawrence Berkeley National Laboratory

Shane Canon is a project engineer in the Data and Analytics Services group at NERSC at Lawrence Berkeley National Lab and a senior member of the KBase project, where he co-leads advanced development. Shane has focused his career on enabling data-intensive applications on HPC platforms and, more recently, on leveraging HPC and large-scale computing to enable bioinformatics. Shane has held a number of positions at NERSC, including leading the Technology Integration group, where he focused on the Magellan Project and other areas of strategic focus, and leading the Data Systems group, which managed the global file systems and other data systems. Shane has also served as a group leader at Oak Ridge National Laboratory, where he architected the 10 petabyte Spider filesystem. Shane holds a PhD in physics from Duke University and a BS in physics from Auburn University.

Zach Crockett

Zach Crockett is a member of the outreach, communications, and user development team. His background is in biochemistry and cellular biology. He has professional experience in medical lab science testing information management systems, creating training plans, improving processes, and developing standard operating procedures.

Pamela Weisenhorn
Janaka Edirisinghe

Janaka N. Edirisinghe is a Computational Biologist in the Data Science and Learning Division at Argonne National Laboratory (ANL). He has an interdisciplinary background in Systems Biology, Microbial Physiology, and Molecular Genetics. He earned a BSc in Computer Science, an MSc in Bioinformatics, and a Ph.D. in Microbial Physiology. He has been an integral member of the ModelSEED team at ANL, headed by Chris Henry, and contributed to the implementation of automated model construction pipelines for prokaryotes and fungi. He joined the KBase project in its early days and has been contributing as a scientist, educator, and developer. His research interests are focused on bacterial and fungal modeling, community interactions, the use of cheminformatics in novel pathway identification, and multi-omics integration. As an educator, he has conducted numerous hands-on workshops and webinars and presented at conferences over the years to share knowledge with the scientific community. He can be found on GitHub, Google Scholar, LinkedIn, and PubFacts.

Sebastian Le Bras
Lawrence Berkeley National Laboratory
Priya Ranjan
Sean Jungbluth

Sean Jungbluth pursued a Ph.D. in Oceanography studying microbial life in the deep subseafloor. In the search for novel life in extreme environments, he used submarines and developed custom sampling equipment to extract rock-hosted fluids circulating hundreds of meters below the seafloor. Leveraging DNA sequencing techniques, self-taught coding skills, and supercomputers, he has forged a career making stories from DNA sequencing data and discovering novel microbial lineages. He currently works at Lawrence Berkeley National Laboratory as a Data Scientist on the DOE Systems Biology Knowledge Base (KBase) project, where his goals are to democratize access to bioinformatic analysis tools and support reproducible genomic science. For more information, check out his personal webpage, GitHub, or Google Scholar.