A knowledgebase for predictive biology

KBase enables users to analyze, share, and collaborate using data and tools designed to help build increasingly realistic models for biological function.

What is KBase?

The Department of Energy Systems Biology Knowledgebase (KBase) is a knowledge creation and discovery environment designed for biologists and bioinformaticians. KBase integrates a variety of data and analysis tools, from DOE and other public resources, into an easy-to-use platform that leverages scalable computing infrastructure to perform sophisticated systems biology analyses. KBase is freely available and developer-extensible: scientists can analyze their own data within the context of public data, using a variety of open-source bioinformatics apps to power their workflows. The collaborative Narrative interface provides a digital notebook that allows users to research together and publish their analysis, data, and code with a persistent link and DOI to support reproducibility.

KBase is funded by the DOE Biological and Environmental Research (BER) program. DOE BER sponsors user facilities and resources such as the BER Structural Biology Resources, the Joint Genome Institute (JGI, a user facility for sequencing and advanced genomic research), the Environmental Molecular Sciences Laboratory (EMSL, a user facility providing space, expertise, and equipment), the National Microbiome Data Collaborative (NMDC, which develops microbiome metadata standards and standardized workflows), and the Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE, a data repository for earth and environmental sciences). KBase is part of the data analysis, visualization, and publication…

As a freely available and developer-extensible platform, KBase enables scientists to analyze their own data within the context of public data and share findings across the system. Users can perform large-scale analyses that combine multiple types of ’omics data to investigate organisms and their communities and drive discovery.

Data Sharing

KBase supports the sharing of data, workflows, and Narratives, facilitating collaboration and accelerating the pace of scientific discovery. The ultimate goal is to build a true knowledgebase for systems biology: an integrated environment where knowledge and insights are created and multiplied.

Why KBase?

KBase is the first large-scale data science platform that enables users to upload their own data, analyze it alongside collaborator and public data, build increasingly realistic models, and share and publish reproducible workflows and conclusions.

KBase integrates data and tools in a unified graphical interface. Users no longer need to do analyses across multiple systems in order to create and run sophisticated systems biology workflows. KBase allows users to perform large-scale analyses and combine multiple lines of evidence to model plant and microbial physiology and community dynamics.

Read about KBase in Nature Biotechnology

For a comprehensive overview of KBase and its scientific impact, see this publication, which details the unique features and infrastructure of the KBase platform through several scientific use cases.

Please cite: Arkin AP, Cottingham RW, Henry CS, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36:566. doi:10.1038/nbt.4163

Meet the KBase Team

KBase is run by an interdisciplinary and collaborative team led by Lawrence Berkeley National Laboratory with participation from Argonne, Brookhaven, and Oak Ridge National Laboratories.

Also involved in the multi-institutional program are Cold Spring Harbor Laboratory, the University of Illinois at Urbana-Champaign, and the University of Tennessee.

Our key external partners are DOE’s Joint Genome Institute, Environmental Molecular Sciences Laboratory, Bioenergy Research Centers, and several of the Genomic Science Program’s Scientific Focus Areas (SFAs). Several university projects are also important contributors.

Members of the DOE Systems Biology Knowledgebase (KBase) hold their annual All-Hands meeting at the UC Berkeley Botanical Gardens, Berkeley, California, on 02/22/2023, where the group discusses past successes, issues that need to be resolved, and plans for future projects. Photo credit: Thor Swift

Funding: KBase is supported as part of the Genomic Science Program funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (Award Numbers DE-AC02-05CH11231, DE-AC02-06CH11357, DE-AC05-00OR22725, and DE-AC02-98CH10886).

Adam Arkin | Lead PI
Lawrence Berkeley National Laboratory

Adam is an expert in the comparative systems and synthetic biology of microbes and is dedicated to a model-driven approach to experimental science. He is a senior faculty scientist in the Environmental Genomics and Systems Biology Division at the Lawrence Berkeley National Laboratory and he is the Dean A. Richard Newton Memorial Professor of Bioengineering at the University of California, Berkeley where he has been since 1998. He is Technical Co-Manager of the ENIGMA SFA and directs the Center for Utilization of Biological Engineering in Space. He was one of six recipients of the 2013 Ernest Orlando Lawrence Award, the Department of Energy’s highest scientific honor.

Chris Henry | PI
Argonne National Laboratory

Chris is a scientist at Argonne National Laboratory, a fellow at the University of Chicago, and an adjunct professor at Northwestern University. He is an expert in computational biology with a focus on the prediction of phenotype from genome through the use of comparative genomics, metabolic modeling, and dynamic cellular community models. He received the Jay Bailey Young Investigator Best Paper in Metabolic Engineering Award in 2012.

Bob Cottingham | PI
Oak Ridge National Laboratory

Bob has extensive experience developing computational and data management tools and systems for genetics, genomics, and systems biology research. His background in bioinformatics and management includes serving as Co-Director of the Informatics Core at the Baylor College of Medicine Human Genome Center, Operations Director of the Genome Database at Johns Hopkins University School of Medicine, and Vice President of Computing at Celltech Chiroscience, a UK biopharmaceutical company developing drugs based on gene targets. In 2008 Cottingham moved to Oak Ridge National Laboratory, where he is Group Leader for Computational & Predictive Biology.

Elisha Wood-Charlson | Engagement Lead
Lawrence Berkeley National Laboratory

Elisha M Wood-Charlson is KBase’s User Engagement Lead. She has a PhD and 10+ years of experience as a microbial ecologist focused on host-microbe-virus interactions in the marine environment. Since leaving the research bench, she has moved into the realm of scientific community engagement, with the goal of making microbiome data science more efficient through effective collaboration, building trust in online communities, and developing shared ownership throughout the scientific process.

Paramvir Dehal | Science Lead
Lawrence Berkeley National Laboratory

Gazi Mahmud | Architect Lead
Lawrence Berkeley National Laboratory

Gazi Mahmud is a seasoned professional with two decades of extensive industry expertise, specializing in the dynamic realms of Big Data and Enterprise Architecture. His core focus lies in seamlessly integrating Data Engineering, Data Science, and modern ML/AI Engineering operations at scale, reflecting his adeptness in orchestrating cross-functional collaborations across diverse domains.

Gazi possesses hands-on experience and visionary insights in the design and implementation of data-led organizational transformations, encompassing data architecture modernization, ML/AI integration, and enterprise data governance. Notably, Gazi has worked across innovative technology startups and larger high-tech companies, delivering compelling data narratives that not only foster AI ethics but also champion model explainability within state-of-the-art Data Science workflow initiatives.

Leading by example, he has navigated expansive projects aimed at crafting innovative data and AI transformation strategies for industry leaders. His expertise extends to implementing continuous delivery capabilities, underscored by meticulous observability, to ensure seamless execution of unified data and analytics platform modernization use cases.

Oliver Fiehn | Director West Coast Metabolomics Center
UC Davis Genome Center

Prof. Oliver Fiehn has pioneered developments and applications in metabolomics with over 220 publications to date, starting in 1998 as a postdoctoral scholar and from 2000 onwards as group leader at the Max-Planck Institute for Molecular Plant Physiology in Potsdam, Germany. Since 2004 he has been Professor at the UC Davis Genome Center, overseeing his research laboratory and the satellite core service laboratory in metabolomics research. Since 2012, he has served as Director of the NIH West Coast Metabolomics Center, supervising 35 staff operating 16 mass spectrometers and coordinating activities with three UC Davis satellite labs, including efforts for combined interpretation of genomics and metabolomics data. The West Coast Metabolomics Center provides the most extensive and most in-depth analysis of metabolites available today, using a range of validated protocols for fee-for-service projects and scientific collaborations.

Professor Fiehn’s research aims at understanding metabolism on a comprehensive level in human population cohorts, animal and plant models, and cells and microorganisms. In order to leverage data from these diverse sets of biological systems, his research laboratory focuses on standardizing metabolomic reports and establishing metabolomic databases and libraries, for example the MassBank of North America that hosts over 200,000 public metabolite mass spectra and BinBase, a resource of over 90,000 samples covering more than 1,900 studies. Professor Fiehn’s laboratory members develop and implement new approaches and technologies in analytical chemistry for covering the metabolome, from increasing peak capacity by ion mobility to compound identifications through cheminformatics workflows and software. He collaborates with a range of investigators for interpreting metabolomic data in human diseases through statistics, text mining and pathway-based mapping efforts. He also studies fundamental biochemical questions from metabolite damage repair to the new concept of epimetabolites, the chemical transformation of primary metabolites that gain regulatory functions in cells.

For his work, Professor Fiehn has received a range of awards including the 2014 Molecular & Cellular Proteomics Lecture Award and the 2014 Metabolomics Society Lifetime Achievement Award. He served on the Board of Directors of the Metabolomics Society from 2005-2010 and 2012-2015, organizing a range of workshops and conferences, including the 2011 Asilomar metabolomics meeting and the 2015 Metabolomics Society international conference in San Francisco that reached a record of over 1,000 participants.

Melissa A. Haendel | Director of Translational Data Science
Linus Pauling Institute

My vision is to fundamentally alter the fabric of biomedical science, utilizing my art as a data translator to weave together healthcare systems, basic science research, and patients through development of data integration technologies, innovative communication strategies, and collaborative education and outreach.

My demonstrated success in leadership of cross-disciplinary international teams, development of applications used for rare disease diagnostics, implementation of platforms and tools for translational research, and open and reproducible science will serve me and the community at large to effect real change.

Stephen P. Long | Professor
University of Illinois

Dr. Long’s research bioengineers the photosynthesis process in crops to achieve higher productivity, sustainability, and adaptation to climate change. He heads an international project to improve the crops that feed many of the poorest in the world, which has led to the discovery of a way to engineer photosynthesis that resulted in a 20% increase in crop productivity.

Lee Ann McCue | Computational Scientist
Pacific Northwest National Laboratory

Dr. McCue’s early research focused on the development and application of comparative genomics methods for studies of transcription regulation in bacteria. Her research pioneered the phylogenetic footprinting approach for predicting transcription factor binding sites de novo using multiple bacterial genomes, and contributed to a number of algorithmic advances to the Gibbs sampling technique for multiple sequence alignment. Her research at PNNL has expanded to include the analysis of microbial populations and metagenomes for microbial ecology studies, and the application of high performance computing techniques to handle the quantity of data being generated by current sequencing technologies.

Nirav Merchant | Director, UA Data Science Institute
University of Arizona

Nirav Merchant is the Co-PI for NSF CyVerse, a national-scale cyberinfrastructure for life sciences, and NSF Jetstream, the first user-friendly, scalable cloud environment for NSF XSEDE.

He received his undergraduate degree in Industrial Engineering from the University of Pune, India, and a graduate degree in Systems and Industrial Engineering from the University of Arizona (1994).

Over the last two decades his research has been directed towards developing scalable computational platforms for supporting open science and open innovation, with emphasis on improving research productivity for geographically distributed interdisciplinary teams.

His interests include data science literacy, large-scale data management platforms, data delivery technologies, managed sensor and mobile platforms for health interventions, workforce development, and project based learning.

Julie C Mitchell | Director of Biosciences
Oak Ridge National Laboratory

Julie Mitchell is Director of the Biosciences Division at Oak Ridge National Laboratory. She has over 20 years of experience in working at the interface of quantitative and biological sciences. Mitchell’s research has focused on projects at the interface of biochemistry, data science, and high-performance computing. Her contributions to the field of computational biophysics emphasize the use of machine learning in predictive models for molecular interactions. Mitchell’s group has produced a widely utilized web server for protein-protein interaction hot spots (>80k jobs), many well-cited publications and two patents. She collaborates on ORNL projects related to protein intrinsic disorder, small molecule screening algorithms, and vaccine design.

Prior to joining ORNL, Mitchell worked as a professor of mathematics and biochemistry at the University of Wisconsin and as a principal scientist at the San Diego Supercomputer Center at UCSD. Mitchell earned a Ph.D. in mathematics at the University of California at Berkeley, and a B.A. in mathematics at San Jose State University. Mitchell was a Sloan Foundation Fellow, La Jolla Interfaces in Science Fellow and ARCS Foundation Fellow during her faculty, postdoctoral and graduate years, respectively.

Daniel Segrè | Professor
Boston University

We develop theoretical approaches and computational models for the study of complex biological networks. We are especially interested in the dynamics and evolution of metabolism, whose complex web of small-molecule transformations underlies fundamental aspects of biological organization, from energy transduction to cell-cell communication. In addition to helping understand how biological systems function and evolve, we seek to apply our methods to the design and optimization of engineered networks for bioenergy and biomedicine applications.

Rick Stevens | Associate Laboratory Director
Argonne National Laboratory

Rick Stevens is Argonne’s Associate Laboratory Director for Computing, Environment and Life Sciences.

Stevens has been at Argonne since 1982, and has served as director of the Mathematics and Computer Science Division and also as Acting Associate Laboratory Director for Physical, Biological and Computing Sciences. He is currently leader of Argonne’s Exascale Computing Initiative, and a Professor of Computer Science at the University of Chicago Physical Sciences Collegiate Division. From 2000-2004, Stevens served as Director of the National Science Foundation’s TeraGrid Project and from 1997-2001 as Chief Architect for the National Computational Science Alliance.

Stevens is interested in the development of innovative tools and techniques that enable computational scientists to solve important large-scale problems effectively on advanced scientific computers. Specifically, his research focuses on three principal areas: advanced collaboration and visualization environments, high-performance computer architectures (including Grids) and computational problems in the life sciences. In addition to his research work, Stevens teaches courses on computer architecture, collaboration technology, virtual reality, parallel computing and computational science.

Susannah Tringe | Division Director of Environmental Genomics & Systems Biology
Lawrence Berkeley National Laboratory

Dr. Tringe joined the JGI in 2003 as a postdoctoral fellow in Eddy Rubin’s group. During her postdoctoral tenure she developed methods for comparative analysis of metagenome data from complex microbial communities. In 2006 she took a research scientist position providing scientific support for the developing portfolio of collaborator metagenome projects, and in 2010 she became head of the Metagenome Program. Dr. Tringe also heads the Microbial Systems Group, whose work focuses on sequence-based approaches to studying microbial community assembly, function and dynamics. Major foci of these research efforts are the roles of microbial communities in wetland carbon cycling and the interactions of plants with their associated microbiomes. Dr. Tringe serves as the Division Director of Environmental Genomics & Systems Biology at Berkeley Lab.

Kelly Wrighton | Associate Professor of Soil Microbiomes
Colorado State University

The Wrighton laboratory is a microbiome research group interested in the study of microorganisms, their genomes, and the surrounding environment. We investigate how microorganisms contribute to ecosystem processes, with a particular interest in carbon and nitrogen cycling. Our microbial research has many applications, including improving predictions of greenhouse gas emission from soils, stabilizing gastrointestinal and heart health, and enhancing energy yield and longevity from hydrocarbon systems. We integrate data from different scales, from the metabolite through genes/enzymes, to organisms, and ultimately microbial communities, to better understand microbial interactions with the biotic and abiotic environment. Please check out our laboratory webpage for more detailed information on our current research.

Looking for information on tools and resources?

Check out KBase Documentation for our Getting Started guide and information on tools in the App Catalog. KBase is a fully open source software project available on GitHub.