Data Policy & Sources

Data Policies

An update to this policy was made in October 2022 and is available at

As members of a community of data scientists (see the KBase Code of Conduct), please be respectful and cite all data sources that contribute to your work. If the data are unpublished, contact the data generator about their policy for use of the data.

KBase conforms to the Information and Data Sharing Policy of the Genomic Science Program of the Office of Biological and Environmental Research within the Office of Science. This policy requires that all publishable data, metadata, and software resulting from research funded by the Genomic Science Program conform to community-recognized standard formats where they exist, be clearly attributable, and be deposited in one or more community-recognized public databases appropriate for the research.

Publicly available data in KBase come from the sources listed on this page. Users can also upload their own data to KBase for analysis and choose how widely to share them. All user-uploaded data are private to the uploader unless they choose to share them. KBase does not release private data or use them for any internal analyses, except to collect aggregate metrics on (1) the amount of data, (2) the distribution of data types in the system, and (3) app usage across the system.


KBase does not guarantee long-term retention of user-uploaded data. Please take appropriate precautions in storing and backing up your data locally.


Improper use of KBase, including uploading human data, may result in the termination of KBase access privileges. Please see the Terms and Conditions page for more information.

Data Sources

Source | License | Download

Genomic Data

NCBI RefSeq | Public Domain (US Government) | FTP
PATRIC | Public Domain (US Government) | FTP
SEED | Public Domain | FTP
Phytozome | Free (no embargoed/early-release genomes) | Download (login required)
MycoCosm | Free (no embargoed/early-release genomes) | Download (login required)
Gramene | Public Domain | FTP
JGI | Public Domain (US Government) | HTTP (genome portal)
NMDC | Creative Commons Attribution 4.0 | HTTP

Ribosomal Data

Greengenes | Creative Commons Attribution-ShareAlike 3.0 Unported License | HTTP
SILVA | Academic/non-commercial | HTTP
RDP | Creative Commons Attribution-ShareAlike 3.0 Unported License | HTTP


GO | Creative Commons Attribution 4.0 International License | HTTP
GSC | Unknown | HTTP

Pathway Data

KEGG maps | Academic license required | HTTP
ModelSEED | Public Domain (US Government) | HTTP

Protein Annotations

UniProt | Public Domain | HTTP
RAST | Public Domain | FTP


Protein Data Bank | Public Domain | HTTP/FTP