Jul 17, 2025

Community Highlight: FAIR Data at ASM Microbe

Dr. Ellen Dow (KBase, Lawrence Berkeley National Laboratory) and Dr. Vanja Klepac-Ceraj (Wellesley College) held a Lounge & Learn at the American Society for Microbiology Microbe Meeting on Saturday June 21 that covered:

The FAIR data principles:
- Findable – data given a registered and unique persistent identifier,
- Accessible – retrievable using a standard protocol,
- Interoperable – machine readable through widely used file formats, and
- Reusable – detailed provenance and processing information;
Why researchers should adhere to the FAIR principles; and
How researchers can use KBase and ASM’s Microbiology Research Announcements (MRA) to efficiently and effectively make data FAIR.

Vanja Klepac-Ceraj and Ellen Dow standing at the front of the session space with slide deck screen on the right hand side.

Drs. Vanja Klepac-Ceraj and Ellen Dow opening the Lounge & Learn session. (Photo courtesy of Dr. Jonelle Basso)

The FAIR data principles

FAIR – Findable, Accessible, Interoperable, Reusable – is a set of principles put forth almost 10 years ago by Wilkinson et al. (2016). The FAIR data principles recommend features and attributes that can be added to data as useful context for researchers and machines to discover and reuse the data. Many at the Lounge & Learn session were familiar with the FAIR principles, but to some FAIR was still new. However, even with an audience familiar with the FAIR principles, the application of these principles is still an important, and evolving, discussion in the microbiology community.

Infrastructure for FAIR data

Persistent Identifiers (PIDs) help build an infrastructure for FAIR data. Often under-utilized, the architecture of PIDs connects the people, organizations, and products within a project (Fig. 1). Projects can contain multiple research products, including samples, datasets, software, publications, which can be assigned PIDs to link them together and back to the individual researcher(s) and project. These can be described within a Data Management Plan (DMP), that can also have its own PID.

Map of persistent identifiers that demonstrates how organizations and funding agencies can be linked to individuals writing proposals with data management plans that are related to research products. Research products include the relationships between protocols, software, data sets, samples, and publications. Each of these can be designated different identifiers that can be linked and reference one-another.

Figure 1. Using persistent identifiers (PID) to link organizations and researchers to the overall research project and products.

Why use the FAIR principles?

Data is often not shared because of real and perceived challenges, including knowledge barriers, reuse concerns, and disincentives to share data openly (Gomes et al. 2022). The process can be ambiguous, shared data may be misinterpreted, and credit for data reuse isn’t always given. However, the guidelines outlined in the FAIR data principles can ensure data is reusable and citable by the community. By applying the infrastructure in Figure 1, research products can be linked together and referenced within publications to provide the proper context for reuse, resulting in more citations and increased data impact when the community follows these as a whole.

Breaking down acronyms
FAIR – findable, accessible, interoperable, reusable
DMP – data management plan
ROR – research organization registry
ORCiD – open researcher and contributor identifier
RAiD – research activity identifier
DOI – digital object identifier
IGSN – international geo/general sample number

KBase tracks data provenance and integrates external resources to ensure research products are FAIR, visible, and trackable. Anyone can request a DOI for their static Narrative to cite within publications and track dataset metrics to understand how the KBase community uses published data.

Making Data FAIR with ASM MRA

But infrastructure only goes so far to make data reusable. In addition to PIDs, domain-specific metadata that describes the context around data collection and processing (i.e., sample environment or laboratory conditions) make data comparable and increase the potential of reuse (Wood-Charlson et al. 2022). ASM’s Microbiology Resource Announcements (MRA) promotes best practices of data sharing and reuse beyond FAIR principles. Data shared through MRA meets the requirements for reproducibility and reuse, more than solely uploading a dataset to a repository (though that is also required for MRA!). MRA data publications are peer-reviewed, completely online and open access, and utilize a straightforward submission process with easy to follow checklists.

Looking for support?

Curious about how KBase is working to measure the impact of your data analysis efforts, beyond publications? Contact us at engage@kbase.us.

References & Links

Gomes DGE, Pottier P, Crystal-Ornelas R, Hudgins EJ, Foroughirad V, Sánchez-Reyes LL, Turba R, Martinez PA, Moreau D, Bertram MG, Smout CA. and Gaynor KM. Why don’t we share data and code? Perceived barriers and benefits to public archiving practices. Proceedings of the Royal Society B (2022) 28920221113. http://doi.org/10.1098/rspb.2022.1113

Wood-Charlson EM, Crockett Z, Erdmann C, Arkin AP, Robinson CB. simple rules for getting and giving credit for data. PLOS Computational Biology (2022) 18(9): e1010476. https://doi.org/10.1371/journal.pcbi.1010476

Resources

MRA author checklist: https://journals.asm.org/pb-assets/pdf-text-excel-files/MRA-Author-Checklist-1673638757867.pdf

KBase MRA Template, Isolates: https://narrative.kbase.us/narrative/122280

KBase MRA Template, MAGs: https://narrative.kbase.us/narrative/126611

Promoting Microbiome Workforce Development with KBase Collection: https://journals.asm.org/journal/mra/kbase

About the Authors

Ellen Dow

Lawrence Berkeley National Laboratory