ASPB Plant Biology Worldwide Summit 2021 KBase Workshop

A KBase Case Study: Investigating the reconfiguration of plant metabolism in response to pathogen invasion using SRA data.

Sunday, July 18th 2021

Organizers: Sam Seaver and Kat Beilsmith

Additional Speakers: Sunita Kumari and Vivek Kumar

Resources for the ASPB2021 workshop

Sign up for a KBase account
Custom data upload guide
Narratives for the workshop
- ASPB2021 Basic Tutorial Narrative: https://narrative.kbase.us/narrative/90945
- ASPB2021 Custom Tutorial Narrative: https://narrative.kbase.us/narrative/90946
Workshop FAQ

Additional examples and analysis in KBase

For additional examples, see the Narratives "Expression Matrices for Integration with Metabolic Models" and "Integrating Abundance Data with Metabolic Models," which were presented with our poster at ASPB 2020. They are based on tutorials for the PlantSEED Resource in KBase and on the Arabidopsis RNA-seq Analysis Tutorial.

Post-workshop poll and sign up for the follow-up session

We appreciate your feedback on this workshop. Please help us by filling out the poll linked below (11 multiple-choice questions and a few optional free-response fields).
We invite attendees of our ASPB2021 Workshop to a special KBase follow-up session, where KBase experts will be available to discuss any obstacles or questions that have come up while working with the pipelines introduced in the workshop. In the poll, you can indicate your interest in attending the follow-up session and provide an email address where we can reach you.

https://forms.gle/tmBJyNLEobGsk8QB6

Brief introduction to KBase

The Department of Energy Systems Biology Knowledgebase (KBase) is a knowledge creation and discovery environment designed for both biologists and bioinformaticians. KBase integrates a large variety of public data and analysis tools into an easy-to-use graphical user interface and leverages DOE computational infrastructure to perform sophisticated systems biology analyses. KBase is a freely available system that enables scientists to upload their own data, analyze it alongside collaborator and public data, build validated systems biology models, and share workflows and conclusions.

The KBase Narrative Interface is built on top of the Jupyter Notebook platform. When a user begins a new computational experiment, they create a new Narrative and populate it with three types of cells:

Apps: App cells allow the user to run the software functions made available by KBase and to chain them together by connecting the outputs of one App to the inputs of another. Third-party developers can add their own open-source, open-license Apps to the system using the KBase Software Development Kit (SDK).
Markdown: Markdown cells allow users to insert text and figures to “narrate” their computational experiment, explaining the data, the process, and the results.
Code: Code cells allow users to write custom code for handling the results and generating new visualizations and tables in a high-throughput manner.

The Narrative Interface allows the user to upload their private data, share their data selectively or publicly, search and retrieve extensive public reference data, and access data shared with them by others.

A finished Narrative is persistent and represents a complete record of the methods for a computational experiment. Public Narratives serve as resources for the user community by capturing valuable data sets, the associated computational analyses, and the scientific context describing the rationale behind a study. The Narrative structure simplifies the repurposing, reapplication, and extension of scientific techniques, thereby supporting reproducible and transparent research in KBase.

Brief introduction to PlantSEED

The PlantSEED project curates Arabidopsis enzymes involved in plant primary metabolism, the distinct mass-balanced biochemical reactions that they catalyze, and their localization within plant cells. From these, we can reconstruct a network of plant primary metabolism comprising over 1,000 curated enzymes and biochemical reactions that span several organelles, including the plastid and mitochondrion. This primary metabolic network can be reconstructed for any flowering plant species with a sequenced genome.

PlantSEED streamlines the process of annotating newly sequenced plant genomes with enzymatic functions, constructing metabolic models based on these annotations, and simulating plant primary metabolism with these models. In KBase, the data and Apps that constitute the PlantSEED resource are designed to expand in an iterative manner by including new plant genome sequences, new annotations harvested from the literature, and improved biochemical data.

Abstract for the ASPB2021 workshop

In this workshop, we will demonstrate the use of Narratives and Apps in the Department of Energy Systems Biology Knowledgebase (KBase) to create reproducible, flexible workflows for analyzing next-generation sequencing (NGS) data from both model plants and crops. We will present a case study using Sequence Read Archive (SRA) data and also lead a hands-on session in which users can create their own Narratives, import data, run Apps, and analyze the results.

Case Study: Plant metabolism is reconfigured in response to invasion by pathogens. When this metabolic reconfiguration is mediated by transcriptional changes, it can be detected by comparing genome-wide mRNA sequencing (mRNA-seq) data between infected and control plants. By integrating transcript abundances with a model of primary metabolism, we can then identify biochemical reactions whose associated genes show altered transcription upon infection, in both model and crop species. This approach can be extended to study the responses of plant metabolism to a variety of environmental conditions.

Presentation: The KBase tools featured in our case study are broadly applicable for profiling plant gene expression using NGS data. We will demonstrate the following workflow. In a KBase Narrative, we import public mRNA-seq data from susceptible Arabidopsis thaliana and Solanum lycopersicum (tomato) samples treated with either the bacterial pathogen Pseudomonas syringae pv. tomato DC3000 or sterile buffer as a control. Using KBase Apps, we process the Illumina reads to obtain consistent length, depth, and quality across datasets. We align the reads to reference genomes and assemble transcripts using a customized combination of KBase Apps running published, open-source software. We then generate tables with normalized, average transcript abundances for the infection and control conditions.
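The last step, producing normalized and averaged abundance tables, can be pictured with a small Python sketch. It assumes TPM (transcripts per million) as the normalization and uses hypothetical gene names, sample names, and toy read counts; the KBase Apps used in the workshop may report a different measure and compute it in their own way.

```python
from statistics import mean

def tpm(counts, lengths_kb):
    """Convert raw read counts for one sample to transcripts per million (TPM)."""
    # Reads per kilobase (RPK) corrects each gene's count for its length...
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # ...then the sample is rescaled so that its values sum to one million.
    per_million = sum(rpk.values()) / 1e6
    return {gene: value / per_million for gene, value in rpk.items()}

def average_by_condition(normalized, conditions):
    """Average per-gene abundances over replicate samples sharing a condition."""
    table = {}
    for condition in set(conditions.values()):
        replicates = [s for s, c in conditions.items() if c == condition]
        genes = normalized[replicates[0]].keys()
        table[condition] = {g: mean(normalized[s][g] for s in replicates) for g in genes}
    return table

# Hypothetical toy data: two genes, two mock replicates, one infected sample.
lengths_kb = {"geneA": 1.0, "geneB": 2.0}
raw = {
    "mock_1": {"geneA": 100, "geneB": 100},
    "mock_2": {"geneA": 300, "geneB": 300},
    "infected_1": {"geneA": 100, "geneB": 600},
}
conditions = {"mock_1": "mock", "mock_2": "mock", "infected_1": "infected"}

normalized = {sample: tpm(counts, lengths_kb) for sample, counts in raw.items()}
table = average_by_condition(normalized, conditions)
print(table)
```

Because TPM rescales each sample to the same total, the two mock replicates agree even though one was sequenced three times as deeply, which is what makes averaging across replicates meaningful.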

Our case study also features user-friendly tools for building metabolic models from plant genomes in KBase. Transcript abundances are integrated with genome-scale networks of primary metabolism to find the reactions with the largest expression changes between conditions.
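As a rough illustration of how averaged gene-level abundances might be projected onto reactions and ranked, here is a minimal Python sketch. It assumes each reaction maps to a list of genes treated as alternative isozymes (so a reaction takes the level of its most highly expressed gene), compares conditions by log2 fold change with a pseudocount, and uses hypothetical gene and reaction IDs. It is not the procedure implemented by the KBase Apps.

```python
import math

def reaction_levels(gene_levels, reaction_genes):
    """Assign each reaction the level of its most highly expressed gene."""
    # Assumption: all genes listed for a reaction are alternative isozymes (OR),
    # so the maximum is used; genes missing from the table count as 0.
    return {
        rxn: max((gene_levels.get(g, 0.0) for g in genes), default=0.0)
        for rxn, genes in reaction_genes.items()
    }

def rank_reactions(control, infected, reaction_genes, pseudocount=1.0):
    """Sort reactions by absolute log2 fold change, infected versus control."""
    ctrl = reaction_levels(control, reaction_genes)
    inf = reaction_levels(infected, reaction_genes)
    changes = {
        rxn: math.log2((inf[rxn] + pseudocount) / (ctrl[rxn] + pseudocount))
        for rxn in reaction_genes
    }
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical averaged abundances and gene-reaction links.
control = {"geneA": 10.0, "geneB": 50.0, "geneC": 7.0}
infected = {"geneA": 40.0, "geneB": 50.0, "geneC": 1.0}
reaction_genes = {"rxn1": ["geneA"], "rxn2": ["geneB", "geneC"], "rxn3": ["geneC"]}

for rxn, lfc in rank_reactions(control, infected, reaction_genes):
    print(f"{rxn}\t{lfc:+.2f}")
```

Taking the maximum over isozymes is only one convention; genes encoding subunits of a protein complex are often combined with the minimum instead. The pseudocount keeps the ratio defined when a reaction has no detected transcripts in one condition.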

Hands-on session: In the second part of the workshop, users will receive guidance on importing their own NGS data to a KBase Narrative and selecting from among the Apps described above to analyze gene expression or construct metabolic models.