Plant Genomics
Ware Lab – Plant Genomics in KBase
The Ware lab at Cold Spring Harbor Laboratory (CSHL) has two primary goals: (1) understanding genome function in agriculturally important crops and model plant systems; and (2) developing cyberinfrastructure, including standards, tools, and data sources, for the genomics research community.
From 2011 to 2021, the team primarily led the eukaryote-centric design and development of scientific apps, data models, and workflows in the KBase open-science platform. The team contributed to multiple stages of the product development cycle in the areas of next-generation reads processing, functional genomics, genome assembly and annotation, sequence analysis, and metabolic modeling. In addition to design and development, the team handled project management, testing, and outreach responsibilities at different stages of the project. The team also presented KBase research work every year at several meetings and conferences, including the Plant and Animal Genome (PAG) Conference; the American Society of Plant Biologists (ASPB) meeting; the JGI user meeting and workshops; the DOE Genomic Science Program Annual PI Meeting; CSHL Biological Data Sciences; and CSHL The Biology of Genomes.
Contributions to Functionality and Tools
- Import Reads
- Import SRA File as Reads from Web: Import an SRA file from a web URL into your Narrative as a Reads data object.
- Import Single-End Reads from Web: Import a Single-End Library into your Narrative as a Reads object.
- Import Paired-End Reads from Web: Import a Paired-End Library into your Narrative as a Reads object.
- RNA-seq SampleSet
- Create RNA-seq SampleSet: Provides a way to link RNA-seq reads with corresponding metadata. (Released)
- Create RNA-seq SampleSet with Condition Set: Provides a way to link RNA-seq reads for multiple samples with metadata for each set of samples. (Released)
- RNA-seq Reads Processing
- Assess Read Quality with FastQC: A quality control application for high-throughput sequence data. (Released)
- RNA-seq Reads Alignment
- Align Reads using Bowtie2: Aligns sequencing reads to long reference prokaryotic genome sequences using Bowtie2. (Released)
- Align Reads using HISAT2: Aligns sequencing reads to long reference sequences using HISAT2. (Released)
- Align Reads using TopHat2: Aligns the sequencing reads for a set of two or more samples to long reference genome sequences of prokaryotic and eukaryotic genomes using TopHat2. (Released)
- Align Reads using STAR: Aligns the sequencing reads to long reference sequences of prokaryotic and eukaryotic genomes using STAR. (in Beta)
- Assess Read Quality Alignment using Qualimap: Displays BAM quality control information for a ReadsAlignment or ReadsAlignmentSet using QualiMap. (Released)
- RNA-seq Reads Assembly
- Assemble Transcripts using StringTie: Assembles transcripts from RNA-seq read alignments using StringTie. (Released)
- Assemble Transcripts using Cufflinks: Assembles transcripts from RNA-seq read alignments using Cufflinks. (Released)
- RNA-seq Differential Gene Expression
- Create Differential Expression Matrix using DESeq2: Identify differential expression at the gene and transcript level using DESeq2. (Released)
- Create Differential Expression Matrix using Cuffdiff: Identify differential expression at the gene and transcript level using Cuffdiff. (Released)
- Create Differential Expression Matrix using Ballgown: Identify differential expression at the gene and transcript level using Ballgown. (Released)
- Create Up/Down Regulated FeatureSet and ExpressionMatrix: Create up/down regulated FeatureSet and ExpressionMatrix from differential expression data based on given statistical threshold cutoffs. (Released)
- Filter Expression Matrix: Filter an expression matrix using either Log Odds Ratio (LOR) or ANalysis of VAriance (ANOVA) algorithms. (Released)
- Expression Data Clustering Tools
- Cluster Expression Data – WGCNA: Perform weighted gene co-expression network analysis (WGCNA) to detect gene clusters and expression patterns. (Released)
- Cluster Expression Data – Hierarchical: Perform hierarchical clustering to group gene expression data into a dendrogram. (Released)
- Cluster Expression Data – K-means: Perform K-means clustering to group expression data for observing and analyzing patterns of gene expression. (Released)
- Expression Visualization Tools
- View Multi-cluster Heatmap: Explore an expression matrix as a multi-cluster heatmap of gene expression levels. (Released)
- View Interactive Heatmap: View and explore an ExpressionMatrix as a heatmap. (Released)
- View Pairwise Correlation for Expression Data: Explore pairwise correlation values of selected features as a heatmap. (Released)
- Functional Enrichment
- Functional Enrichment for GO terms: Compute gene ontology (GO) term enrichment for genomic features. (Released)
- Samples and Ontology
- Implemented an ontology API for retrieving ontology-related data from the Relation Engine.
- Implemented Relation Engine service queries on GO/ENVO for the ontology API.
- Improved the Sample Service by adding ontology validation, caching user lookups, and making the timestamps it uses consistent with other services.
- Improved ontology data loading by adding a script that auto-generates YAML config files and by creating a new loading protocol.
- Epigenetics
- Implemented kb_Bismark apps for mapping bisulfite-treated sequencing reads and performing methylation calls.
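As an illustration of the "statistical threshold cutoffs" mentioned for the Create Up/Down Regulated FeatureSet and ExpressionMatrix app in the list above, the sketch below splits a small differential expression table into up- and down-regulated gene lists. It is not the KBase implementation: the `DiffExprResult` layout, the `split_up_down` function, the default cutoffs (absolute log2 fold change of 1.0, q-value of 0.05), and the gene values are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple, Tuple


class DiffExprResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    log2_fold_change: float
    q_value: float


def split_up_down(
    results: Dict[str, DiffExprResult],
    min_abs_log2fc: float = 1.0,   # assumed default cutoff
    max_q_value: float = 0.05,     # assumed default cutoff
) -> Tuple[List[str], List[str]]:
    """Return (up, down) gene ID lists that pass both cutoffs."""
    up: List[str] = []
    down: List[str] = []
    for gene_id, row in sorted(results.items()):
        if row.q_value > max_q_value:
            continue  # not statistically significant at this cutoff
        if row.log2_fold_change >= min_abs_log2fc:
            up.append(gene_id)
        elif row.log2_fold_change <= -min_abs_log2fc:
            down.append(gene_id)
    return up, down


# Made-up values, for illustration only.
example = {
    "geneA": DiffExprResult(2.3, 0.001),
    "geneB": DiffExprResult(-1.8, 0.020),
    "geneC": DiffExprResult(0.4, 0.010),  # significant, but the change is small
    "geneD": DiffExprResult(3.1, 0.300),  # large change, but not significant
}
up, down = split_up_down(example)
print("up:", up)
print("down:", down)
```

With these made-up values, only geneA passes as up-regulated and only geneB as down-regulated. Tightening either cutoff shrinks both lists, which is why the thresholds are usually reported alongside the resulting feature sets.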
Additional Information
Meet the CSHL Members
Principal Investigator:

Doreen Ware
Developer Team:

Sunita Kumari

Vivek Kumar

Jerry Lu
Resources
Publication
Kumari S, Kumar V, Beilsmith K, Seaver SMD, Canon S, Dehal P, Gu T, Joachimiak M, Lerma-Ortiz C, Liu F, Lu Z, Pearson E, Ranjan P, Riel W, Henry CS, Arkin AP, Ware D (2021) A KBase case study on genome-wide transcriptomics and plant primary metabolism in response to drought stress in Sorghum. Current Plant Biology 28:100229. doi: 10.1016/j.cpb.2021.100229
Tutorials
- Arabidopsis RNA-seq Analysis Tutorial
- E. coli RNA-seq Analysis Tutorial
- RNA-seq Workflow Documentation
Webinar
- Functional Genomics in KBase Webinar – 8 April 2020 – Presented by Sunita Kumari and Vivek Kumar
Workflow
- A KBase Case Study on Genome-wide Transcriptomics and Plant Primary Metabolism in Sorghum [Static Narrative]