New Viral Tools in KBase
Have you noticed the “Virus” app category while working in your Narratives? KBase has two new Apps to analyze the viral content of your metagenomic assemblies! Together, VirSorter and vConTACT2 make up the first viral genomics analysis workflow in KBase.
- VirSorter is designed to detect viral signals in different types of microbial sequence data using probabilistic models and extensive virome data to maximize detection of novel viruses.
- vConTACT2 is for classifying viral genomic sequence data, designed to cluster and provide taxonomic context of viral metagenomic sequencing data.
You can see a demonstration of this new pipeline in the Viral Annotation Pipeline static Narrative.
The viral analysis pipeline starts with a metagenome assembly. If you already have an assembly, then you can identify viruses using VirSorter and classify them with vConTACT2. With VirSorter, you can identify which sequences are potentially viral. Generally speaking, default parameters are fine. The “Reference database” parameter allows you to select which database is used for detecting the viral sequences. By default it is set to RefSeq DB, but if you trust viral data from virome datasets, you may want to choose the Virome DB database instead.
After calling and sorting the sequences that are potentially viral, we can annotate them with vConTACT2 to better understand the taxonomic context of the viral content. vConTACT2 will extract each viral genome and its associated gene predictions, and build a “Gene2Genome” table that underpins the whole analysis. vConTACT2 has a lot of options, but in KBase the defaults are already selected, so there’s no need to change anything unless you want to use the most recent version of NCBI’s Viral RefSeq. You may prefer to use the “older” version because it is often referenced in publications. But if you want to use the most recent version, the option is there, and you can compare the differences.
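To make the “Gene2Genome” idea more concrete, here is a minimal Python sketch of a table that maps each predicted protein back to the viral sequence it came from. The column names (`protein_id`, `contig_id`, `keywords`), the helper functions, and the sample identifiers are illustrative assumptions, not the KBase App’s actual internals; in KBase the App builds this table for you.

```python
import csv
import io

# Hypothetical gene predictions: (protein ID, source contig, annotation).
# All identifiers below are made up for illustration.
gene_predictions = [
    ("contig_1_gene_1", "contig_1", "terminase large subunit"),
    ("contig_1_gene_2", "contig_1", "hypothetical protein"),
    ("contig_2_gene_1", "contig_2", "major capsid protein"),
]


def build_gene2genome(predictions):
    """Return CSV text mapping each predicted protein to its source genome."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed column names for the mapping table.
    writer.writerow(["protein_id", "contig_id", "keywords"])
    for protein_id, contig_id, annotation in predictions:
        writer.writerow([protein_id, contig_id, annotation])
    return buffer.getvalue()


def genes_per_genome(predictions):
    """Count how many predicted genes belong to each genome (contig)."""
    counts = {}
    for _, contig_id, _ in predictions:
        counts[contig_id] = counts.get(contig_id, 0) + 1
    return counts


table = build_gene2genome(gene_predictions)
print(table)
print(genes_per_genome(gene_predictions))
```

Each row ties one gene to one genome, which is what lets genomes later be compared by the genes they share.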
After running vConTACT2, you’ll get a table listing all of the potential viral sequences identified (marked as “genomes” in the table), along with clustering hits, outliers, and confidence values. Clustering of these viral “genomes” in the table allows you to assess their taxonomic classification. Different viral sequences that co-cluster with the same reference genome are likely taxonomically similar to the reference at some taxonomic level. Outliers are sequences that are “connected” to clusters in the network, but don’t have sufficient confidence values to place them within that cluster.
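The way you might read such a table can be sketched in a few lines of Python. The row layout, the status labels, and the genome and cluster names below are hypothetical stand-ins for illustration, not the App’s actual output format.

```python
from collections import defaultdict

# Hypothetical rows: (genome name, cluster ID or None, status, is_reference).
rows = [
    ("RefPhage_A", "VC_1", "Clustered", True),
    ("contig_1", "VC_1", "Clustered", False),
    ("contig_2", "VC_1", "Clustered", False),
    ("RefPhage_B", "VC_2", "Clustered", True),
    ("contig_3", None, "Outlier", False),
]


def summarize(rows):
    """Pair each clustered user sequence with the references in its cluster."""
    references = defaultdict(list)  # cluster ID -> reference genomes
    members = defaultdict(list)     # cluster ID -> user sequences
    outliers = []                   # sequences not confidently placed

    for name, cluster, status, is_reference in rows:
        if status == "Outlier" or cluster is None:
            outliers.append(name)
        elif is_reference:
            references[cluster].append(name)
        else:
            members[cluster].append(name)

    co_clustered = {}
    for cluster, names in members.items():
        for name in names:
            # An empty list means no reference genome shares the cluster.
            co_clustered[name] = references.get(cluster, [])
    return co_clustered, outliers


co_clustered, outliers = summarize(rows)
print(co_clustered)
print(outliers)
```

In this toy data, `contig_1` and `contig_2` share a cluster with the same reference, so they would be candidates for a similar taxonomic assignment, while `contig_3` is set aside as an outlier.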
With this new pipeline, you can examine the potential viral content of your metagenomic assemblies quickly and easily. When combined with KBase’s Microbial Communities analysis tools, users can gain significant insight into which microorganisms and viruses are present within a sample, and what their functional relationships are to one another.
VirSorter and vConTACT2 were added to KBase by Drs. Matt Sullivan and Ben Bolduc of The Ohio State University working with the KBase team. This effort was supported with funding from the US Department of Energy through the Soil Microbiome Science Focus Area.