Announcing WormBase ParaSite 11

We are pleased to announce the 11th release of WormBase ParaSite. We have added data and new functionality, and improved some of our per-release processes. Quite a few genomes are new or have been significantly updated.

Comments in ParaSite

By popular request, we have added a comment space to our gene pages. You can mention your own results, point out an inconsistency, or make an observation about the displayed data. We hope the comments will be scientific in content, even if lighter in form than communication through peer-reviewed journals.

New species and updates

We are updating three flatworm genomes:

The new assembly of S. mansoni is very complete and accurate, and the gene models were manually annotated over the course of several months.

This is the list of our new or updated nematode genomes:

Parasites:

Free-living:

We are also publishing a fix from the authors to their annotation of Ascaris suum and Parascaris univalens. The gene models submitted to us in the previous release suffered from a systematic error, which resulted in much shorter proteins. We regret the error.

Additionally, the release includes genomes from the WS265 release of WormBase, including the newest WormBase core species, Trichuris muris.

In total 17 genomes were added or changed, bringing the total to 148 genomes across 124 species.

Analyses

We ran all our usual tertiary annotation pipelines that identify repeats, low complexity regions, non-coding RNAs, protein domains, predicted GO terms, and more, as well as our comparative genomics pipeline, where we expect an improvement especially around recently updated branches.

We have revisited our cross-references pipeline. By inferring links between our genes and entities in other resources, this pipeline supports discovery of what is currently known about each gene and lets us provide rich descriptions. We have updated these cross-references for all our genomes, and added new references to UniParc and RNAcentral. You can either find these references on the gene pages, or use BioMart to retrieve them in bulk.
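
As a rough illustration of bulk retrieval, the sketch below builds a BioMart XML query and defines a helper that would post it to a mart service. It is a minimal sketch under stated assumptions, not a description of our own tooling: the endpoint URL, the dataset name, and the attribute and filter names are all placeholders that should be checked against the BioMart interface before use.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: location of the BioMart service; verify on the website before use.
MARTSERVICE_URL = "https://parasite.wormbase.org/biomart/martservice"


def build_query(dataset, attributes, filters=None):
    """Build a BioMart XML query string for a single dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",      # tab-separated output
        "header": "1",           # include a header row
        "uniqueRows": "1",       # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which genes are returned; attributes are the output columns.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def fetch_tsv(query_xml, url=MARTSERVICE_URL):
    """Post the query to the mart service and return the response as text."""
    data = urllib.parse.urlencode({"query": query_xml}).encode("ascii")
    with urllib.request.urlopen(url, data=data) as response:
        return response.read().decode("utf-8")


# All names below are hypothetical placeholders for illustration only.
example_query = build_query(
    dataset="wbps_gene",
    attributes=["wbps_gene_id", "uniparc_id", "rnacentral_id"],
    filters={"species_id_1010": "scmansprjea36577"},
)
print(example_query)
```

Calling `fetch_tsv(example_query)` would then send the request; it is left uncalled here so that the sketch runs without network access.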

RNASeq data

We have configured our JBrowse track displays to include tracks with aligned reads produced by the RNASeq-er project. RNASeq-er processes all RNASeq datasets published in ENA, so there are many tracks to choose from: currently 11735, from 492 studies across 74 species.

Use case 1: discover public datasets

Browsing our displays can be an alternative to searching a primary source like the European Nucleotide Archive (ENA), with the benefit of an additional filter: the displays include only the runs that RNASeq-er successfully aligned and that passed QC.

For more than half the species, the only RNASeq data available is what was produced while preparing the genome.

For other species, there have been additional studies; see, for example, the JBrowse display of the tapeworm Echinococcus multilocularis. Apart from a few studies from the same lab in the United Kingdom where the genome was sequenced, the display contains runs sequenced in China and Japan as part of BioProjects PRJNA254535 and PRJDB3524.

The WormBase ParaSite species with the most RNASeq data is, unsurprisingly, C. elegans, with 6446 runs across 245 studies.

Use case 2: compare expression across tracks

Consider the gene Smp_169190 in Schistosoma mansoni. Lu et al (2018, preprint) compared expression in developmental stages, and found Smp_169190 to be differentially expressed in the cercarial life stage.

This link takes you to two of the tracks used in Lu et al’s meta-analysis: ERR022872, showing RNA sequenced from cercariae, and ERR506086, with RNA sequenced from an adult worm. Expression of Smp_169190 differs dramatically between the two tracks.

You can also see the expression in other life stages. A search for “cercaria” shows quite a few tracks that will probably be similar to ERR022872. A similar search for “miracidia” yields the track SRR922067 and an interesting result: miracidia don’t express Smp_169190, but there is high expression for two of the nearby TAL genes.

Genomes we don’t have in WormBase ParaSite

Work on version 11 of WormBase ParaSite is ongoing! We now have a list of assemblies of new species, and improved genomes of existing species, that we plan to publish. As well as including data submitted to us directly, we have surveyed the archives and made an effort to include all assemblies of the taxa Nematoda and Platyhelminthes available in NCBI and ENA that have been annotated with gene structures.

We seem to be doing well in our attempts to reflect the sequencing efforts of relatives of Caenorhabditis elegans: the order Rhabditida makes up 43 of our 100 Nematoda genomes. This good coverage seems to extend to other nematodes relevant to answering basic biology questions, to nematodes causing disease in humans and livestock, and to parasites of plants.

The situation is very different for our other phylum of interest, Platyhelminthes, which is more diverse, and has historically been harder for biologists to study due to the complex life cycles of these animals. The phylum doesn’t have a clear model organism like C. elegans, but many species, like the flukes F. hepatica and S. mansoni, are the subject of active research.

In particular, the phylum Platyhelminthes has a class Turbellaria of free-living marine worms, full of ancient, evolutionarily unique and remarkable organisms, like Pseudobiceros bedfordi, a large colorful flatworm that feeds on ascidians and small crustaceans.

Some species of this class are studied due to their unusual properties. We are tracking research in the area, and currently have two genomes of Turbellaria: Macrostomum lignano and Schmidtea mediterranea.

M. lignano is a tiny worm living in the shore sands of the Adriatic Sea, and is used as a model organism for a number of evolutionary studies. We’ll be publishing an update to this genome, sequenced in 2017 by University Medical Center Groningen.

S. mediterranea is a freshwater species, important for research in stem cells and regeneration. ParaSite currently hosts the original published version of the Schmidtea genome, from 2014. Another group has since published a new and improved assembly. We are hopeful that the authors will annotate this version of the genome with gene structures soon, so that we can include it in ParaSite.

Do reach out to us if you have a potential submission, know of a published genome that could be included in ParaSite, or want to tell us about ongoing research on Nematoda or Platyhelminthes that should be on our radar.

Announcing WormBase ParaSite 10

We are pleased to announce the tenth release of WormBase ParaSite. We have included genomes of four new species, bringing the total number of genomes to 138, representing 118 distinct species, and updated the assemblies or annotations of an additional four. This includes results of recent efforts to sequence Ascaridae genomes (Wang et al, 2017): a much improved assembly of Ascaris suum, with an N50 of 4.6 Mb, up from 290.6 kb, and a newly sequenced Parascaris univalens. The other three new genomes are of particular interest to the study of nematode reproduction. These are two species of the genus Diploscapter, D. coronatus (Hiraki et al, 2017) and D. pachys (Fradin et al, 2017), which have managed to stay adaptable and maintain genetic variation throughout their long evolutionary history as parthenogens; and the free-living Caenorhabditis nigoni (Yin et al, 2018), a male-female cousin of hermaphroditic C. briggsae.
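
For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are made up for illustration and are not taken from either Ascaris suum assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings coverage to at least 50%.
        if 2 * running >= total:
            return length


# Made-up lengths: the total is 300, and 80 + 70 already covers half of it.
toy_assembly = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_assembly))  # prints 70
```

A higher N50 means that more of the assembly sits in long sequences, which is why the jump from 290.6 kb to 4.6 Mb indicates a much more contiguous assembly.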

The assembly and annotation of Echinococcus canadensis are now in line with the paper introducing the genome (Maldonado et al, 2017). We have also included a tidy-up update of the assembly of Schistosoma haematobium. For the WormBase core species, we have updated the annotations to WS263. As with each release, we have updated our comparative genomics data (recalculating the orthologues and paralogues for all species) and rerun the protein features pipeline, annotating genes with protein domains and inferred GO terms using the latest version of InterProScan.