On identifiers from external databases and past releases

In release 11 we have updated a number of genomes, among them Schistosoma mansoni. The authors of the new S. mansoni annotation preserved identifiers where genes were unchanged, and otherwise assigned new identifiers. They kept a list of changes – updates, splits and merges – between the two versions, available here in full.

We have now imported this list into WormBase ParaSite. As a result, some gene pages now carry a note about the old identifier under “external references”, and you can also search by an old identifier to retrieve the new one.

Annotation authors do not always try to preserve gene identifiers: in our other recent updates, of H. contortus and F. hepatica, the identifiers are all new. Here we describe a number of strategies for reaching the content you want in such cases, and go through a worked example.

Our search facility is best at suggesting and retrieving our gene identifiers (e.g. WBGene00001135) and gene families (e.g. hedgehog), but will also retrieve a number of other identifiers, e.g. from UniProt or GenBank. Do try it first, and let us know if the search is not returning something you would expect it to bring back.

FTP dumps from previous releases

Our Downloads section contains a link to our FTP site, with data for all previous releases of WormBase ParaSite. If all you require is e.g. a protein sequence for an old identifier, you are done – otherwise, you can use the sequence to search further.
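As a rough illustration of that first step, here is a minimal Python sketch that pulls one sequence out of a protein FASTA file by identifier. It assumes you have already downloaded and decompressed a FASTA file from an old release; the records below are made up, and the rule of matching "identifier plus a dot suffix" is only an assumption about how transcript or protein names relate to gene identifiers in your file.

```python
import io

def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # The identifier is the first whitespace-delimited token of the header.
            identifier, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)

def find_sequence(handle, wanted):
    """Return the first sequence whose identifier is `wanted` or `wanted` plus a '.suffix'."""
    for identifier, sequence in read_fasta(handle):
        if identifier == wanted or identifier.startswith(wanted + "."):
            return sequence
    return None

# Made-up records standing in for a downloaded protein FASTA file.
example = io.StringIO(
    ">Smp_000001.1 hypothetical record\nMKTLLV\nAAGG\n"
    ">Smp_000002.1 another hypothetical record\nMSSPQ\n"
)
print(find_sequence(example, "Smp_000001"))
```

With a real file you would pass `open("proteins.fa")` (a hypothetical file name) instead of the in-memory example.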

BLAST

Given any sequence, you can retrieve a corresponding identifier by BLAST – perfect or near-perfect alignment across a large substring is almost always the right hit. Of course you might also find that your gene of interest has multiple copies, has been split in two in the new annotation, etc.
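If you run BLAST yourself and save the results in the standard 12-column tabular format, the "perfect or near-perfect alignment across a large substring" rule of thumb can be sketched in Python roughly as follows. The thresholds (98% identity, 90% of the query covered) and the identifiers in the example are assumptions chosen for illustration, not values we recommend or use.

```python
def near_perfect_hits(blast_tabular, query_lengths, min_identity=98.0, min_coverage=0.9):
    """Filter 12-column tabular BLAST output for high-identity, high-coverage hits.

    Columns are assumed to be the standard ones: query, subject, % identity,
    alignment length, mismatches, gap opens, query start, query end,
    subject start, subject end, e-value, bit score.
    """
    hits = []
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        q_start, q_end = int(fields[6]), int(fields[7])
        # Fraction of the query sequence covered by this alignment.
        coverage = (abs(q_end - q_start) + 1) / query_lengths[query]
        if identity >= min_identity and coverage >= min_coverage:
            hits.append((query, subject, identity, round(coverage, 2)))
    return hits

# Two made-up result rows for one hypothetical query of length 100.
rows = (
    "old_gene_1\tnew_gene_A\t100.00\t100\t0\t0\t1\t100\t1\t100\t1e-60\t200\n"
    "old_gene_1\tnew_gene_B\t85.00\t40\t6\t0\t10\t49\t5\t44\t1e-5\t50\n"
)
print(near_perfect_hits(rows, {"old_gene_1": 100}))
```

A query with several surviving hits is worth a closer look: it may have multiple copies, or have been split in the new annotation.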

Other websites 🙂

Going through the UniProt search frequently gives good results when searching for gene families, and you might want to try the BLAST service at ENA or UniProt as an alternative to our BLAST.
We aim for you to be able to retrieve all the data you need from WormBase ParaSite, but using multiple resources is pragmatic and frequently optimal, so with every release we reconcile our identifiers with a number of external sources. This process lets our gene pages link to corresponding pages in other resources in the External references section. To find a relevant WormBase ParaSite page given e.g. an INSDC or UniProt ID, you can use the previously described search facility.

Exonerate

Exonerate is a very fast tool for sequence alignment. It could come in handy if you need to achieve similar results to our online BLAST for a few hundred anonymous sequences. We hope you will not have to resort to it!

Example

Chalmers et al (2015) identified ten tegumental surface proteins in S. mansoni as important in understanding the host immune response to adult schistosomes. Two of the ten genes – Smp_081920 and Smp_158960 – are absent from the new annotation.

Smp_158960 turns up in search results under a different identifier – it is now Smp_345020. Looking in the External references table reveals that Smp_158960 is the only previous identifier for Smp_345020, and checking the “view all locations” link reveals that Smp_345020 is the only gene annotated with Smp_158960 as a previous identifier, so the mapping is one to one. We can guess that the authors modified the gene structure and decided to give the gene a new identifier.
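The same two-way check can be done in bulk if you have a list of (old identifier, new identifier) pairs. The Python sketch below is only an illustration: the pair-list format is an assumption, and apart from the Smp_158960 to Smp_345020 pair discussed above, the identifiers are placeholders.

```python
from collections import defaultdict

def classify_mappings(pairs):
    """Label each old identifier as 'one-to-one', 'split' or 'merge'.

    `pairs` is an iterable of (old_id, new_id) tuples. An old identifier that
    maps to several new ones is a split; several old identifiers sharing one
    new identifier are a merge.
    """
    old_to_new = defaultdict(set)
    new_to_old = defaultdict(set)
    for old_id, new_id in pairs:
        old_to_new[old_id].add(new_id)
        new_to_old[new_id].add(old_id)

    result = {}
    for old_id, new_ids in old_to_new.items():
        if len(new_ids) > 1:
            result[old_id] = "split"
        elif len(new_to_old[next(iter(new_ids))]) > 1:
            result[old_id] = "merge"
        else:
            result[old_id] = "one-to-one"
    return result

pairs = [
    ("Smp_158960", "Smp_345020"),  # the pair from the example above
    ("old_A", "new_1"),            # placeholder: old_A split in two
    ("old_A", "new_2"),
    ("old_B", "new_3"),            # placeholder: old_B and old_C merged
    ("old_C", "new_3"),
]
print(classify_mappings(pairs))
```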

Searching for Smp_081920 doesn’t return anything, and it took me a bit of time to figure out what happened.
I knew I could get the sequence from the FTP site, but it saved me some effort to discover that the identifier is still hanging on in UniProt, which gave me the protein sequence.
I then used BLASTP, which gave me a full match with Smp_166350 on about 100bp: long enough for me to potentially trust, but perhaps omitted by the authors when reconciling old and new identifiers.
I became certain I was looking at the same gene after comparing the exon structure of Smp_166350 with the structure of Smp_081920, conveniently still online at our friends’ place, Ensembl Metazoa. The two are very similar, except Smp_166350 has three additional non-coding exons.

I was skeptical of these non-coding exons and wanted to see some expression evidence for them.
I opened the JBrowse display in ParaSite and enabled some tracks across a few studies and developmental stages, and my skepticism grew. In tracks for cercariae and juveniles, I saw high peaks in the three non-coding exons and no reads aligned to what is claimed to be the coding region. Then, in tracks for adult worms but also in a track for miracidia, I saw the opposite: reads in the coding region, and no expression in the non-coding exons.

My conclusion from this data would be that the gene Smp_081920 is now Smp_166350, but the structure of Smp_166350 is not completely correct: the UTRs are spurious, and they probably make another gene instead. I have submitted this conclusion to the annotation authors – it will be corrected in future versions of the annotation.

 

Genomes we don’t have in WormBase ParaSite

Work on version 11 of WormBase ParaSite is ongoing! We now have a list of assemblies of new species, and improved genomes of existing species, that we plan to publish. As well as including data submitted to us directly, we have surveyed the archives and made an effort to include all assemblies of the taxa Nematoda and Platyhelminthes available in NCBI and ENA that have been annotated with gene structures.

We seem to be doing well in our attempts to reflect the sequencing efforts of relatives of Caenorhabditis elegans – the order Rhabditida makes up 43 of our 100 Nematoda genomes. This good coverage seems to extend to other nematodes relevant to answering basic biology questions, to nematodes causing disease in humans and livestock, and to parasites of plants.

The situation is very different for our other phylum of interest, Platyhelminthes, which is more diverse, and has historically been harder to study due to the complex life cycles of these animals. Parasitologists studying Platyhelminthes have no clear model organism to play the role that C. elegans plays for Nematoda, but many species, like the flukes F. hepatica and S. mansoni, are subject to active research.

In particular, phylum Platyhelminthes has a class Turbellaria of free-living marine worms, full of ancient, evolutionarily unique and remarkable organisms, like Pseudobiceros bedfordi: a large colorful flatworm that feeds on ascidians and small crustaceans.

Some species of this class are studied due to their unusual properties. We are tracking research in the area, and currently have two genomes of Turbellaria – Macrostomum lignano and Schmidtea mediterranea.

M. lignano is a tiny worm living in the shore sands of the Adriatic Sea, and is used as a model organism for a number of evolutionary studies. We’ll be publishing an update to this genome, sequenced in 2017 by University Medical Center Groningen.

S. mediterranea is a freshwater species, important for research in stem cells and regeneration. ParaSite currently hosts the original published version of the Schmidtea genome, from 2014. Another group has since published a new and improved assembly. We are hopeful that the authors will annotate this version of the genome with gene structures soon, so that we can include it in ParaSite.

Do reach out to us if you have a potential submission, know of a published genome that could be included in ParaSite, or want to tell us about ongoing research on Nematoda or Platyhelminthes that should be on our radar.

Announcing WormBase ParaSite 10

We are pleased to announce the tenth release of WormBase ParaSite. We have included genomes of four new species, bringing the total number of genomes to 138, representing 118 distinct species, and updated the assemblies or annotations of an additional four. This includes results of recent efforts to sequence Ascaridae genomes (Wang et al, 2017): a much improved assembly of Ascaris suum, with an N50 of 4.6Mb up from 290.6kb, and a newly sequenced Parascaris univalens. The other three new genomes are of particular interest to the study of nematode reproduction. These are two species of the genus Diploscapter, D. coronatus (Hiraki et al, 2017) and D. pachys (Fradin et al, 2017), which have managed to stay adaptable and maintain genetic variation throughout their long evolutionary history as parthenogens; and the free-living Caenorhabditis nigoni (Yin et al, 2018), an outcrossing cousin of hermaphroditic C. briggsae.
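For readers less familiar with the N50 statistic quoted above: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of the usual definition, using made-up scaffold lengths rather than the real Ascaris suum assemblies:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and walk down the list until the
    running total reaches half of the total assembly size; the length at
    which that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")

# Two made-up assemblies of the same total size (290 units).
contiguous = [80, 70, 50, 40, 30, 20]
fragmented = [10] * 29
print(n50(contiguous), n50(fragmented))
```

The two toy assemblies have the same total length, but the more contiguous one has the higher N50, which is why the statistic is used as a rough measure of assembly contiguity.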

The assembly and annotation of Echinococcus canadensis are now in line with the paper introducing the genome (Maldonado et al, 2017). We have also included a tidy-up update of the assembly of Schistosoma haematobium. For the WormBase core species, we have updated the annotations to WS263. As with each release, we have updated our comparative genomics data – recalculating the orthologues and paralogues for all species – and the protein features pipeline, annotating genes with protein domains and inferred GO terms using the latest version of InterProScan.

What would you like to learn about in our BSP workshop?

We’re looking forward to presenting a workshop at the upcoming BSP Spring meeting in Aberystwyth. To help us prepare, we’d love to know what our users would find most useful for us to cover. Please fill in our quick survey (one question only!): https://www.surveymonkey.co.uk/r/8LYF2YW

 

Announcing WormBase ParaSite Release 9

We are pleased to announce the ninth release of WormBase ParaSite. In this release, we have included genomes for two new species (Ditylenchus destructor – PRJNA312427 and Taenia saginata – PRJNA71493) and an additional alternative genome for one species (Taenia asiatica – PRJNA299871). This brings the total number of genomes to 134, representing 114 species. Additionally, all WormBase core species have been updated to release WS258.

As part of this release, we have introduced a new genome browser: JBrowse (example for S. mansoni). This offers a fast and interactive experience for all genomes. Custom tracks can be attached directly from a local computer, without needing to upload any data. This makes JBrowse ideal for visualising large data files, without the need to download or install any additional software. Other features include combination tracks for visualising multiple files as a single track, and sequence search tracks for locating short sequence motifs within the genome.

Featured paper: The Echinococcus canadensis (G7) Genome

Tapeworms of the Echinococcus granulosus sensu lato species complex cause the zoonotic disease echinococcosis. Maldonado et al, from the Kamenetzky group at the University of Buenos Aires, have recently reported the genome assembly and annotation of one of the members of this species complex, Echinococcus canadensis (G7 genotype). The genome paper additionally describes a number of comparative analyses between E. canadensis (G7), E. granulosus (G1) and E. multilocularis, based on gene orthology, genome-wide SNP analyses and identification of regulatory features.

Version 1 of the E. canadensis (G7) genome is currently available in WormBase ParaSite, with version 2 becoming available in release 10.

Travel bursaries announced for Hydra conference

Up to three awards will be made to early-career researchers from Africa, Asia and Latin America to support participation at the meeting; the bursaries will cover registration fees (GBP 495, approx USD 625), and a sum of USD 1000 towards airfare and ground transportation costs. Awardees will be selected on the basis of abstracts submitted to the conference, and a letter of application outlining the reasons for attending the meeting. Applicants should register through this website and send their application letter to Hydra@ed.ac.uk before the closing date of 31 March 2017.

Conference website: http://hydra.bio.ed.ac.uk/