We are delighted to announce the 16th release of WormBase ParaSite. Highlights of this release include:
- Addition of six new genome assemblies
- Annotation updates for 12 genomes
- Addition of phenotype data for C. elegans genes, imported from WormBase
- Addition of gene name synonyms for a set of Strongyloides stercoralis genes
- Introduction of an archiving service
- Deprecation of CEGMA, and introduction of BUSCO as an annotation quality metric
- New repeat feature libraries for all genomes, generated with RepeatModeler2
New genomes
This release adds six new assemblies, two of which are for new species:
- Atriophallophorus winterbourni (new species) – a digenean trematode parasite native to the lakes of New Zealand (from Zajac et al., 2021).
- Bursaphelenchus okinawaensis (new species) – a lab model for the pine wood nematode Bursaphelenchus xylophilus (from Sun et al., 2020).
- Bursaphelenchus xylophilus – a new chromosome-scale assembly from Dayi et al., 2020.
- Caenorhabditis remanei – a new chromosome-scale assembly from Teterina et al., 2020.
- Clonorchis sinensis – a new chromosome-scale assembly from Young et al., 2021.
- Oscheius tipulae – a new chromosome-scale assembly from Gonzalez de la Rosa et al., 2021.
Annotation updates
We present full annotation updates for nine genomes:
- Pristionchus arcanus
- Pristionchus entomophagus
- Pristionchus exspectatus
- Pristionchus fissidentatus
- Pristionchus japonicus
- Pristionchus maxplancki
- Pristionchus mayeri
- Parapristionchus giblindavisi
- Micoletzkya japonica
Three further genomes have seen manual gene model curation by the community:
- Dirofilaria immitis
- Haemonchus contortus (PRJEB506)
- Strongyloides stercoralis
For these species, gene models that have been manually verified or curated can be identified by a note in the “Annotation Method” line on their gene page.
For all 12 species with annotation updates, previous gene models can be visualised in JBrowse in the “WBPS15 gene models” track.
Deprecated gene models and assemblies (and associated analyses) also remain accessible on our new archive site. See this blog post for further information on how to access archived data.
Phenotype data
This release sees the import of over 350,000 C. elegans gene-phenotype associations from our sister site, WormBase. These associations have been curated from the literature over many years, from RNAi and variant data. The data is accessible on C. elegans gene pages.
As the majority of helminth genes don’t currently have data from direct phenotypic assays, phenotypes have also been propagated between orthologues. For example, the gene page of a H. polygyrus gene shows the phenotypes associated with its C. elegans orthologues.
In total, we now host phenotype data from 13,350 studies (13,349 in C. elegans and 1 in Schistosoma mansoni).
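The propagation of phenotypes between orthologues can be pictured as a simple lookup. The following is a minimal sketch, not the pipeline WormBase ParaSite actually uses; the function name, gene identifiers and phenotype terms are all made up for illustration.

```python
from collections import defaultdict


def propagate_phenotypes(orthologues, phenotypes):
    """Attach to each gene the phenotypes of its orthologues.

    The result is keyed by gene, then by the orthologue the
    phenotypes were inferred from, so the source is never lost.
    """
    inferred = defaultdict(dict)
    for gene, partners in orthologues.items():
        for partner in partners:
            terms = phenotypes.get(partner)
            if terms:  # skip orthologues with no phenotype data
                inferred[gene][partner] = sorted(terms)
    return dict(inferred)


# Hypothetical identifiers and phenotype terms, for illustration only.
orthologues = {
    "helminth_gene_1": ["celegans_gene_A", "celegans_gene_B"],
    "helminth_gene_2": ["celegans_gene_C"],
}
phenotypes = {
    "celegans_gene_A": {"embryonic lethal", "slow growth"},
    "celegans_gene_C": set(),
}

result = propagate_phenotypes(orthologues, phenotypes)
```

Keeping the source orthologue alongside each inferred phenotype makes it clear that the evidence comes from another species rather than from a direct assay.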
Gene name synonyms
We have imported 375 gene name synonyms for Strongyloides stercoralis. These gene names have been curated from the literature by Jonathan Stoltzfus of Millersville University; we hope that storing these synonyms will improve discoverability and connectivity to the literature. The synonyms are searchable and, where present, appear in the new “Synonyms” line of the gene page.
For a gene with an imported synonym, papers referring to it can now be seen in the literature tab.
Many thanks to Jonathan Stoltzfus for providing this data!
Deprecation of CEGMA and BUSCO update
In recognition of the fact that CEGMA has not been supported for several years, we have deprecated CEGMA scores for WormBase ParaSite assemblies. Instead, we now report two BUSCO metrics: one based on the genome assembly and another on the annotation. See the BUSCO manual for further information on these different modes and on interpreting BUSCO scores.
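When comparing the two BUSCO metrics for an assembly, it can help to parse BUSCO's one-line summary notation into numbers. The sketch below assumes the standard `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` summary string; the function name and the example scores are invented for illustration and do not describe any real assembly.

```python
import re

# BUSCO one-line summary: Complete [Single-copy, Duplicated], Fragmented, Missing, n
BUSCO_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<total>\d+)"
)


def parse_busco_summary(line):
    """Return BUSCO percentages as floats and the BUSCO count as an int."""
    match = BUSCO_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    fields = match.groupdict()
    total = int(fields.pop("total"))
    scores = {name: float(value) for name, value in fields.items()}
    scores["total"] = total
    return scores


# Made-up scores for one assembly, in the two modes.
assembly = parse_busco_summary("C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023")
annotation = parse_busco_summary("C:84.5%[S:81.0%,D:3.5%],F:8.0%,M:7.5%,n:3023")

# A large gap may suggest genes present in the assembly but missed by annotation.
gap = assembly["complete"] - annotation["complete"]
```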
Repeat feature update
We have run RepeatModeler2 to generate custom transposable element (TE) sequence models for all assemblies. The TE sequence files (repeat-families.fa.gz) are available to download from our FTP site.
These custom TE models have also been used for masking, and are available as tracks on the genome browsers. Many thanks to Sanger Institute apprentice Charles Nunn for generating this data.
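Once a repeat-families.fa.gz file has been downloaded, a few lines of Python are enough to get an overview of its contents. This is a rough sketch using only the standard library; the function names are our own, and the per-class tally assumes RepeatModeler-style headers of the form `name#Class/Family`, which should be checked against the actual file.

```python
import gzip
from collections import Counter


def read_fasta(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise_repeats(path):
    """Count TE models, total bases, and models per class."""
    count, total_bp, classes = 0, 0, Counter()
    for header, sequence in read_fasta(path):
        count += 1
        total_bp += len(sequence)
        fields = header.split()
        name = fields[0] if fields else ""
        # Assumption: the classification follows '#' in the sequence name.
        classes[name.partition("#")[2] or "Unclassified"] += 1
    return count, total_bp, classes
```

Calling `summarise_repeats("repeat-families.fa.gz")` on a downloaded file returns the number of models, their combined length, and a `Counter` of classes.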
Miscellaneous
- Updated protein domain annotation (InterProScan v. 5.51-85.0), functional annotation and cross referencing for all genomes.
- Gene identifiers have been mapped between WormBase ParaSite versions for all Pristionchus spp. updates using Ensembl’s mapping method. Where mapped, previous gene IDs are available in GFF downloads and on gene pages.
- Recomputed orthologues, paralogues and protein families (Ensembl Compara v. 101). Where multiple assemblies exist for the same species, we have only used one assembly for orthology calculations. This means that some assemblies now do not have orthologue and paralogue data.
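For the previous gene IDs carried in GFF downloads, mentioned in the list above, a mapping from new to old identifiers can be pulled out of the attributes column. This is only a sketch: the attribute key `previous_id` and the example line are hypothetical, so inspect a real file to find the key that is actually used.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse a GFF3 attributes column into {key: [values]}."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters and separates multiple values with commas.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


def previous_ids(gff_line, key="previous_id"):
    """Return (gene ID, previous IDs) for a gene line, otherwise None.

    The default attribute key is a placeholder, not a documented name.
    """
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2] != "gene":
        return None
    attributes = parse_gff3_attributes(columns[8])
    gene_id = attributes.get("ID", [None])[0]
    return gene_id, attributes.get(key, [])


# Made-up GFF3 line for illustration.
example = "chr1\tsource\tgene\t100\t900\t.\t+\t.\tID=gene:new_0001;previous_id=old_0042"
mapping = previous_ids(example)
```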