Announcing WormBase ParaSite Release 16

We are delighted to announce the 16th release of WormBase ParaSite. Highlights of this release include:

  • Addition of six new genome assemblies
  • Annotation updates for 12 genomes
  • Addition of phenotype data for C. elegans genes, imported from WormBase
  • Addition of gene name synonyms for a set of Strongyloides stercoralis genes
  • Introduction of an archiving service
  • Deprecation of CEGMA, and introduction of BUSCO as an annotation quality metric
  • New repeat feature libraries for all genomes, generated with RepeatModeler2.

New genomes

This release sees the addition of 6 new assemblies, of which 2 are new species:

  • Atriophallophorus winterbourni (new species) – a digenean trematode parasite native to the lakes of New Zealand (from Zajac et al., 2021).
  • Bursaphelenchus okinawaensis (new species) – a lab model for the pine wood nematode Bursaphelenchus xylophilus (from Sun et al., 2020).
  • Bursaphelenchus xylophilus – a new chromosome-scale assembly from Dayi et al., 2020.
  • Caenorhabditis remanei – a new chromosome-scale assembly from Teterina et al., 2020.
  • Clonorchis sinensis – a new chromosome-scale assembly from Young et al., 2021.
  • Oscheius tipulae – a new chromosome-scale assembly from Gonzalez de la Rosa et al., 2021.

Annotation updates

We present full annotation updates for a set of genomes:

  • Pristionchus arcanus
  • Pristionchus entomophagus
  • Pristionchus exspectatus
  • Pristionchus fissidentatus
  • Pristionchus japonicus
  • Pristionchus maxplancki
  • Pristionchus mayeri
  • Parapristionchus giblindavisi
  • Micoletzkya japonica

Three further genomes have seen manual gene model curation by the community:

  • Dirofilaria immitis
  • Haemonchus contortus (PRJEB506)
  • Strongyloides stercoralis

For these species, gene models that have been manually verified or curated are identified by a note in the “Annotation Method” line on their gene page.

For all 12 species with annotation updates, previous gene models can be visualised in JBrowse in the “WBPS15 gene models” track.

Deprecated gene models and assemblies (and associated analyses) also remain accessible on our new archive site. See this blog post for further information on how to access archived data.

Phenotype data

This release sees the import of over 350,000 C. elegans gene-phenotype associations from our sister site, WormBase. These associations have been curated from the literature over many years, from RNAi and variant data. The data is accessible on C. elegans gene pages.

As the majority of helminth genes don’t currently have data from direct phenotypic assays, phenotypes have also been propagated between orthologues. For example, the gene page of a H. polygyrus gene lists the phenotypes associated with its C. elegans orthologues.

In total, we now host phenotype data from 13,350 studies (13,349 for C. elegans and 1 for Schistosoma mansoni).

Gene name synonyms

We have imported 375 gene name synonyms for Strongyloides stercoralis. These gene names have been curated from the literature by Jonathan Stoltzfus of Millersville University; we hope that storing these synonyms will improve discoverability and connectivity to the literature. The synonyms are searchable and, where present, appear in the new “Synonyms” line of the gene page.

We can now see papers referring to this gene in the literature tab:

Many thanks to Jonathan Stoltzfus for providing this data!

Deprecation of CEGMA and BUSCO update

In recognition of the fact that CEGMA has not been supported for several years, we have deprecated CEGMA scores for WormBase ParaSite assemblies. Instead, we now report two BUSCO metrics: one based on the genome assembly and another on the annotation. See the BUSCO manual for further information on these different modes and on interpreting BUSCO scores.
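As a rough illustration of working with these scores, the sketch below parses a BUSCO one-line summary string into its components. The example string and the function name parse_busco_summary are illustrative assumptions, not values taken from WormBase ParaSite; the letters follow the BUSCO short summary notation (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing, n = number of BUSCOs searched).

```python
import re


def parse_busco_summary(summary):
    """Parse a one-line BUSCO summary, e.g. 'C:98.5%[S:97.0%,D:1.5%],F:0.5%,M:1.0%,n:3131'."""
    # Collect each percentage keyed by its category letter.
    scores = {key: float(value)
              for key, value in re.findall(r"([CSDFM]):([\d.]+)%", summary)}
    total = re.search(r"n:(\d+)", summary)
    if total is None or set(scores) != set("CSDFM"):
        raise ValueError(f"Unrecognised BUSCO summary: {summary!r}")
    scores["n"] = int(total.group(1))
    return scores


# Hypothetical summary string, used only to exercise the parser.
example = "C:98.5%[S:97.0%,D:1.5%],F:0.5%,M:1.0%,n:3131"
scores = parse_busco_summary(example)
print(scores)
```

Running the same parser over the assembly-based and annotation-based summaries for a genome would give two dictionaries that can be compared category by category.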

Repeat feature update

We have run RepeatModeler2 to generate custom transposable element (TE) sequence models for all assemblies. The TE sequence files (repeat-families.fa.gz) are available to download from our FTP site.

These custom TE models have also been used for masking, and are available as tracks on the genome browsers. Many thanks to Sanger Institute apprentice Charles Nunn for generating this data.
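For readers who want a quick look inside a downloaded repeat-families.fa.gz file, here is a minimal sketch that counts sequences per top-level repeat class. It assumes FASTA headers of the form ">name#Class/Family", which is how RepeatModeler typically labels its families; the function name and the tiny demonstration file are hypothetical and stand in for a real download.

```python
import gzip
import os
import tempfile
from collections import Counter


def count_repeat_classes(path):
    """Count FASTA records per top-level class, assuming '>name#Class/Family' headers."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # skip sequence lines
            fields = line[1:].split()
            name = fields[0] if fields else ""
            _, sep, classification = name.partition("#")
            top_level = classification.split("/")[0] if sep else ""
            counts[top_level or "Unclassified"] += 1
    return counts


# Hypothetical records, written to a temporary file so the sketch runs offline.
demo_records = (
    ">rnd-1_family-0#LTR/Gypsy\nACGT\n"
    ">rnd-1_family-1#DNA/hAT\nTTGA\n"
    ">rnd-1_family-2#Unknown\nGGCC\n"
    ">rnd-1_family-3#LTR/Copia\nAAAA\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.repeat-families.fa.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(demo_records)
    demo_counts = count_repeat_classes(demo_path)
print(demo_counts)
```

To use it on real data, replace the temporary file with the path to a file downloaded from the FTP site.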

Miscellaneous

  • Updated protein domain annotation (InterProScan v. 5.51-85.0), functional annotation and cross-referencing for all genomes.
  • Gene identifiers have been mapped between WormBase ParaSite versions for all Pristionchus sp. updates using Ensembl’s mapping method. Where mapped, previous gene IDs are available in GFF downloads and on gene pages.
  • Recomputed orthologues, paralogues and protein families (Ensembl Compara v. 101). Where multiple assemblies exist for the same species, we have only used one assembly for orthology calculations. This means that some assemblies now do not have orthologue and paralogue data.

Alternative gene set for Necator americanus

Logan et al. (2020) have published an alternative set of gene predictions for Necator americanus in PLoS Neglected Tropical Diseases, based on both RNA-seq and proteomics data and generated via the MAKER pipeline.

Their gene predictions can be downloaded from the WormBase ParaSite FTP site at:
ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/datasets/logan_2020_32453752

Thanks to Javier Sotillo Gallego for providing the data!

Brugia pahangi material available

The Devaney group have small numbers of adult B. pahangi and larger numbers of microfilariae (Mf) available to others for research purposes. If this would be useful, please contact eileen.devaney@glasgow.ac.uk with approximate numbers, life stage required and whether fresh or frozen material is suitable. The B. pahangi life cycle is funded by a grant from the Wellcome Trust (208390/Z/17/Z).

Announcing release 14

We are pleased to announce the 14th release of WormBase ParaSite, bringing a new S. mediterranea assembly, and 8 other new and updated genomes.


New and updated genomes

Platyhelminths

We are happy to announce these new genomes of flatworms:

There is also an annotation update for Mesocestoides corti (PRJEB510) created with recently sequenced RNASeq data. It supports 5076 new genes and 8367 revised structures of the previous AUGUSTUS-only annotation.

If flatworm genomics is relevant to your work, be sure to also visit PlanMine, run by the authors of the Schmidtea genome. It contains many different assemblies from a number of free-living flatworms, phylogenetic data, and more.

Please note that we are deprecating the assembly SmedGD_c1.3 for Schmidtea mediterranea (PRJNA12585), corresponding to Robb et al. (2007), and intend to remove it once we are confident that no new research is being based on this assembly. Do let us know if you rely on it, or if there are good reasons for us to keep both Schmidteas around.

Nematodes

There are two new clade IV genomes, potentially relevant to agricultural research: Ditylenchus dipsaci (PRJNA498219), a plant pest, and an updated genome of an entomopathogenic nematode Steinernema carpocapsae (PRJNA202318).

There are also two genomes of free-living clade V nematodes. First is the genome of Halicephalobus mephisto (PRJNA528747), an extremophile found in deep rock fracture water in several gold mines in South Africa. We also have a genome of Mesorhabditis belari (PRJEB30104), an animal exhibiting an interesting pattern of reproduction: the eggs only mature after being activated by the males, which nevertheless do not pass on any genetic material.

Finally, we have updated the WormBase core genomes to WormBase version WS271.

Comparative genomics: bringing smaller trees

We have removed a total of 7 genomes from our comparative genomics analysis, in each case because there is a clearly better alternative genome of the same species.

We are hoping that this will make our results more robust overall, and their interpretation easier.

If you still need the old results, ortholog and paralog files from the last release are available through our FTP site. Apart from the previous S. mediterranea (PRJNA12585), we do not plan to remove any other genomes from our portal.

RNASeq studies

Our collaborators, the Functional Genomics group at the European Bioinformatics Institute, continue to process all public RNASeq studies through their platform, RNASeq-er.

More studies

New data produced in recent months, together with more inclusive curation, has brought the total number of studies processed on our site to 201, across 48 different species. This includes 30 studies for S. mediterranea, now aligned to the new assembly.

As of July 2019, RNASeq-er has data for 639 studies in total. Apart from the 201 we have results for, there are 301 unannotated C. elegans or P. pacificus studies, which we skipped to reduce the curation effort involved. The other 137 studies are missing metadata or were deliberately excluded, either because they did not have enough replicates, used a non-standard protocol such as small miRNA-seq or Ribo-Seq, or because the authors asked us to suppress them.

Do contact us if you would like us to include a particular study, or if you have metadata that we are missing. It would be particularly helpful if you could let us know of any additional publications relating to our studies, as they are not always linked to archive records.

UI updates

There are no major changes to this aspect of our service since we rolled it out in the last release: you can browse through a list of studies by following a link on the species page, and access per-gene results on the gene page through the tab on the left.

We have improved how expression data is organized within the JBrowse track selector, separating studies into categories. If you usually access the gene expression studies through the “Gene expression” tab, have a look at how the studies are organized in the track selector (example link, S. mansoni) – it provides an interesting alternative way of viewing the results.

Analysis updates

Differential expression result files should now be slightly more convenient, and we hope that you will be able to open the files without trouble in any popular spreadsheet software. We are also providing complete results for each contrast – useful if you want to apply your own filtering criteria.

There is currently a slight non-uniformity in our count and TPM results, as RNASeq-er are switching between two different quantification methods. Within each study, aligned reads are quantified with either HTSeq (previous) or FeatureCounts (new).

Helminth Bioinformatics Course in Accra, Ghana: Open for Applications

We’re pleased to announce that a new Wellcome Advanced Course in helminth bioinformatics is now open for applications. The course is aimed at Africa-based researchers at various levels. It will be a hands-on, practical introduction to bioinformatics for helminth researchers, covering:

  • The use of public databases (including WormBase ParaSite) to explore gene and protein function
  • Genome assembly
  • Variant calling
  • Differential gene expression
  • Unix/Linux command line and some basic R

Dates and Deadlines

The course will be held at the West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Accra, Ghana, from 8th to 13th September 2019.

The course is free to attend for non-commercial applicants, and a number of bursaries are available to cover travel, accommodation and sustenance.

The deadline for applications is 9th May.

More details are available here: https://coursesandconferences.wellcomegenomecampus.org/our-events/helminth-bioinformatics-ghana-2019/

Announcing WormBase ParaSite 12

We are pleased to announce the 12th release of WormBase ParaSite, bringing new and updated genomes, and better handling of old identifiers and history.

New and updated genomes

The biggest updates of the release are probably two tapeworm genomes: Hymenolepis microstoma (PRJEB124), updated to a chromosome-level assembly, and a new genome, Taenia multiceps (PRJNA307624).

There are also new clade IV nematode genomes: the root-knot nematodes Meloidogyne graminicola (PRJNA411966) and Meloidogyne arenaria (PRJNA438575) (alternative assembly), and the bacteria-feeding Acrobeloides nanus (PRJEB26554).

The rest of the updates are:

Data and tools

We have re-run our comparative genomics pipeline, constructing gene trees and finding orthologs and homologs. We have also re-run the newest InterProScan (5.30-69.0) and our cross-references pipeline for all our genomes.

We re-imported all public RNASeq data from our collaborators, and did a round of minor improvements to the displays. We now use information from PubMed to let you find data sets corresponding to a publication of interest.

Archived gene IDs

New sequencing technologies let labs construct better genome assemblies, bringing access to chromosome level assembly data even to relatively small research communities. We are excited to see this trend. Each genome update brings new evidence and potentially unlocks research into previously forbiddingly difficult biological questions.

At the same time, insights gathered in work published using previous assemblies should stay accessible to the community, so there is a need to connect different assemblies with each other. As of this release, WormBase ParaSite will keep track of previous identifier versions at gene level and display annotation history. Authors of a genome update do not always provide a mapping to the previous version, so we developed a pipeline to match up identifiers between genome versions.

Overview of new functionality

For an overview of how this now works, consider Smp_340760, a Schistosoma mansoni gene. The gene model has been revised twice in the past: it used to be called Smp_044010, but in the “Schisto_7.1” version of the annotation, which we published in WBPS11, the authors changed the gene structure enough that they decided to assign it a new identifier, and in “Schisto_7.2”, which we publish now, the gene model was corrected slightly.

Searching by Smp_044010 now leads to a page explaining that the identifier was deprecated and redirecting to Smp_340760. Over there, the history is represented by a diagram:

[Figure: annotation history diagram for Smp_340760 (Schistosoma mansoni, PRJEA36577)]

The site also displays previous protein sequences of transcripts, to help you carry forward any conclusions based on the previous gene model – the less the sequence has changed, the more similar results will be for e.g. BLAST matches.

ID mapping pipeline

We used the authors’ mappings between annotation versions for the updates of Schistosoma mansoni and Hymenolepis microstoma. Everywhere else we used an automated mapping pipeline, adapted from the Ensembl gene build.

Pipeline description

The pipeline runs the sequence matching tool exonerate, scoring matches of exons between the two assemblies and propagating the scores up to the transcript and gene level. The scores are then adjusted based on synteny: if a gene A is near gene B in the previous genome, A is mapped to A’ in the new genome, and there is a gene B’ near A’, the match of B to B’ is strengthened. Finally, the best matches are iteratively taken out of the scoring, producing a list of pairs.
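The last two steps can be sketched in a few lines of Python. This is only an illustration of the idea, not the Ensembl-derived pipeline itself: the gene-level scores, neighbour lists, bonus size and minimum score below are all hypothetical, and the exon-level exonerate scoring is assumed to have happened already.

```python
def apply_synteny_bonus(scores, old_neighbours, new_neighbours, bonus=0.1):
    """Strengthen the match (B, B') when a neighbour A of B maps to a neighbour A' of B'."""
    # Provisional best partner in the new genome for each old gene, from raw scores.
    best = {}
    for (old, new), score in scores.items():
        if old not in best or score > scores[(old, best[old])]:
            best[old] = new
    adjusted = dict(scores)
    for (old_b, new_b), score in scores.items():
        for old_a in old_neighbours.get(old_b, ()):
            new_a = best.get(old_a)
            if new_a is not None and new_a in new_neighbours.get(new_b, ()):
                adjusted[(old_b, new_b)] = score + bonus
                break  # apply the bonus at most once per candidate pair
    return adjusted


def greedy_pairs(scores, min_score=0.5):
    """Repeatedly take the best remaining match and retire both genes from scoring."""
    pairs, used_old, used_new = [], set(), set()
    for (old, new), score in sorted(scores.items(), key=lambda item: -item[1]):
        if score < min_score:
            break  # everything after this point is below the threshold
        if old in used_old or new in used_new:
            continue
        pairs.append((old, new, round(score, 3)))
        used_old.add(old)
        used_new.add(new)
    return pairs


# Hypothetical gene-level scores: B matches B2 slightly better on sequence alone.
scores = {("A", "A1"): 0.9, ("B", "B1"): 0.55, ("B", "B2"): 0.6, ("C", "C1"): 0.3}
old_neighbours = {"A": ["B"], "B": ["A"]}
new_neighbours = {"A1": ["B1"], "B1": ["A1"], "B2": []}

without_synteny = greedy_pairs(scores)
with_synteny = greedy_pairs(apply_synteny_bonus(scores, old_neighbours, new_neighbours))
print(without_synteny)
print(with_synteny)
```

In this toy input, sequence similarity alone pairs B with B2, while the synteny adjustment favours B1 because it sits next to A1, the partner of B's neighbour A. Gene C stays unmapped because its only candidate falls below the minimum score, which mirrors the conservative behaviour of threshold-based mapping.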

Results and benchmarking

We find the pipeline to be quite conservative even after we relaxed a few parameters around minimal match scores and similar values. Typically only between a third and two thirds of the genes in the updated genome have a related past identifier:

Genome                             Previous genes total  Previous version  Mapped  New genes total  Fraction mapped
Ancylostoma ceylanicum PRJNA72583  11783                 WBPS11            7564    15892            0.476
Ascaris suum PRJNA62057            17974                 WBPS9             9468    15260            0.620
Fasciola hepatica PRJEB25283       16806                 WBPS10            7564    22676            0.334
Haemonchus contortus PRJEB506      19430                 WBPS10            11439   21869            0.523
Meloidogyne incognita PRJEB8714    45351                 WBPS10            11977   19212            0.623
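The “fraction mapped” column is the number of mapped genes divided by the total number of genes in the new annotation. A short check of that arithmetic, using the figures from the table (the variable names are ours):

```python
# (genome, BioProject, mapped genes, total genes in the new annotation)
rows = [
    ("Ancylostoma ceylanicum", "PRJNA72583", 7564, 15892),
    ("Ascaris suum", "PRJNA62057", 9468, 15260),
    ("Fasciola hepatica", "PRJEB25283", 7564, 22676),
    ("Haemonchus contortus", "PRJEB506", 11439, 21869),
    ("Meloidogyne incognita", "PRJEB8714", 11977, 19212),
]

# Fraction of genes in the updated genome that have a related past identifier.
fractions = {
    bioproject: round(mapped / new_total, 3)
    for _, bioproject, mapped, new_total in rows
}

for genome, bioproject, mapped, new_total in rows:
    print(f"{genome} ({bioproject}): {mapped}/{new_total} = {fractions[bioproject]:.3f}")
```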

We also ran the automated pipeline on the S. mansoni WBPS10->WBPS11 update, comparing the results to a manual mapping obtained by annotators tracking individual identifiers. Our pipeline carried forward 5165 genes that the authors considered to have no or minor changes, and 1347 genes with larger changes, onto an annotation with 10172 genes. The pipeline missed 2584 genes that were present in the manual mapping but lost in the automatic one. It disagreed with the manual mapping in only 199 cases: some were genuinely wrong calls, and some were on par with the manual mapping, being e.g. a mapping to a paralogous gene.

Website maintenance 4th Dec 2018

Update: The task has not been finished yet and may take a couple more hours. We expect to complete the task by 10pm UK Time. Thank you for your cooperation.

Please note that we will perform server maintenance for the website on Tuesday 4th Dec 2018 from 2pm to 5pm (UK Time). During this period, you will not be able to sign in or use tools on the website, including BLAST and VEP. We are sorry for any inconvenience this may cause.

About archive sites

We’re pleased to announce that we have introduced an archiving service.

From release 16 onwards, older WormBase ParaSite releases will remain available for browsing. We have introduced this service to help users in transitions between genome and annotation versions. Older, draft assemblies are increasingly being superseded by highly contiguous assemblies generated with modern sequencing technologies. We want to host this data where it is available, but recognise that direct replacement of older assembly versions can cause disruption to researchers.

In release 16, we have updated gene models for 12 assemblies and replaced one assembly, Clonorchis sinensis (PRJNA386618), with a more highly scaffolded version. All superseded data remains available on the release 15 archive site:

https://release-15.parasite.wormbase.org/index.html

Due to technical restrictions, the site has slightly reduced functionality compared with the live site. BLAST and VEP tools are not available. BioMart does remain available.

The search service is also not available. To navigate to a gene page, paste its ID directly into the URL. For example:

https://release-15.parasite.wormbase.org/Gene/Summary?g=maxplancki-mkr-S2-19.22-mRNA-1

Here, “maxplancki-mkr-S2-19.22-mRNA-1” is a gene stable ID. For annotation updates, deprecated gene pages can also be accessed from the JBrowse genome browser on the live site.

In addition to the archive site, data from all previous releases remains available to download from our FTP site in perpetuity.
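Since search is unavailable on the archive site, it can be handy to build gene page URLs programmatically from a list of stable IDs. The helper below is a minimal sketch that follows the release 15 URL pattern shown above; the function name is hypothetical, and whether other release numbers use the same host pattern is an assumption.

```python
from urllib.parse import urlencode


def archive_gene_url(release, gene_id):
    """Build an archive gene page URL from a release number and a gene stable ID."""
    # Assumption: archive hosts are named release-<N>.parasite.wormbase.org,
    # as in the release 15 example.
    query = urlencode({"g": gene_id})
    return f"https://release-{release}.parasite.wormbase.org/Gene/Summary?{query}"


url = archive_gene_url(15, "maxplancki-mkr-S2-19.22-mRNA-1")
print(url)
```

Using urlencode keeps the URL valid if an identifier ever contains characters that need escaping.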

Announcing WormBase ParaSite Release 15

We’re delighted to announce the release of WormBase ParaSite 15, our biggest release yet!

New genomes

This release sees the addition of assemblies of 19 new species:

Genome and annotation updates

We have also incorporated alternative or updated assemblies/gene models for a further 8 species:

  • An alternative (PacBio) assembly from Sun Yat-sen University for Angiostrongylus cantonensis (PRJNA350391, Xu et al., 2019).
  • An alternative Hymenolepis diminuta assembly from The University of Warmia and Mazury (PRJEB30942, Nowak et al., 2019).
  • An alternative (PacBio) Steinernema feltiae assembly (PRJNA353610, Fu et al., 2020).
  • An alternative (PacBio) Schistosoma japonicum assembly from Fudan University (PRJNA520774, Luo et al., 2019).
  • Schistosoma haematobium (PRJNA78265) has been updated to reflect an assembly update in INSDC (GCA_000699445.2, Stroehlein et al., 2019). Gene ID mappings from the previous assembly (GCA_000699445.1) are available in the GFF file, and on gene pages.
  • A handful of curated gene models have been added to the Fasciola hepatica (PRJEB25283) annotation, submitted by Emily Robb of Queen’s University Belfast.
  • Curated gene models for the P-glycoprotein repertoire have been added to the Parascaris univalens (PRJNA386823) annotation, submitted by Alexander Gerhard of The Free University of Berlin.
  • The Haemonchus contortus (PRJEB506) annotation has been updated to reflect ongoing curation by Steve Doyle and colleagues at the Wellcome Sanger Institute.

RNAi data

We have imported phenotype data from the recent Schistosoma mansoni RNAi screen by Wang et al., 2020. You can browse the data from S. mansoni gene pages (see the “phenotype” link in the left hand menu). Please let us know if you have any feedback or suggestions for other RNAi studies that you’d like to see in WormBase ParaSite.

Other changes to note

  • The Ancylostoma ceylanicum (PRJNA72583) annotation has been updated to incorporate some missing gene models.
  • Caenorhabditis sp. 34 has been renamed to Caenorhabditis inopinata.
  • In response to feedback from the community, we now present two alternative Steinernema carpocapsae assemblies: GCA_000757645.1, as described by Dillman et al., 2015, and GCA_000757645.3, as described by Serra et al., 2019. The Serra annotation has been updated to incorporate UTRs, which were omitted in WormBase ParaSite 14. IDs for genes on the X chromosome have also been updated to follow the naming convention of the rest of the gene set. ID mappings between WBPS14 GCA_000757645.3 and WBPS15 GCA_000757645.3 are available on gene pages and in the GFF file. A mapping between gene IDs in GCA_000757645.1 and GCA_000757645.3 is available on the WBPS FTP site.
  • IsoSeq data for the heartworm Dirofilaria immitis is available on JBrowse, submitted by Nic Wheeler and colleagues of the University of Wisconsin-Madison.
  • We have added the alternative Necator americanus gene set, described by Logan et al., 2020, as a JBrowse track.

Parasitic Helminths: New Perspectives in Biology and Infection

30th August – 4th September, 2020: Bratsera Hotel, Hydra, Greece.

Registration: 14 January – 28 February 2020

http://hydra.bio.ed.ac.uk/

The 14th conference in a series previously titled Molecular and Cellular Biology of Helminth Parasites

About the Meeting

The study of helminth parasites continues to excite great interest across the suite of modern scientific themes. With a wealth of genome information and high-throughput technologies, new drug and vaccine development, and intricate host-parasite molecular interactions, we are witnessing a new era of research on these organisms and the diseases they cause. Parasitic Helminths: New Perspectives in Biology and Infection continues the highly successful series now held every year on the beautiful island of Hydra, Greece. All major helminth research areas are covered, including new genomics of animal- and plant-parasitic nematodes, interfaces with free-living helminths such as C. elegans and planarians, developmental and molecular biology, genetics, neurobiology, pharmacology, immunology and vaccine research, all aimed at creating new strategies for control of these prevalent parasitic organisms and the diseases they cause.

Venue

A key feature of the Hydra venue is the Bratsera Hotel hosting the scientific sessions, and the quiet, traffic-free town with ample facilities for informal interactions between delegates. A wide range of accommodation is available close by, including pensions for those on a tight budget. Attendance is limited to 100 people, consistent with a discussion-orientated meeting in which every delegate is an active participant – early registration is encouraged!

Invited Speakers 2020

Keynote Speaker: Jonathan Ewbank, Marseille, France

Confirmed Speakers:

  • Julie Ahringer, Cambridge, UK
  • Adler Dillman, Riverside, USA
  • Sebastian Eves-van den Akker, Cambridge, UK
  • Jennifer Keiser, Basel, Switzerland
  • Frédéric Landmann, Montpellier, France
  • Meera Nair, Riverside, USA
  • Phil Newmark, Madison, USA
  • Meta Roestenberg, Leiden, Netherlands
  • Mark Siracusa, Newark, USA

Organising Committee

  • Amy Buck, University of Edinburgh
  • James Collins, University of Texas
  • Richard E Davis, University of Colorado
  • Kleoniki Gounaris, Imperial College
  • Rick Maizels, University of Glasgow
  • Murray Selkirk, Imperial College