GOCat4FT, the GOCat complete pipeline for full text

GOCat4FT was developed for the BioCreative IV (2013) GO task. The goal of this task was to promote research and tool development for assisting Gene Ontology (GO) term curation from the biomedical literature. In subtask A, participants had to retrieve GO evidence sentences for a given gene in a given full text; a training set of 150 manually annotated full texts was provided. Subtask B was a step towards the ultimate goal of using computers to assist human GO curation: participants had to return a list of relevant GO terms derived from the sentences returned in subtask A. More information here.
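Subtask A can be cast as binary sentence classification: each sentence of a full text is labeled either as GO evidence for the query gene or not, and a model is trained on the annotated articles. The sketch below is a minimal multinomial naïve Bayes classifier of the kind named in the next paragraph. It assumes plain bag-of-words features with the query gene masked by a placeholder token. The feature set, the `genetoken` placeholder, the labels `evidence`/`other` and the toy sentences are all hypothetical and do not reproduce the team's actual system.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(sentence: str, gene: str) -> List[str]:
    """Lowercase word tokens. Assumption: the gene name is masked so the
    model learns the context around the gene, not the gene name itself."""
    masked = re.sub(re.escape(gene), " genetoken ", sentence, flags=re.IGNORECASE)
    return TOKEN_RE.findall(masked.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing, scored in log space."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()       # sentences per label
        self.word_counts: Dict[str, Counter] = {}    # token counts per label
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Tuple[List[str], str]]) -> "NaiveBayesSentenceClassifier":
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens: List[str]) -> Dict[str, float]:
        total_docs = sum(self.class_counts.values())
        scores: Dict[str, float] = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the label
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens: List[str]) -> str:
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy training data standing in for the annotated full texts.
TRAIN = [
    ("BRCA1 is required for DNA repair in the nucleus", "evidence"),
    ("Knockdown of BRCA1 impaired homologous recombination repair", "evidence"),
    ("BRCA1 localizes to the nucleus after DNA damage", "evidence"),
    ("Samples were collected from 40 patients", "other"),
    ("Figure 2 shows the experimental design", "other"),
    ("Statistical analysis was performed with standard software", "other"),
]
clf = NaiveBayesSentenceClassifier().fit([(tokenize(s, "BRCA1"), y) for s, y in TRAIN])

# Masking lets the model transfer to sentences about another gene.
print(clf.predict(tokenize("Loss of TP53 reduced DNA repair in the nucleus", "TP53")))  # evidence
```

Working in log space avoids numerical underflow when long sentences multiply many small probabilities; the smoothing term keeps a single unseen word/label pair from zeroing out a class.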

For subtask A, we designed a state-of-the-art supervised statistical approach based on a naïve Bayes classifier trained on the official training set (150 annotated papers). We then applied GOCat to the selected sentences and achieved leading results, with a hierarchical recall of up to 65% among the top 20 returned concepts. Official results here. BioCreative IV thus allowed us to design a complete curation pipeline: given a gene name and a full text, the system selects evidence sentences for curation and delivers highly relevant GO concepts together with their supporting evidence sentences.
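Hierarchical recall gives a predicted GO term credit for the ancestors it shares with the curated terms, so a near miss in the ontology still counts partially. The sketch below assumes the common set-based definition: expand both the top-k predictions and the gold terms with all their ancestors, then divide the overlap by the size of the expanded gold set. Whether the official BioCreative IV scorer treated the root or obsolete terms differently is not stated here, and the toy ontology and term names (`leaf_a`, `mid`, `root`, ...) are hypothetical, not real GO identifiers.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology: each term maps to its direct is_a parents.
PARENTS: Dict[str, Set[str]] = {
    "leaf_a": {"mid"},
    "leaf_b": {"mid"},
    "mid": {"root"},
}


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms plus every ancestor reachable through is_a edges."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue
        closed.add(term)
        stack.extend(parents.get(term, ()))
    return closed


def hierarchical_recall_at_k(
    ranked: List[str], gold: Iterable[str], parents: Dict[str, Set[str]], k: int = 20
) -> float:
    """Share of the ancestor-expanded gold set covered by the expanded top-k predictions.

    Assumption: the root is kept in both sets; some scorers drop it.
    """
    expected = with_ancestors(gold, parents)
    if not expected:
        return 0.0
    predicted = with_ancestors(ranked[:k], parents)
    return len(predicted & expected) / len(expected)


# A sibling of the gold term earns partial credit through the shared ancestors:
# expected = {leaf_a, mid, root}, predicted = {leaf_b, mid, root}, overlap = 2.
print(hierarchical_recall_at_k(["leaf_b"], ["leaf_a"], PARENTS))  # 0.6666666666666666
```

The default `k=20` mirrors the "top 20 returned concepts" cut-off; because only the gold side is in the denominator, returning more terms can raise recall but never lower it.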

Test GOCat4FT: enter a gene name and either a PMCID or a full text.
