GOCat4FT was developed for the BioCreative 2013 GO task. The goal of this task was to promote research and tool development for assisting gene ontology (GO) term curation from biomedical literature. In subtask A, participants had to retrieve GO evidence sentences for a given gene in a given full-text article. A training set of 150 manually annotated full-text articles was provided. Subtask B was a step towards the ultimate goal of using computers to assist human GO curation: participants had to return a list of relevant GO terms based on the sentences output in the previous step. More information here.
For subtask A, we designed a state-of-the-art supervised statistical approach, using a naïve Bayes classifier trained on the official training set (150 annotated papers). We then applied GOCat and achieved leading results, with up to 65% hierarchical recall in the top twenty returned concepts. Official results here. Thanks to BioCreative IV, we were able to design a complete curation pipeline: given a gene name and a full text, the system selects evidence sentences and delivers highly relevant GO concepts along with those supporting sentences.
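To illustrate the kind of sentence selection described for subtask A, here is a minimal sketch of a multinomial naïve Bayes classifier that labels sentences as GO evidence or not. It is not the GOCat4FT implementation: the class name, the tokenizer, the labels "evidence" and "other", the add-one smoothing, and the four toy training sentences are all assumptions made for this example, standing in for the real features and the 150 annotated papers.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # sentences per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, sentence):
        """Return log P(class) + sum of log P(token | class) per class."""
        total_sentences = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(sentence) if t in self.vocabulary]
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total_sentences)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                seen = self.word_counts[label][token]
                score += math.log((seen + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch only.
train_sentences = [
    "BRCA1 is required for DNA repair in response to damage",
    "The protein localizes to the nucleus and binds chromatin",
    "Cells were cultured at 37 degrees for two days",
    "We thank the reviewers for helpful comments",
]
train_labels = ["evidence", "evidence", "other", "other"]

classifier = NaiveBayesSentenceClassifier().fit(train_sentences, train_labels)
print(classifier.predict("The protein binds DNA in the nucleus"))
```

In a pipeline like the one described above, sentences scored as evidence for the target gene would then be passed on for GO concept assignment; how the actual system ranks or thresholds sentences is not specified here.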

