biomedical and healthCAre Data Discovery Index Ecosystem
Title: Supporting data for "A close look at protein function prediction evaluation protocols"      
dateReleased:
08-24-2015
privacy:
not applicable
aggregation:
instance of dataset
dateCreated:
08-24-2015
refinement:
curated
ID:
doi:10.5524/100153
creators:
Funk, Chris
Ullah, Fahad
Verspoor, Karin
Ben-Hur, Asa
availability:
available
types:
sequence
description:
The recently held Critical Assessment of Functional Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine if cross-validation provides a good estimate of performance. The CAFA2 task is a combination of two sub-tasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (GOstruct, binary SVMs, and guilt by association) find it hard to achieve the same level of accuracy on these two tasks compared to cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods.
accessURL: https://doi.org/10.5524/100153
storedIn:
GigaScience Database
qualifier:
not compressed
format:
HTML
accessType:
landing page
primary:
true
authentication:
none
authorization:
none
abbreviation:
GigaDB
homePage: http://gigadb.org/
ID:
SCR:006565
name:
GigaScience Database