biomedical and healthCAre Data Discovery Index Ecosystem
Title: Supporting data for "Stepwise Distributed Open Innovation Contests for Software Development - Acceleration of Genome-Wide Association Analysis"      
dateReleased:
12-23-2016
privacy:
not applicable
aggregation:
instance of dataset
dateCreated:
12-23-2016
refinement:
curated
ID:
doi:10.5524/100264
creators:
Hill, Andrew
Pons, Pascal
Guinan, Eva
Lakhani, Karim
Kilty, Iain
Jelinsky, Scott A
availability:
available
types:
sequence
description:
The association of differing genotypes with disease-related phenotypic traits offers great potential both to help identify new therapeutic targets and to support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies (GWAS) associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency with which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in less than 6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest, a combination of computational, numeric and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set of 6678 subjects by 645863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds.
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project.
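As a rough sanity check on the figures quoted in the description, the short Python sketch below recomputes the speedup from the rounded run times and counts the genotype calls in the benchmark data set. The variable names are illustrative and not taken from the dataset or from PLINK. Because 4.8 hours and 29 seconds are rounded values, the ratio they give will differ slightly from the reported 591-fold.

```python
# Figures as reported (rounded) in the description; names are illustrative.
baseline_hours = 4.8          # PLINK 1.07 logistic regression run time
accelerated_seconds = 29      # run time after acceleration
reported_speedup = 591        # end-to-end speedup stated in the description
subjects = 6678
variants = 645863

# Speedup implied by the rounded run times.
baseline_seconds = baseline_hours * 3600
speedup_from_rounded = baseline_seconds / accelerated_seconds

# Baseline run time implied by the reported speedup and the 29-second run.
implied_baseline_hours = reported_speedup * accelerated_seconds / 3600

# Number of subject-by-variant genotype calls in the benchmark data set.
genotype_calls = subjects * variants

print(f"speedup from rounded times: {speedup_from_rounded:.1f}x")
print(f"baseline implied by 591x:   {implied_baseline_hours:.2f} h")
print(f"genotype calls:             {genotype_calls:,}")
```

The implied baseline rounds to 4.8 hours, so the reported 591-fold speedup and the quoted run times are mutually consistent once rounding is taken into account.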
accessURL: https://doi.org/10.5524/100264
storedIn:
GigaScience Database
qualifier:
not compressed
format:
HTML
accessType:
landing page
primary:
true
authentication:
none
authorization:
none
abbreviation:
GigaDB
homePage: http://gigadb.org/
ID:
SCR:006565
name:
GigaScience Database
