Mountain View
biomedical and healthCAre Data Discovery Index Ecosystem
1. ArrayExpress ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community.
2. BioProject A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project.
3. Biological Magnetic Resonance Data Bank A Repository for Data from NMR Spectroscopy on Proteins, Peptides, Nucleic Acids, and other Biomolecules
4. CardioVascular Research Grid The CardioVascular Research Grid (CVRG) project is creating an infrastructure for sharing cardiovascular data and data analysis tools. CVRG tools are developed using the Software as a Service model, allowing users to access tools through their browser, thus eliminating the need to install and maintain complex software.
5. Cell Image Library The Cell Image Library™ is a freely accessible, easy-to-search, public repository of reviewed and annotated images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes. The purpose of this database is to advance research, education, and training, with the ultimate goal of improving human health.
6. Center for Expanded Data Annotation and Retrieval CEDAR is making data submission smarter and faster, so biomedical researchers and analysts create and use better metadata. Through better interfaces, terminology, metadata practices, and analytics, CEDAR optimizes the metadata pathway from provider to end user.
7. ClinVar ClinVar aggregates information about genomic variation and its relationship to human health.
8. Clinical Trials Network A repository of data from completed CTN clinical trials to be distributed to investigators in order to promote new research, encourage further analyses, and disseminate information to the community. Secondary analyses produced from data sharing multiply the scientific contribution of the original research.
9. ClinicalTrials.gov ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies of human participants conducted around the world.
10. Covid-19 Covid-19 is making data submission smarter and faster, so biomedical researchers and analysts create and use better metadata. Through better interfaces, terminology, metadata practices, and analytics, Covid optimizes the metadata pathway from provider to end user.
11. Dataverse Network Project A Dataverse repository is the software installation, which then hosts multiple dataverses. Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data). As an organizing method, dataverses may also contain other dataverses.
12. Dryad Data Repository The Dryad Data Repository is a curated general-purpose repository that makes the data underlying scientific publications discoverable, freely reusable, and citable.
13. GEMMA Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles. Gemma contains data from thousands of public studies, referencing thousands of published papers.
14. Gene Expression Omnibus Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.
15. GeneNetwork GeneNetwork is a group of linked data sets and tools used to study complex networks of genes, molecules, and higher order gene function and phenotypes. GeneNetwork combines more than 25 years of legacy data generated by hundreds of scientists together with sequence data (SNPs) and massive transcriptome data sets (expression genetic or eQTL data sets). The quantitative trait locus (QTL) mapping module that is built into GN is optimized for fast on-line analysis of traits that are controlled by combinations of gene variants and environmental factors. GeneNetwork can be used to study humans, mice (BXD, AXB, LXS, etc.), rats (HXB), Drosophila, and plant species (barley and Arabidopsis). Most of these population data sets are linked with dense genetic maps (genotypes) that can be used to locate the genetic modifiers that cause differences in expression and phenotypes, including disease susceptibility.
16. Genomic Data Commons The NCI's Genomic Data Commons (GDC) provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine.
17. Genotype-Tissue Expression The Genotype-Tissue Expression (GTEx) project aims to provide to the scientific community a resource with which to study human gene expression and regulation and its relationship to genetic variation.
18. HGVS Locus Specific Mutation Databases Locus Specific Databases (LSDBs) are curated collections of sequence variants in genes associated with disease. LSDBs of cancer-related genes often serve as a critical resource to researchers, diagnostic laboratories, clinicians, and others in the cancer genetics community. LSDBs are poised to play an important role in disseminating clinical classification of variants.
19. ImmPort ImmPort is funded by the NIH, NIAID and DAIT in support of the NIH mission to share data with the public. Data shared through ImmPort has been provided by NIH-funded programs, other research organizations and individual scientists ensuring these discoveries will be the foundation of future research.
20. Interuniversity Consortium for Political and Social Research ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
21. Metabolomics Workbench The Metabolomics Workbench serves as a national and international repository for metabolomics data and metadata and provides analysis tools and access to metabolite standards, protocols, tutorials, training, and more.
22. Mouse Phenome Database The Mouse Phenome Database (MPD) has characterizations of hundreds of strains of laboratory mice to facilitate translational discoveries and to assist in selection of strains for experimental studies.
23. NIDDK Central Repository The NIDDK Central Repository stores biosamples, genetic and other data collected in designated NIDDK-funded clinical studies. The purpose of the NIDDK Central Repository is to expand the usefulness of these studies by allowing a wider research community to access data and materials beyond the end of the study.
24. NITRC Neuroimaging Data Repository The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC) facilitates finding and comparing neuroimaging resources for functional and structural neuroimaging analyses—including popular tools as well as those that once might have been hidden in another researcher's laboratory or some obscure corner of cyberspace. NITRC-IR currently contains 14 Projects, 6845 Subjects, and 8285 Imaging Sessions.
25. National Sleep Research Resource The National Sleep Research Resource (NSRR) offers free access to large collections of de-identified physiological signals and clinical data elements collected in well-characterized research cohorts and clinical trials. The NSRR encourages interested researchers, educators, and trainees to join its community. Members can contribute their own data and tools for sharing, provide information and feedback on ways to improve sleep and physiological signal data exchange and analysis, and offer ideas on how to make NSRR and other resources work best for the scientific community.
26. Ndar Papers The National Database for Autism Research (NDAR) is an NIH-funded research data repository that aims to accelerate progress in autism spectrum disorders (ASD) research through data sharing, data harmonization, and the reporting of research results. NDAR also serves as a scientific community platform and portal to multiple other research repositories, allowing for aggregation and secondary analysis of data.
27. NeuroMorpho.Org NeuroMorpho.Org is a centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications. It contains contributions from over 80 laboratories worldwide and is continuously updated as new morphological reconstructions are collected, published, and shared.
28. NeuroVault Atlases A place where researchers can publicly store and share unthresholded statistical maps, parcellations, and atlases produced by MRI and PET studies.
29. NeuroVault Cols A place where researchers can publicly store and share unthresholded statistical maps, parcellations, and atlases produced by MRI and PET studies.
30. NeuroVault NIDM A place where researchers can publicly store and share unthresholded statistical maps, parcellations, and atlases produced by MRI and PET studies.
31. Nuclear Receptor Signaling Atlas The Nuclear Receptor Signaling Atlas (NURSA) was created to foster the development of a comprehensive understanding of the structure, function, and role in disease of nuclear receptors (NRs) and coregulators. NURSA seeks to elucidate the roles played by NRs and coregulators in metabolism and the development of metabolic disorders (including type 2 diabetes, obesity, osteoporosis, and lipid dysregulation), as well as in cardiovascular disease, oncology, regenerative medicine and the effects of environmental agents on their actions.
32. Omics Discovery Index The Omics Discovery Index (OmicsDI) provides dataset discovery across a heterogeneous, distributed group of Transcriptomics, Genomics, Proteomics and Metabolomics data resources spanning eight repositories in three continents and six organisations, including both open and controlled access data resources. The resource provides a short description of every dataset: accession, description, sample/data protocols, biological evidence, publication, etc. Based on these metadata, OmicsDI provides extensive search capabilities, as well as identification of related datasets by metadata and data content where possible. In particular, OmicsDI identifies groups of related, multi-omics datasets across repositories by shared identifiers.
33. Open sharing of Functional Magnetic Resonance Imaging OpenfMRI is a project dedicated to the free and open sharing of functional magnetic resonance imaging (fMRI) datasets, including raw data. The focus of the database is on task fMRI data.
34. PeptideAtlas PeptideAtlas is a multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments. Mass spectrometer output files are collected for human, mouse, yeast, and several other organisms, and searched using the latest search engines and protein sequences.
35. PhysioBank PhysioBank is a large and growing archive of well-characterized digital recordings of physiologic signals and related data for use by the biomedical research community. PhysioBank currently includes databases of multi-parameter cardiopulmonary, neural, and other biomedical signals from healthy subjects and patients with a variety of conditions with major public health implications, including sudden cardiac death, congestive heart failure, epilepsy, gait disorders, sleep apnea, and aging.
36. ProteomeXchange The ProteomeXchange consortium has been set up to provide a single point of submission of MS proteomics data to the main existing proteomics repositories, and to encourage the data exchange between them for optimal data dissemination.
37. RCSB Protein Data Bank The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids found in all organisms including bacteria, yeast, plants, flies, other animals, and humans.
38. Rat Genome Database The Rat Genome Database (RGD) was established in 1999 and is the premier site for genetic, genomic, phenotype, and disease data generated from rat research. In addition, it provides easy access to corresponding human and mouse data for cross-species comparisons. RGD's comprehensive data and innovative software tools make it a valuable resource for researchers worldwide.
39. Scientific Data Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data.
40. Sequence Read Archive The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD® System, Helicos Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
41. SimTK SimTK is a free project-hosting platform for the biomedical computation community that: Enables you to easily share your software, data, and models; Tracks the impact of the resources you share; Provides the infrastructure so you can support and grow a community around your projects; Connects you and your project to thousands of researchers working at the intersection of biology, medicine, and computation.
42. The Cancer Imaging Archive The Cancer Imaging Archive (TCIA) is a large archive of medical images of cancer accessible for public download. All images are stored in DICOM file format. The images are organized as "Collections", typically patients related by a common disease (e.g. lung cancer), image modality (MRI, CT, etc.) or research focus.
43. The Electron Microscopy Data Bank (EMDB) at PDBe The Electron Microscopy Data Bank (EMDB) is a public repository for electron microscopy density maps of macromolecular complexes and subcellular structures. It covers a variety of techniques, including single-particle analysis, electron tomography, and electron (2D) crystallography. The EMDB was founded at EBI in 2002, under the leadership of Kim Henrick. Since 2007 it has been operated jointly by PDBe and the Research Collaboratory for Structural Bioinformatics (RCSB PDB) as part of EMDataBank, which is funded by a joint NIH grant to PDBe, the RCSB and the National Center for Macromolecular Imaging (NCMI).
44. The Retina Project The retina project is a collaboration between the GENSAT project and the Roska Lab at FMI, Basel. The project and methods are described in Siegert et al. (Nature Neuroscience, 2009).
45. The database of Genotypes and Phenotypes The database of Genotypes and Phenotypes (dbGaP) archives and distributes the results of studies that have investigated the interaction of genotype and phenotype.
46. UniProt:Swiss-Prot UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. It combines information extracted from scientific literature and biocurator-evaluated computational analysis. The aim of UniProtKB/Swiss-Prot is to provide all known relevant information about a particular protein. Annotation is regularly reviewed to keep up with current scientific findings. The manual annotation of an entry involves detailed analysis of the protein sequence and of the scientific literature.
47. VectorBase VectorBase is a National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Center (BRC) providing genomic, phenotypic and population-centric data to the scientific community for invertebrate vectors of human pathogens.
48. WormBase WormBase is an international consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes. Founded in 2000, the WormBase Consortium is led by Paul Sternberg of CalTech, Paul Kersey of the EBI, Matt Berriman of the Wellcome Trust Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research.
49. Yale Protein Expression Database The Yale Protein Expression Database (YPED) is an open source system for storage, retrieval, and integrated analysis of large amounts of data from high throughput proteomic technologies. YPED currently handles LCMS, MudPIT, ICAT, iTRAQ, SILAC, 2D Gel and DIGE, Label Free Quantitation (Progenesis), Label Free Quantitation (Skyline), MRM analysis and SWATH. This repository contains data sets which have been released for public viewing and downloading by the responsible Primary Investigators.

Common Fund Repositories

The following repositories are part of NIH commons:

50. Library of Integrated Network-Based Cellular Signatures LINCS aims to create a network-based understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents, and by using computational tools to integrate this diverse information into a comprehensive view of normal and disease states that can be applied for the development of new biomarkers and therapeutics.
51. Roadmap Epigenomics Project The NIH Roadmap Epigenomics Mapping Consortium was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The Consortium leverages experimental pipelines built around next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts in stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.


The following repositories are ingested through DataCite:

52. Adaptive Biotechnologies Adaptive is at the forefront of immune-based discoveries, combining high-throughput sequencing and expert bioinformatics to profile T-cell and B-cell receptors. We bring the accuracy and sensitivity of our immunosequencing platform into laboratories around the world to drive groundbreaking research in cancer and other immune-mediated diseases. Adaptive also translates immunosequencing discoveries into clinical diagnostics and therapeutic development to improve patient care.
53. Australian Data Archive The Australian Data Archive (ADA) provides a national service for the collection and preservation of digital research data and to make these data available for secondary analysis by academic researchers and other users.
54. Bioinformatics Infrastructure for Life Sciences BILS (Bioinformatics Infrastructure for Life Sciences) is a distributed national research infrastructure supported by the Swedish Research Council (Vetenskapsrådet) providing bioinformatics support to life science researchers in Sweden.
55. CANDI Neuroimaging Access Point The Child and Adolescent NeuroDevelopment Initiative (CANDI) is a research program in the Department of Psychiatry at the University of Massachusetts Medical School dedicated to neuroimaging and treatment studies of individuals with mood disorders, trauma, early onset schizophrenia and developmental disabilities including autism and fragile X.
56. Coherent X-ray Imaging Data Bank CXIDB is dedicated to furthering the goal of making data from Coherent X-ray Imaging (CXI) experiments available to all, as well as archiving it. The website also serves as the reference for the CXI file format, in which most of the experimental data on the database is stored.
57. Collaborative Research in Computational Neuroscience To enable concerted efforts in understanding the brain, experimental data and other resources such as stimuli and analysis tools should be widely shared by researchers all over the world. To serve this purpose, this website provides a marketplace and discussion forum for sharing tools and data in neuroscience. Information about the aims and scope of this site is given in an article published in February 2008 in the journal Neuroinformatics. To date we host experimental data sets of high quality that will be valuable for testing computational models of the brain and new analysis methods. The data include physiological recordings from sensory and memory systems, as well as eye movement data. For information about a data set, select the data set in Data Sets and then navigate to the "About" page. In addition, this website hosts a forum for each data set and a general discussion forum. This website and the sharing of the data sets are funded by the CRCNS (Collaborative Research in Computational Neuroscience) program, which is described in the About link.
58. Databrary Databrary is a video data library for developmental science. Share videos, audio files, and related metadata. Discover more, faster.
59. Figshare figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner.
60. German Neuroinformatics Node The global scale of neuroinformatics offers unprecedented opportunities for scientific collaborations between and among experimental and theoretical neuroscientists. To fully harvest these possibilities, coordinated activities are required to improve key ingredients of neuroscience: data access, data storage, and data analysis, together with supporting activities for teaching and training.
61. GigaScience database GigaDB primarily serves as a repository to host data and tools associated with articles in GigaScience; however, it also includes a subset of datasets that are not associated with GigaScience articles; primarily from our funding partners BGI and CNGB. GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study.
62. Johns Hopkins University Data Archive Johns Hopkins University Data Management Services provides archiving services for the Johns Hopkins research community through the JHU Data Archive. While some academic disciplines have established research data repositories, many fields of research do not have easily available options for archiving and sharing data. Our archiving services give researchers the opportunity to share their data outside of original collaborations and beyond the life of a research project. Characteristics of the JHU Data Archive: data from any research discipline and with any file format; each dataset given a permanent citation and DOI, facilitating both attribution for authors and linkage to research publications; preservation of research data through regular file integrity checks and retention of multiple copies.
63. London School of Hygiene and Tropical Medicine LSHTM Data Compass is a curated digital repository of research datasets produced by the London School of Hygiene & Tropical Medicine and its collaborators. It addresses the School's mission to improve health and health equity in the UK and worldwide and maximise the benefit and impact of its research by ensuring the underlying data can be safeguarded, shared and cited.
64. MBF Bioscience We design quantitative imaging software for stereology, neuron reconstruction, and image analysis, integrated with the world’s leading microscope systems, to empower your research. Our development team and staff scientists are actively engaged with leading bioscience researchers, constantly working to refine our products based on your feedback and scientific advances in the field.
65. MIT Laboratory of Computational Physiology The Laboratory for Computational Physiology (LCP), under the direction of Professor Roger Mark, conducts research on improving health care through new and refined approaches to interpreting data. Some of the group’s researchers have medical backgrounds; others have backgrounds in computer science, electrical engineering, physics, or mathematics; and others have training that spans several of these disciplines.
66. MorphoBank MorphoBank is a web application providing an online database and workspace for evolutionary research, specifically systematics (the science of determining the evolutionary relationships among species). One can think of MorphoBank as two databases in one: one that permits researchers to upload images and affiliate data with those images (labels, species names, etc.) and a second database that allows researchers to upload morphological data and affiliate it with phylogenetic matrices.
67. National Institute of Mental Health The National Institute of Mental Health (NIMH) is the lead federal agency for research on mental disorders. NIMH is one of the 27 Institutes and Centers that make up the National Institutes of Health (NIH), the nation’s medical research agency. NIH is part of the U.S. Department of Health and Human Services (HHS).
68. PeerJ PeerJ publishes the world's scientific knowledge through open access licensing. 2,751 peer-reviewed articles and 3,261 preprints since 2013.
69. Research Data Centre-DZA The FDZ-DZA (Forschungsdatenzentrum DZA) is a facility of the German Centre of Gerontology (Deutsches Zentrum für Altersfragen, DZA) and has received accreditation as research data center DZA by the German Data Forum (RatSWD). Its main task is to make data of the German Ageing Survey DEAS and the German Survey on Volunteering (FWS) accessible to researchers by providing user-friendly Scientific Use Files (SUF), documentation of the contents and instruments as well as support for scholars using the data.
70. SBGrid We support publication of X-ray diffraction, MicroED, LLSM datasets, as well as structural models. All visitors can access our Laboratory and Institutional Collections. All structural biologists are invited to deposit datasets.
71. The Cambridge Crystallographic Data Centre The Cambridge Crystallographic Data Centre (CCDC) is the home of small molecule crystallography data and is a leader in software for pharmaceutical discovery, materials development, research and education.
72. Thieme Chemistry Thieme Medical Publishers is a German medical and science publisher in the Thieme Publishing Group. It produces professional journals, textbooks, atlases, monographs and reference books in both German and English covering a variety of medical specialties, including neurosurgery, orthopaedics, endocrinology, radiology, anatomy, chemistry, otolaryngology, ophthalmology, audiology and speech language pathology, and complementary and alternative medicine. Thieme has more than 1,000 employees and maintains offices in seven cities worldwide, including New York City, Beijing, Delhi, Stuttgart, and three other cities in Germany.
73. UCSD-Nature Signaling Gateway The UCSD Signaling Gateway Molecule Pages is a database providing essential information on the thousands of proteins involved in cell signaling. This database combines expert authored reviews with curated, highly-structured data (e.g. protein interactions) and automatic annotation from publicly available data sources (e.g. UniProt and Genbank). The information and data presented here are freely available to all users. The Signaling Gateway is hosted by the San Diego Supercomputer Center at the University of California, San Diego, and is funded by NIH/NIGMS Grant 1 R01 GM078005-01.
74. UCSF Clinical & Translational Science Institute The Clinical & Translational Science Institute (CTSI) facilitates clinical and translational research to improve patient and community health. We do this by providing infrastructure, services and training to enable research to be conducted more efficiently, effectively and in new ways.
75. UK Data Archive We acquire, curate and provide access to the UK's largest collection of social and economic data.
76. Zenodo Built and developed by researchers, to ensure that everyone can join in Open Science. The OpenAIRE project, in the vanguard of the open access and open data movements in Europe, was commissioned by the EC to support their nascent Open Data policy by providing a catch-all repository for EC funded research. CERN, an OpenAIRE partner and pioneer in open source, open access and open data, provided this capability and Zenodo was launched in May 2013. In support of its research programme CERN has developed tools for Big Data management and extended Digital Library capabilities for Open Data. Through Zenodo these Big Science tools could be effectively shared with the long tail of research.