biomedical and healthCAre Data Discovery Index Ecosystem
Title: Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
keywords:
Variation
ID:
PRJNA124851
description:
The high level of human genome structural variation among individuals suggests that there must be portions of the genome that have yet to be discovered, annotated and characterized at the sequence level. Using clone resources developed as part of the Human Genome Structural Variation Sequencing Project, we focused on the characterization of 2,363 novel sequence contigs not present in the human reference genome. We determined that these contigs corresponded to 720 distinct loci of which 400 now have an anchored position in the reference genome. We investigated the sequence properties of these loci and determined that 37% of these novel insertions are copy-number polymorphic. We find that they are significantly enriched within the last 5 Mb of chromosomes (a 2.9-fold enrichment, p=1.0e-18, binomial test) and that most arose as a result of deletions in the human lineage after separation from the African great apes. A subset of these sites shows evidence of marked population stratification among Asian, African and European populations, including a 3.9-kb insertion within the first intron of the lactase gene. Complete sequencing of clones from 192 genomic loci, including 156 completely spanned insertions, provides a detailed and contextual view of 1.67 Mb of inserted sequence. Analysis of this sequence identified 477 elements that show evidence of sequence constraint over evolutionary time, as well as matches to 22 RefSeq gene segments. Twenty-six of the insertions contain matches against mRNA-seq data indicating the potential presence of functionally important, unannotated human sequences. Taking advantage of this high-quality sequence, we develop a method to accurately genotype these novel insertions using next-generation whole-genome sequencing datasets. Overall design: 29 samples, including the reference sample (NA15110), which was used in both channels in a single self-self experiment.
accesstypes:
download
landingpage: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA124851
authentication:
none
authorization:
none
ID:
pmid:20440878
dateReleased:
04-20-2010
name:
Homo sapiens
ncbiID:
ncbitax:9606
abbreviation:
NCBI
homePage: http://www.ncbi.nlm.nih.gov
ID:
SCR:006472
name:
National Center for Biotechnology Information
homePage: http://www.ncbi.nlm.nih.gov/bioproject
ID:
SCR:004801
name:
NCBI BioProject
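
The description reports a fold enrichment together with a binomial-test p-value for insertions near chromosome ends. The record does not give the underlying counts, the expected fraction, or whether the test was one-sided, so the following is only a minimal sketch of how such a calculation could be set up. It assumes a one-sided (upper-tail) test; the function names and the toy numbers in the usage lines are hypothetical and are not taken from the study.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), computed in log space."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def binomial_upper_tail(k, n, p):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    # Sum the tail relative to its largest term to limit underflow.
    return min(1.0, math.exp(peak) * sum(math.exp(x - peak) for x in logs))


def fold_enrichment(k, n, p):
    """Observed fraction k/n divided by the expected fraction p."""
    return (k / n) / p


# Toy inputs for illustration only (NOT the study's counts):
# 10 of 100 loci fall in a region expected to hold 5% of them.
toy_k, toy_n, toy_p = 10, 100, 0.05
print("fold enrichment:", fold_enrichment(toy_k, toy_n, toy_p))
print("one-sided p-value:", binomial_upper_tail(toy_k, toy_n, toy_p))
```

Here `p` stands for the fraction of loci expected in the region under the null hypothesis (for example, the share of the genome covered by the last 5 Mb of each chromosome), `n` for the number of placed loci, and `k` for the number observed in the region. Working in log space keeps the tail sum stable for very small p-values.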