biomedical and healthCAre Data Discovery Index Ecosystem
Title: ENCODE PSU Hardison RnaSeq
dateReleased:
08-31-2012
description:
This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison, rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu).
Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in the behavior or structure of a cell or organism in response to changes in DNA sequence to infer the function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these features are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These segments are inferred to be under negative or positive selection, respectively, and are interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function.
The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with moderate breadth. Assays identical to those used in the ENCODE project are performed in mouse cell types that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection.
The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered.
One of the epigenetic features most closely related to genomic activity is the production of stable RNA, including both protein-coding and noncoding transcripts. These genomic compilations of transcripts, or transcriptomes, are primary determinants of the way cells function, respond and differentiate, both by the production of proteins translated from coding transcripts and by the regulatory activity of untranslated non-coding transcripts. Non-coding RNAs regulate gene expression through diverse mechanisms ranging from reducing chromatin accessibility (affecting large regions or whole chromosomes) to precise fine-tuning of transcription from specific genes, e.g. via RNAi. Even though a large proportion of mammalian genomes is transcribed, many of the transcribed segments have yet to be assigned any function. The ENCODE project aims to create a comprehensive, quantitative annotation of the human transcriptome in several cell and tissue types, as well as to understand regulation of transcriptomes by establishing the relationship between regulatory factors and their targets. Mapping the mouse transcriptome in similar tissues will allow us to discern conservation of transcriptome profiles between mouse and human, to discover species-specific transcription patterns, and to infer conserved versus species-specific regulatory mechanisms. The results will have a significant impact on our understanding of the evolution of gene regulation.
For data usage terms and conditions, please refer to the ENCODE project.
Cells were grown according to the approved ENCODE cell culture protocols. Total RNA was extracted from 5-10 million cells using TRIzol reagent. This was followed by mRNA selection, fragmentation and cDNA synthesis, which were performed as described previously (Mortazavi et al., 2009).
Double-stranded cDNA samples were processed for library construction for Illumina sequencing, using the Illumina ChIP-seq Sample Preparation Kit. Strand-specific libraries were generated in a similar manner, with a couple of modifications described previously (Parkhomchuk et al., 2009). Briefly, dUTP was used instead of dTTP during second-strand cDNA synthesis to label the second-strand cDNA. During library preparation, the dUTP-labeled cDNA was treated with uracil N-glycosylase prior to the PCR amplification step. This removed uracil from the second strand, after which the DNA was subjected to high heat to facilitate abasic scission of the second strand. Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples are considered biological replicates. Sequencing was done on the Illumina Genome Analyzer IIx and on the Illumina HiSeq 2000.
FastQ files for the resulting sequence reads (single read and paired-end, directional and non-directional) were moved to a data library in Galaxy, and tools implemented in Galaxy were used for further processing via workflows (Giardine et al., 2005; Blankenberg et al., 2010; Goecks et al., 2010). Data processing was also performed on the CyberSTAR high-performance computing system at Penn State. The reads were mapped to the mouse genome (mm9 assembly) using the program TopHat (Langmead et al., 2009; Trapnell et al., 2009). Signal tracks were created using BEDtools (Quinlan et al., 2010) and SAMtools (Li, Handsaker et al., 2009).
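The strand-specific (dUTP) protocol described above destroys the second cDNA strand, so only the first strand is sequenced. A minimal sketch of how a read's alignment strand could then be turned into the strand of the originating transcript is given below. It assumes the orientation commonly reported for dUTP libraries (read 1 antisense to the transcript, read 2 sense); this record does not state the orientation of these particular libraries, and the function name is hypothetical.

```python
# Sketch only: infer the transcript strand from an aligned read in a
# dUTP (second-strand-marked) library.
# ASSUMPTION: read 1 aligns antisense to the transcript and read 2 aligns
# sense. Verify against the actual library before relying on this.

FLIP = {"+": "-", "-": "+"}


def transcript_strand(alignment_strand: str, is_read2: bool = False) -> str:
    """Return '+' or '-' for the transcript that produced the read.

    alignment_strand: strand the read aligned to ('+' or '-').
    is_read2: True for the second mate of a paired-end fragment;
              single-end reads are treated as read 1.
    """
    if alignment_strand not in FLIP:
        raise ValueError(f"strand must be '+' or '-', got {alignment_strand!r}")
    # Read 2 already points the same way as the transcript; read 1 is flipped.
    return alignment_strand if is_read2 else FLIP[alignment_strand]
```

For non-directional libraries in this dataset no such inference is possible, because both cDNA strands are sequenced.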
privacy:
not applicable
aggregation:
instance of dataset
ID:
E-GEOD-40522
refinement:
raw
alternateIdentifiers:
40522
keywords:
functional genomics
dateModified:
05-02-2014
availability:
available
types:
gene expression
name:
Mus musculus
accessURL: https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-40522/E-GEOD-40522.raw.1.zip
storedIn:
ArrayExpress
qualifier:
gzip compressed
format:
TXT
accessType:
download
authentication:
none
authorization:
none
accessURL: https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-40522/E-GEOD-40522.processed.1.zip
storedIn:
ArrayExpress
qualifier:
gzip compressed
format:
TXT
accessType:
download
authentication:
none
authorization:
none
accessURL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40522
storedIn:
Gene Expression Omnibus
qualifier:
not compressed
format:
HTML
accessType:
landing page
primary:
true
authentication:
none
authorization:
none
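The three access entries above follow a visible pattern: the ArrayExpress ID E-GEOD-40522 mirrors the GEO series GSE40522, and the file URLs embed the accession twice. The sketch below rebuilds those URLs from the accession alone. The URL templates are copied from this record; whether they hold for other accessions or still resolve is an assumption, and the function names are hypothetical.

```python
import re

# URL prefixes copied verbatim from the accessURL fields of this record.
ARRAYEXPRESS_FILES = "https://www.ebi.ac.uk/arrayexpress/files"
GEO_QUERY = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_series_from_arrayexpress(accession: str) -> str:
    """Map an ArrayExpress 'E-GEOD-<n>' ID to the GEO series 'GSE<n>'."""
    match = re.fullmatch(r"E-GEOD-(\d+)", accession)
    if match is None:
        raise ValueError(f"not an E-GEOD accession: {accession!r}")
    return f"GSE{match.group(1)}"


def access_urls(accession: str) -> dict:
    """Rebuild the three access URLs listed in the record (no network access)."""
    base = f"{ARRAYEXPRESS_FILES}/{accession}/{accession}"
    return {
        "raw": f"{base}.raw.1.zip",
        "processed": f"{base}.processed.1.zip",
        "landing_page": f"{GEO_QUERY}?acc={geo_series_from_arrayexpress(accession)}",
    }


if __name__ == "__main__":
    for kind, url in access_urls("E-GEOD-40522").items():
        print(f"{kind}: {url}")
```

The functions only build strings, so any download step and its error handling are left to the caller.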
abbreviation:
EBI
homePage: http://www.ebi.ac.uk/
ID:
SCR:004727
name:
European Bioinformatics Institute
homePage: https://www.ebi.ac.uk/arrayexpress/
ID:
SCR:002964
name:
ArrayExpress