biomedical and healthCAre Data Discovery Index Ecosystem
Title: ENCODE Genome Institute of Singapore RNA-Seq      
dateReleased:
02-14-2011
description:
This track is produced as part of the ENCODE Project. It shows high-throughput sequencing of RNA samples from tissues or subcellular compartments of cell lines included in the ENCODE Transcriptome subproject. For data usage terms and conditions, please refer to and

The RNA-Seq data were generated from high-quality polyA RNA, and the RNA-Seq libraries were constructed using the SOLiD Whole Transcriptome (WT) protocol and reagent kit. Good-quality total RNA was used as the starting material and purified twice through a MACs polyT column to enrich for polyA RNA and remove contaminants (e.g., rRNA, tRNA, DNA, and protein). One microgram of enriched polyA RNA was then fragmented, and a gel-based selection method was used to collect polyA RNA fragments in the 50-150 nt size range. The collected RNA fragments were hybridized and ligated to a mix of adapters provided by ABI, followed by reverse transcription to generate the corresponding cDNAs. The resulting cDNA library was further amplified by PCR and sequenced on the SOLiD platform as single reads of 35 bp (50 bp in the newer version). Cells were grown according to the approved ENCODE cell culture protocols.

Data: The SOLiD-generated RNA-Seq reads were 35 bp in length. An initial filtering step removed undesired contaminating sequences such as rRNA, tRNA, and repeats. A read-split mapping approach was developed to map the 35 bp reads onto the reference genome (GRCh37/hg19), excluding mitochondrion, haplotypes, randoms and chromosome Y.

Mapping parameters: Strand-specific mapping was done using Applied Biosystems' SOLiD alignment, with all reads mapped both to the genome and to an exon-exon junction database. A seed-and-extend strategy was used: an initial seed of length 25 is mapped with a maximum of 2 mismatches and then extended to the full read length. Each color-space match is awarded a score of +1 and each mismatch a penalty of -2. After extension, each read is trimmed to the shortest length that achieves its maximum score. The color-space sequences are then converted into base space and checked to ensure that each sequence has at most 2 base-pair mismatches. Any sequence with more than 2 mismatches is discarded.
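The scoring, trimming, and filtering rules described above can be illustrated with a small Python sketch. This is not the Applied Biosystems aligner, only a minimal reading of the stated parameters. The function names are hypothetical, the sketch assumes a trimmed read is never shorter than the 25-position seed, and the color-to-base conversion assumes the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent bases) with a known primer base, none of which is spelled out in the record.

```python
from typing import Optional, Sequence

MATCH_SCORE = 1          # stated in the description: +1 per color-space match
MISMATCH_PENALTY = -2    # stated in the description: -2 per color-space mismatch
SEED_LENGTH = 25         # stated initial seed length
MAX_SEED_MISMATCHES = 2  # stated mismatch limit within the seed
MAX_BASE_MISMATCHES = 2  # stated mismatch limit after base-space conversion


def trimmed_length(matches: Sequence[bool]) -> Optional[int]:
    """Return the alignment length after trimming, or None if the seed fails.

    `matches[i]` is True when color position i agrees with the reference.
    Assumption: the trimmed alignment is never shorter than the seed.
    """
    seed = matches[:SEED_LENGTH]
    seed_mismatches = sum(1 for m in seed if not m)
    if len(seed) < SEED_LENGTH or seed_mismatches > MAX_SEED_MISMATCHES:
        return None

    score = (MATCH_SCORE * (SEED_LENGTH - seed_mismatches)
             + MISMATCH_PENALTY * seed_mismatches)
    best_score, best_length = score, SEED_LENGTH
    for position in range(SEED_LENGTH, len(matches)):
        score += MATCH_SCORE if matches[position] else MISMATCH_PENALTY
        # Strict '>' keeps the shortest length among equally scoring prefixes.
        if score > best_score:
            best_score, best_length = score, position + 1
    return best_length


_BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
_BITS_TO_BASE = "ACGT"


def colors_to_bases(primer_base: str, colors: str) -> str:
    """Decode a color string to bases (assumed standard two-base encoding)."""
    current = _BASE_TO_BITS[primer_base]
    bases = []
    for color in colors:
        current ^= int(color)  # each color is the XOR of two adjacent bases
        bases.append(_BITS_TO_BASE[current])
    return "".join(bases)


def passes_base_filter(read: str, reference: str) -> bool:
    """Keep a read only if it has at most 2 base-space mismatches."""
    mismatches = sum(1 for r, g in zip(read, reference) if r != g)
    return mismatches <= MAX_BASE_MISMATCHES
```

For example, a 35-position read whose last five colors all mismatch would be trimmed to length 30 under these rules, because every mismatching position lowers the running score by 2 and the maximum is reached at position 30.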
privacy:
not applicable
aggregation:
instance of dataset
ID:
E-GEOD-27221
refinement:
raw
alternateIdentifiers:
27221
keywords:
functional genomics
dateModified:
06-02-2014
availability:
available
types:
gene expression
name:
Homo sapiens
accessURL: https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-27221/E-GEOD-27221.raw.1.zip
storedIn:
ArrayExpress
qualifier:
gzip compressed
format:
TXT
accessType:
download
authentication:
none
authorization:
none
accessURL: https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-27221/E-GEOD-27221.processed.1.zip
storedIn:
ArrayExpress
qualifier:
gzip compressed
format:
TXT
accessType:
download
authentication:
none
authorization:
none
accessURL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27221
storedIn:
Gene Expression Omnibus
qualifier:
not compressed
format:
HTML
accessType:
landing page
primary:
true
authentication:
none
authorization:
none
abbreviation:
EBI
homePage: http://www.ebi.ac.uk/
ID:
SCR:004727
name:
European Bioinformatics Institute
homePage: https://www.ebi.ac.uk/arrayexpress/
ID:
SCR:002964
name:
ArrayExpress