biomedical and healthCAre Data Discovery Index Ecosystem
Title: Filtering of RNA-seq datasets and differences between cell types in global coordination of splicing and proportion of highly expressed genes      
Transcriptome or Gene expression
The goal of this study was to investigate whether mammalian cell types intrinsically differ in global coordination of gene splicing and expression levels. We analyzed RNA-seq transcriptome profiles of 8 different purified mouse cell types. We found that cell types vary in the proportion of highly expressed genes and in the number of alternatively spliced transcripts expressed per gene, and that the cell types expressing more alternatively spliced transcript variants per gene are those with a higher proportion of highly expressed genes. Cell types segregated into two clusters based on a high or low proportion of highly expressed genes. Biological functions involved in negative regulation of gene expression were enriched in the group of cell types with a low proportion of highly expressed genes, and biological functions involved in regulation of transcription and RNA splicing were enriched in the group of cell types with a high proportion of highly expressed genes. These data reveal specific candidate genes that may be involved in global coordination of balance in the transcriptome.

Overall design: The following samples were reprocessed and reanalyzed:
Astrocytes, GSE52564/GSM1269903/GSM1269904
Endothelial cells, GSE52564/GSM1269915/GSM1269916
Cortical neurons, GSE52564/GSM1269905/GSM1269906
Oligodendrocytes, GSE52564/GSM1269911/GSM1269912
Microglia, GSE52564/GSM1269913/GSM1269914
Megakaryocyte-erythroid progenitors, GSE40522/GSM995525
Erythroid-committed precursors Gata1 KO, GSE40522/GSM995536

Libraries for all samples included two biological replicates, were prepared using polyA-selected RNA, and were sequenced as paired reads, 100 bp from each end, on a HiSeq 2000 sequencer (Illumina). The raw reads were reprocessed as follows.
Reads were mapped to the mouse reference genome mm10 (UCSC Genome Browser) using a comprehensive transcriptome annotation GTF file. This file was assembled with the UCSC Table Browser Intersection utility by merging, in a non-redundant manner, the GENCODE M4 transcripts with those UCSC Gene Track transcripts that did not overlap more than 90% with the GENCODE transcripts. The raw reads were mapped using the TopHat/Bowtie2/Cufflinks pipeline, with the -g option, to construct a merged GTF file that included the annotated and novel transcript structures from all samples.

Then, the IntersectBed tool (Bedtools) was used to retain only the reads that mapped to the merged GTF, which was converted to BED with the Gtf2bed tool (Bedops). This filtering step selected the reads that contributed to the identified gene structures and excluded noise and artifacts, that is, reads that mapped to the genome but did not contribute to any gene structure. Next, only uniquely mapped and properly paired reads were selected using the view command (Samtools) with options -bq 4 -bh -f2 -F12. After this step, the DownsampleSam tool (Picard) was used to randomly subsample an equal number of paired reads, which provided representative samples of the same size for all samples (34.6M per sample/replicate; reads counted with Samtools flagstat).

The reprocessed samples were reanalyzed as follows. The TopHat/Bowtie2/Cufflinks/Cuffdiff pipeline with the -g option was used to determine normalized expression in FPKMs in each replicate of each sample, with Cuffdiff's across-sample normalization. After the cell types were segregated into two clusters based on a higher or lower proportion of highly expressed genes, differential expression analysis was performed between the two groups, which were treated as two conditions. For this analysis, each replicate of each cell type was assigned to one of only two cluster groups. For differential expression analysis, the Cuffdiff q-value cutoff was set to 0.05.
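The Samtools selection step described above (options -q 4, -f 2, -F 12) can be read as a simple predicate on each alignment's FLAG and MAPQ fields. The following is a minimal Python sketch of that logic, not the submitters' code. The function name passes_filter and its parameter names are hypothetical and chosen for illustration. The bit values come from the SAM format: 0x2 marks a properly paired read, 0x4 an unmapped read, and 0x8 a read whose mate is unmapped.

```python
# SAM FLAG bits used by the filter (values defined by the SAM format).
PROPER_PAIR = 0x2    # each segment properly aligned according to the aligner
READ_UNMAPPED = 0x4  # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def passes_filter(flag, mapq, min_mapq=4,
                  require=PROPER_PAIR,
                  exclude=READ_UNMAPPED | MATE_UNMAPPED):
    """Return True if an alignment would be kept by -q 4 -f 2 -F 12.

    Hypothetical helper for illustration:
      -q 4  keeps alignments with MAPQ >= 4
      -f 2  keeps alignments that have all bits of 2 set
      -F 12 drops alignments that have any bit of 12 (4 + 8) set
    """
    if mapq < min_mapq:
        return False
    if (flag & require) != require:
        return False
    if (flag & exclude) != 0:
        return False
    return True


if __name__ == "__main__":
    # FLAG 99 = paired, proper pair, mate on reverse strand, first in pair.
    print(passes_filter(99, 50))
    # FLAG 73 = paired, mate unmapped, first in pair.
    print(passes_filter(73, 50))
```

In this sketch the MAPQ threshold stands in for the "uniquely mapped" criterion. How MAPQ values relate to the number of mapping locations depends on the aligner, so the threshold of 4 should be read as specific to the pipeline described here.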
Software versions used: TopHat 2.0.12, Bowtie 2.2.4, Cufflinks 2.2.1, Samtools 0.1.19, Picard 1.79, Bedops 2.4.2, Bedtools 2.19.0. Analyses were performed on the Orchestra High Performance Compute Cluster at Harvard Medical School, an NIH-supported shared facility.
Mus musculus
National Center for Biotechnology Information
NCBI BioProject

