biomedical and healthCAre Data Discovery Index Ecosystem
Title: Design and testing of genome-proxy microarrays to profile marine microbial communities
dateReleased:
05-25-2010
description:
Microarrays are useful tools for detecting and quantifying specific functional and phylogenetic genes in natural microbial communities. In order to track uncultivated microbial genotypes and their close relatives in an environmental context, we designed and implemented a “genome proxy” microarray that targets microbial genome fragments recovered directly from the environment. Fragments consisted of sequenced clones from large-insert genomic libraries from microbial communities in Monterey Bay, the Hawaii Ocean Time-series station ALOHA, and Antarctic coastal waters. In a prototype array, we designed probe sets to thirteen of the sequenced genome fragments and to genomic regions of the cultivated cyanobacterium Prochlorococcus MED4. Each probe set consisted of multiple 70-mers, each targeting an individual ORF, distributed along each ~40-160 kbp contiguous genomic region. The targeted organisms or clones, and close relatives, were hybridized to the array both as pure DNA mixtures and as additions of cells to a background of coastal seawater. This prototype array correctly identified the presence or absence of the target organisms and their relatives in laboratory mixes, with negligible cross-hybridization to organisms having ≤~75% genomic identity. In addition, the array correctly identified target cells added to a background of environmental DNA, with a limit of detection of ~0.1% of the community, corresponding to ~10^3 cells/ml in these samples. Signal correlated with cell concentration with an R² of 1.0 across six orders of magnitude. In addition, the array could track a related strain (at 86% genomic identity to that targeted) with a linearity of R² = 0.9999 and a limit of detection of ~1% of the community. Closely related genotypes were distinguishable by differing hybridization patterns across each probe set.
This array’s multiple-probe, “genome-proxy” approach and consequent ability to track both target genotypes and their close relatives is important for the array’s environmental application given the recent discoveries of considerable intra-population diversity within marine microbial communities.
Keywords: target addition experiment, proof-of-concept for GPL6012
***Overall Array design***
The prototype microarray targeted thirteen BAC or fosmid genome fragments (20-160 kb) from both bacteria and archaea, recovered from a variety of marine habitats, as well as the cyanobacterium Prochlorococcus MED4. These clones were originally sequenced because of the presence of taxonomic marker or specific functional genes. This array consisted of sets of 70-bp oligonucleotides targeting each genome or genome fragment (Fig. 1), dispersed along the target sequences with no more than one probe per gene, and excluding rRNA genes as targets. The probes were selected solely based on theoretical thermodynamic properties and GC content (~40%); that is, probe selection did not focus on specific genes or regions, but simply produced the “optimal” probes for each genome proxy based on the probes’ predicted hybridization properties. rRNA genes were excluded because this probe design approach, which avoids sequence alignments and considerations of RNA secondary structure, would be unlikely to result in useful rRNA probes. Furthermore, rRNA probes of traditional design could not be included on the array because their appropriate hybridization conditions would be very different from those of this array’s probes.
***Microarray probe design***
Microarray 70-mer probes were designed using the program ArrayOligoSelector (Zhu et al., 2003) with the following settings: target %GC = 40%, 1 probe/gene, with the ORFs for each genome fragment as both the input and the database file. The output candidate 70-mers were then sorted based on their %GC, and those closest to 40% were chosen. Where more than the target number of probes had 40% GC, the subset with the lowest free energy of hybridization was selected. Generally, 20 probes were selected per organism. Prochlorococcus MED4 was represented by 60 probes in total, 20 each for three different 80 kb “genome-proxy” regions: 0-80 kb, 1.29-1.37 Mbp, and 1.58-1.66 Mbp. Using the same method, a set (n=20) of positive control probes was designed to the genome of the halophilic archaeon Halobacterium salinarum NRC-1. Negative control probes (n=28) were designed to a set of 49 random 1000-base sequences (Stothard, 2000).
***Microarray construction and hybridization***
Oligonucleotides were synthesized (Illumina, San Diego, California), suspended in 3× SSC to a concentration of 40 pmol/μl, and spotted on homemade poly-L-lysine-coated glass slides using a QArray 2 microarraying robot (Genetix, Hampshire, England). Six replicates of each probe were spotted.
***Microarray data analysis***
Hybridized arrays were scanned using an Axon Instruments 4000B scanner (Foster City, CA), and the data were normalized and filtered using Perl scripts written for the purpose, by the following steps.
(1) Signal intensities for each spot were calculated by subtracting the local background (mean F532 − median B532, as calculated by GenePix Pro 5.1 software, Axon Instruments).
(2) The median value across replicates was calculated for each probe.
(3) For each probe set, the number of probes greater than twice the mean negative control signal was calculated, before further processing.
(4) Filter I: Arrays with fewer than half their positive control probes exceeding twice the mean negative control signal were considered poor-quality, low-dynamic-range arrays and were excluded from further analysis.
(5) Each probe signal was corrected for non-specific binding by subtracting the mean negative control spot signal.
(6) The data were then normalized for array-to-array variations in brightness by dividing each probe signal by the mean positive control signal. This positive control signal was the mean signal across the Halobacterium salinarum probes in each hybridization, with identical amounts of H. salinarum DNA having been added to each reaction prior to amplification and labeling.
(7) Filter II: In order for a genotype to be considered “present”, at least 45% of its probes had to exceed twice the mean negative control signal.
(8) Finally, each genotype signal was calculated as either the mean or the Tukey biweight across its probe set.
***Experimental Design***
The array was hybridized to laboratory mixtures of cloned environmental genomic DNA targeted by the array, in varying ratios. The use of multiple probes to target many genes from each organism helped to normalize probe-to-probe heterogeneity, by averaging across all probes in a set (see Microarray data analysis, step 8). The evenness of probe response across each genotype’s set was also used to evaluate the relatedness of hybridizing DNA. To more precisely define the array’s phylogenetic range and specificity, it was tested against DNA from Prochlorococcus MED4 and related strains, spanning the known range of Prochlorococcus phylogenetic diversity. To test the effects of hybridization stringency on the specificity and signal of the MED4 probes, Prochlorococcus strains were hybridized at a range of conditions. To test whether the specificity results for Prochlorococcus were comparable for other targeted clades, two genome fragments recovered from closely related phylotypes within the SAR86 clade of the gamma-proteobacteria were represented on the array and were tested for specificity. To understand the equivalence of probe sets targeting different regions of the same organism’s genome, we targeted three 80 kb “genome proxy” regions of the Prochlorococcus MED4 genome. One of the regions fell in a genomic “island” where inter-strain variability is concentrated (“ISL5” in Coleman et al., 2006).
To test the array in a complex environmental context, we collected coastal seawater (lacking detectable Prochlorococcus cells by flow cytometry) and added Prochlorococcus cells from strains MED4, MIT9515, MIT9312 and MIT9313 over a range of concentrations from ~10^1 to 10^6 cells/ml (Fig. 3). The seawater was then filtered and the DNA extracted, amplified, labeled, and hybridized to the array.
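The probe-ranking rule described under Microarray probe design (take the candidates whose %GC is closest to 40%, and break ties by the lowest free energy of hybridization) can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure or ArrayOligoSelector's output format: the function names, the `(orf_id, sequence, free_energy)` tuple layout, and the reading of "lowest" as numerically smallest are all assumptions, and the candidates are assumed to already contain at most one 70-mer per ORF.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def select_probes(candidates, n_probes=20, target_gc=40.0):
    """Rank candidate probes and keep the best n_probes.

    candidates: list of (orf_id, sequence, free_energy) tuples
    (hypothetical layout; free_energy is the predicted free energy
    of hybridization, assumed to be supplied by the design tool).
    Ranking: distance of %GC from target_gc first, then the
    numerically lowest free energy as the tie-breaker.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (abs(gc_percent(c[1]) - target_gc), c[2]),
    )
    return ranked[:n_probes]
```

With the defaults, `select_probes(candidates)` returns up to 20 tuples, matching the typical probe-set size given in the description.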
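Steps (1) to (8) of the Microarray data analysis section can likewise be sketched as a single function. This is a minimal reading of the description, not the authors' Perl scripts. The input layout (per-spot pairs of mean F532 and median B532 values, plus a mapping from probe-set name to probe IDs) and all names are hypothetical. The sketch also assumes that the positive control mean in step (6) is taken after the negative-control subtraction of step (5), and it implements only the mean option of step (8), not the Tukey biweight.

```python
from statistics import mean, median


def genotype_signals(spots, probe_sets, positive="positive",
                     negative="negative", present_fraction=0.45):
    """Return {genotype: normalized signal, or None if not "present"}.

    spots: {probe_id: [(f532_mean, b532_median), ...]} replicate spots.
    probe_sets: {set_name: [probe_id, ...]}, including the positive
    and negative control sets. Returns None if the array fails Filter I.
    """
    # Steps 1-2: subtract local background, then take the replicate median.
    signal = {p: median(f - b for f, b in reps) for p, reps in spots.items()}

    neg_mean = mean(signal[p] for p in probe_sets[negative])
    threshold = 2 * neg_mean

    # Step 3: per probe set, count probes above twice the negative mean.
    n_above = {name: sum(signal[p] > threshold for p in probes)
               for name, probes in probe_sets.items()}

    # Step 4 (Filter I): reject arrays with weak positive controls.
    if n_above[positive] < len(probe_sets[positive]) / 2:
        return None

    # Step 5: correct for non-specific binding.
    corrected = {p: s - neg_mean for p, s in signal.items()}

    # Step 6: normalize by the mean positive control signal (assumed
    # here to be computed on the corrected values).
    pos_mean = mean(corrected[p] for p in probe_sets[positive])
    normalized = {p: s / pos_mean for p, s in corrected.items()}

    # Steps 7-8: Filter II, then the mean across each probe set.
    result = {}
    for name, probes in probe_sets.items():
        if name in (positive, negative):
            continue
        if n_above[name] >= present_fraction * len(probes):
            result[name] = mean(normalized[p] for p in probes)
        else:
            result[name] = None
    return result
```

Counting the probes above threshold before any correction follows the "before further processing" wording of step (3); the same counts are then reused for both filters.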
privacy:
not applicable
aggregation:
instance of dataset
ID:
E-GEOD-9384
refinement:
raw
alternateIdentifiers:
9384
dateSubmitted:
10-18-2007
keywords:
functional genomics
dateModified:
03-27-2012
availability:
available
types:
gene expression
ID:
A-GEOD-6012
name:
A genome proxy array for profiling marine microbial communities
accessURL: https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-9384/E-GEOD-9384.raw.1.zip
storedIn:
ArrayExpress
qualifier:
gzip compressed
format:
TXT
accessType:
download
authentication:
none
authorization:
none
accessURL: https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-9384/E-GEOD-9384.processed.1.zip
storedIn:
ArrayExpress
qualifier:
gzip compressed
format:
TXT
accessType:
download
authentication:
none
authorization:
none
accessURL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9384
storedIn:
Gene Expression Omnibus
qualifier:
not compressed
format:
HTML
accessType:
landing page
primary:
true
authentication:
none
authorization:
none
abbreviation:
EBI
homePage: http://www.ebi.ac.uk/
ID:
SCR:004727
name:
European Bioinformatics Institute
homePage: https://www.ebi.ac.uk/arrayexpress/
ID:
SCR:002964
name:
ArrayExpress