Title: Genetic programming and gene expression profiling for molecular discrimination and characterization of lung cancers      
Lung cancers are a heterogeneous group of diseases with respect to biology and clinical behavior. So far, diagnosis and classification have been based on histological morphology and immunohistological methods, which discriminate between two main histologic groups: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), which account for 20% and 80% of lung carcinomas, respectively. While SCLCs express properties of neuroendocrine cells, NSCLCs, which are divided into the three major subtypes adenocarcinoma, squamous cell carcinoma and dedifferentiated large cell carcinoma, show different characteristics such as the expression of certain keratins or the production of mucin, and lack neuroendocrine differentiation. The molecular pathogenesis of lung cancer involves the accumulation of genetic and epigenetic alterations, including the activation of proto-oncogenes and the inactivation of tumor suppressor genes, which differ between lung cancer subgroups. The development of microarray technologies has made it possible to quantify the expression of a large number of genes simultaneously in a given sample. There are several recent reports on expression profiling of lung cancers, but the analysis and interpretation of the results can be difficult because of the heterogeneity of cellular components. Contamination of the tumor sample with normal epithelia, blood vessels, stromal cells, leucocytes and tumor necrosis may confound the true expression profile of the tumor. The use of laser capture microdissection (LCM) greatly improves sample preparation for microarray expression analysis. We therefore combined LCM with microarray analysis. In detail, we examined gene expression profiles of tumor cells from 29 previously untreated patients with lung cancer (10 adenocarcinomas (AC), 10 squamous cell carcinomas (SCC), 9 small cell lung cancers (SCLC)) in comparison to normal lung tissue (LT) of 5 control patients without tumor.
Bronchoscopic biopsies from the primary lung tumor were taken before treatment. Biopsies were cut into 8 µm sections, and from each section cancer cells were isolated using laser capture microdissection in order to obtain pure samples of tumor cells. Total RNA was extracted, reverse transcribed, in vitro transcribed, labelled and hybridized to the array. For expression analysis, microarrays covering 8793 defined genes (Human HG Focus Array, Affymetrix) were used. Following quality control, array data were normalized using variance stabilizing transformation (VSN) and analysed for significant differences using significance analysis of microarrays (SAM). Based on differentially expressed genes, cancer samples could be clearly separated from non-cancer samples using hierarchical clustering. Comparing AC, SCC and SCLC with normal lung tissue, we found 205, 335 and 404 genes, respectively, that were at least 2-fold differentially expressed with an estimated false discovery rate below 2.6%. Each histological subtype showed a distinct expression profile. Further, using a genetic programming approach, we constructed a classifier to discriminate AC, SCC, SCLC and LT. To this end, the 50 genes with the greatest signal-to-noise ratio were selected to train the classifier. In leave-one-out cross-validation, all 34 samples of this training set were classified correctly. To validate the 50-gene classifier on a test set, a further 13 microdissected lung cancer samples were used, and all were classified in concordance with the pathological findings. In conclusion, the different lung cancer subtypes have distinct molecular phenotypes, which reflect biological characteristics of the tumor cells and which might be the basis for the development of targeted therapy. Moreover, gene expression profiling combined with genetic programming is a suitable tool for classifying and discriminating the histological subtypes of lung cancer and for distinguishing them from normal lung tissue.
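The gene selection and leave-one-out cross-validation described above can be illustrated with a minimal Python sketch. Several parts are assumptions rather than the study's method: the signal-to-noise score is taken to be the common one-vs-rest form (difference of class means divided by the sum of class standard deviations), a simple nearest-centroid rule stands in for the genetic programming classifier, and the expression matrix is synthetic toy data, not the study data. All function and variable names are hypothetical.

```python
import random
from statistics import mean, pstdev


def signal_to_noise(values, labels, target):
    """One-vs-rest score for one gene: (mean_in - mean_out) / (sd_in + sd_out)."""
    inside = [v for v, l in zip(values, labels) if l == target]
    outside = [v for v, l in zip(values, labels) if l != target]
    denom = pstdev(inside) + pstdev(outside)
    return (mean(inside) - mean(outside)) / denom if denom > 0 else 0.0


def top_genes(X, labels, k):
    """Rank genes by their largest absolute one-vs-rest score; keep the top k."""
    classes = sorted(set(labels))
    scores = []
    for g in range(len(X[0])):
        column = [row[g] for row in X]
        scores.append(max(abs(signal_to_noise(column, labels, c)) for c in classes))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid(train_X, train_y, sample, genes):
    """Stand-in classifier (assumption): assign the class with the closest centroid."""
    centroids = {}
    for c in set(train_y):
        rows = [r for r, l in zip(train_X, train_y) if l == c]
        centroids[c] = [mean(r[g] for r in rows) for g in genes]

    def squared_distance(c):
        return sum((sample[g] - m) ** 2 for g, m in zip(genes, centroids[c]))

    return min(sorted(centroids), key=squared_distance)


def loocv_accuracy(X, y, k):
    """Leave one sample out, select genes and fit on the rest, predict the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(train_X, train_y, k)
        correct += int(nearest_centroid(train_X, train_y, X[i], genes) == y[i])
    return correct / len(X)


# Synthetic toy data (assumption): 4 classes x 5 samples, 20 genes,
# with gene i strongly elevated in class i and the rest pure noise.
rng = random.Random(0)
classes = ["AC", "SCC", "SCLC", "LT"]
X, y = [], []
for ci, c in enumerate(classes):
    for _ in range(5):
        row = [rng.gauss(0.0, 0.5) for _ in range(20)]
        row[ci] += 6.0
        X.append(row)
        y.append(c)

accuracy = loocv_accuracy(X, y, k=4)
print(f"LOOCV accuracy on toy data: {accuracy:.2f}")
```

In this sketch the genes are re-selected inside every cross-validation fold, so the held-out sample never influences which genes are used. This is a design choice of the example and may differ from the procedure used in the study.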
Keywords: ordered
Overall design: Comparison of gene expression profiles of normal lung tissues, adenocarcinomas, squamous cell carcinomas and small cell lung cancers.
