Novel Pattern Discovery and Driver Gene Identification in Integrated Cancer Genomic Data


Large-scale integrated cancer genome characterization efforts, including The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE), have created unprecedented opportunities to study cancer biology and discover therapeutic targets in a comprehensive genetic context. A fundamental challenge lies in translating cancer genomic findings into clinical application. Progress toward that end depends critically on our ability to systematically and effectively distill essential information from the massive amount of data generated in the discovery stage. We describe an enhanced iCluster framework for integrated visualization, pattern discovery, and driver gene identification using a set of fused probabilistic eigen-features that maximize the total biological variation collectively observed across cancer genomes. Complex dependency structures observed across somatic mutation, copy number abnormalities, promoter DNA methylation, and gene expression changes can be expressed in terms of simpler conditional independence relationships through these eigen-features, which greatly facilitates integrated visualization, class discovery, and driver identification. In the NCI60 and CCLE datasets, we demonstrate that the method can accurately group cancer cell lines by their major histological subtypes and correctly pinpoint known driver genes, as well as candidate drivers that have not previously been reported, in various cancer types. Application to the TCGA colorectal tumor data reveals novel integrated subtypes that suggest different paths to colon cancer.


This program is open to all.

Date & Time(s)


Memorial Sloan Kettering Cancer Center
307 East 63rd Street
Room 331
New York, NY 10065


Biostatistics Seminar Series
Department of Epidemiology and Biostatistics
Memorial Sloan Kettering Cancer Center