The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In the earlier genome-wide studies of this kind, about 20,000 genes were sequenced using a two-stage design: all genes were sequenced in a “discovery” set of samples, and those in which at least one alteration was found were then sequenced in an additional “validation” sample. The two-stage sampling, the rarity of mutations, and the varied size and composition of genes all contribute to an interesting and unusual testing ground for statistical methodologies. In this lecture I will present some of the statistical challenges that arise in these studies, with special emphasis on multiple testing and gene set analysis.
Giovanni Parmigiani
Department of Biostatistics
Harvard University
307 East 63rd Street, 3rd Floor Conference Room