The goals of Aim 2 include supporting an integrated database for omic-level genetic, epigenetic, and gene expression data covering the model systems outlined in Aim 3, with associated biostatistics, bioinformatics, and data analytic support for Consortium member labs. We will take advantage of the substantial informatic and computational biology strengths of multiple Coordinating Center sites. Given the rapid advances in genomic analysis of SCLC, a curated, easily accessible, common database for this information has had a major stimulatory effect on SCLC research. It also allows rapid validation of new findings by multiple laboratories. Equally important are the computational biology resources provided by the Coordinating Center to aid investigators across the Consortium in analyzing these complex datasets.
The SCLC Consortium cBioPortal
The cBioPortal for Cancer Genomics is a powerful and flexible suite of integrated analytic tools for omic-level data storage, visualization, and analysis. This system enables intuitive evaluation and visualization of complex cancer genomics data and is used by thousands of cancer researchers and clinicians around the world. cBioPortal was initially developed by a team of investigators led by Nikolaus Schultz at MSK as a web-based platform to support the data generated by The Cancer Genome Atlas (TCGA), and it continues to be maintained and expanded by the Schultz laboratory. Of note, the cBioPortal houses annotated data for all TCGA studies, including genome, methylome, and transcriptome data. It also houses all clinical genomic data for patients at MSK on a secure internal server (~60,000 samples to date, growing by 500-600 new specimens a month) and has outward-facing, secure, password-protected portals supporting not only the SCLC Consortium but also multiple SU2C Dream Teams, NCI-supported SPOREs and P01s, and other multi-investigator consortia. The centralized biostatistical and bioinformatics support of the cBioPortal team within the MSK Center for Molecular Oncology (CMO) facilitates personalization of shared databases to support the needs of individual research groups and consortia. The ability of investigators across the SCLC Consortium to access all of these datasets through a single Consortium interface is a signature accomplishment of the Coordinating Center.
PDX/CDX datasets from Johns Hopkins, MGH, UTSW, Utah, and MSK are available. One major ongoing effort will be to establish a central resource for storage of and investigator access to the many transcriptional profiling studies performed in different SCLC genetically engineered mouse models (GEMMs), organized and led by Dr. Oliver.
Centralized bioinformatics and data analysis support
Omic analytic support at MSK. To facilitate analysis and better comparison of sequencing data generated at different sites, we have made MSK-IMPACT sequencing of tumor models and use of our unified CMO analysis pipeline available to all SCLC Consortium members. While individual Consortium sites maintain their own analytical pipelines, whose results can be readily displayed in the Coordinating Center cBioPortal, our pipeline is easily transferable and will allow each center to run analyses on site. Use of this pipeline facilitates consistency across Consortium sites and reduces the need to transfer raw sequence data.
SCLC Consortium Bioinformatics Group at MDACC. Drs. Byers and Jing Wang at MDACC co-chair the MDACC Thoracic Bioinformatics Working Group and serve as working group members for multiple TCGA projects. Given their experience in this domain, they will co-chair a focused SCLC bioinformatics team that will include additional Consortium members with expertise in SCLC biology, bioinformatics, computational biology, genomics, epigenetics, proteomics, and gene expression profiling (bulk and single-cell RNA-seq). Notably, this team will also include Luc Girard, working with Dr. Minna at UTSW, and will coordinate efforts with the Cancer Systems Biology Center U54 Research Center focused on SCLC analytics at Vanderbilt. The expertise of this team will be made available in support of research initiatives by individual SCLC grants in the NCI portfolio, and for inter-laboratory collaborative projects.
Data deposition and external access. Our informatics teams have extensive experience with, and have developed infrastructure to facilitate, both the deposition of raw sequencing data to, and its acquisition from, national genome data repositories including dbGaP and CGHub. Our approaches are compliant with modern genomic data sharing standards and guidelines from the NIH, including the NCI Genomic Data Commons.
CellMinerCDB
The CellMiner database enables exploration and analysis of cancer cell line pharmacogenomic data across different sources. This resource is publicly available.