Project GENIE BPC: Genomics Evidence Neoplasia Information Exchange BioPharma Collaborative


The American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange BioPharma Collaborative (GENIE BPC) is a multi-institution effort to build a pan-cancer repository of genomic, therapeutic and clinical data. The data are curated from electronic health records using the PRISSMM™ framework, with the goal of improving clinical decision-making based on real-world data. The press release announcing the project and further details can be found on the AACR website. The Statistical Coordinating Center, based at Memorial Sloan Kettering, is responsible for data management, quality assurance processes, the derivation of analytic variables and the documentation of each.

Publicly Available Data

GENIE BPC includes curated data from patients with non-small cell lung cancer (NSCLC), colorectal cancer (CRC), breast cancer, pancreatic cancer, bladder cancer and prostate cancer treated at Memorial Sloan Kettering, Dana-Farber Cancer Institute, Vanderbilt-Ingram Cancer Center and the University Health Network.

The data can be accessed via the Synapse platform:

  1. Register for a Synapse account
  2. Navigate to the data release and request access (e.g., for the NSCLC 2.0-public data release, navigate to the Synapse page for that release). Towards the top of the page, there is information including the Synapse ID, DOI, Item count, and Access. Next to Access is a link that reads Request Access.
  3. Select Request Access, review the terms of data use and select Accept.


Project GENIE BPC data comprise hundreds of clinical and genomic variables spread across a series of datasets with varying units of analysis. As such, manipulating the data to create a dataset that is ready for analysis is not trivial. To aid analysts in data preparation, Jessica Lavery, Samantha Brown, Hannah Fuchs, Mike Curry, Axel Martin, Karissa Whiting and Daniel Sjoberg created the {genieBPC} R package. The package streamlines the creation of an analytic cohort based on cancer diagnosis and/or treatment information, and the visualization of oncologic treatment patterns using a sunburst plot. It is currently available on CRAN and GitHub and was presented at the R/Medicine 2023 Virtual Conference.
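As a rough illustration of the workflow described above, the sketch below builds an analytic cohort and a treatment sunburst with {genieBPC}. It is a minimal example rather than a recommended analysis: it assumes the package is installed from CRAN and uses the simulated `nsclc_test_data` object that ships with the package, so no Synapse credentials are required. The function and argument names reflect our reading of the package documentation and should be checked against the installed version; the stage and histology filters are arbitrary choices for illustration.

```r
# Minimal sketch using the simulated dataset bundled with {genieBPC};
# the filters below are illustrative, not a recommended cohort definition.
library(genieBPC)

# With real data, the input would instead be pulled from Synapse after the
# terms of use have been accepted, along these lines (not run here):
#   set_synapse_credentials()  # assumes credentials are already configured
#   nsclc <- pull_data_synapse(cohort = "NSCLC", version = "v2.0-public")
#   data_in <- nsclc$NSCLC_v2.0
data_in <- genieBPC::nsclc_test_data

# Build an analytic cohort from diagnosis criteria:
# stage IV disease with adenocarcinoma histology.
cohort <- create_analytic_cohort(
  data_synapse = data_in,
  stage_dx = "Stage IV",
  histology = "Adenocarcinoma"
)

# The result is a list of linked data frames, one per unit of analysis
# (e.g., diagnoses, drug regimens, genomic reports).
names(cohort)

# Summarize treatment patterns for the cohort over the first three regimens.
sunburst <- drug_regimen_sunburst(
  data_synapse = data_in,
  data_cohort = cohort,
  max_n_regimens = 3
)
names(sunburst)
```

Because the cohort is returned as a list of data frames rather than a single table, each downstream analysis can start from the data frame whose unit of analysis matches the question at hand.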

MSK Biostatistics Project Members

  • Kathy Panageas (Statistical Coordinating Center Member)
  • Jessica Lavery (Statistical Coordinating Center Member)
  • Samantha Brown (Statistical Coordinating Center Member)
  • Hannah Fuchs (Statistical Coordinating Center Member)
  • Karissa Whiting
  • Ronglai Shen
  • Yuan Chen


  • American Association for Cancer Research
  • Sage Bionetworks
  • Dana-Farber Cancer Institute
  • University Health Network
  • Vanderbilt-Ingram Cancer Center
  • Amgen, Inc.
  • AstraZeneca UK Limited
  • Bayer Healthcare Pharmaceuticals, Inc.
  • Boehringer Ingelheim
  • Bristol-Myers Squibb Company
  • Genentech
  • Janssen Pharmaceuticals, Inc.
  • Merck Sharp & Dohme Corp.
  • Novartis
  • Pfizer Inc.


Lavery JA, Lepisto EM, Brown S, Rizvi H, McCarthy C, LeNoue-Newton M, Yu C, Lee J, Guo X, Yu T, Rudolph J, Sweeney S; AACR Project GENIE Consortium, Park BH, Warner JL, Bedard PL, Riely G, Schrag D, Panageas KS. A Scalable Quality Assurance Process for Curating Oncology Electronic Health Records: The Project GENIE Biopharma Collaborative Approach. JCO Clin Cancer Inform. 2022 Feb;6:e2100105. doi: 10.1200/CCI.21.00105. PMID: 35192403; PMCID: PMC8863125.

Brown S, Lavery JA, Shen R, Martin AS, Kehl KL, Sweeney SM, Lepisto EM, Rizvi H, McCarthy CG, Schultz N, Warner JL, Park BH, Bedard PL, Riely GJ, Schrag D, Panageas KS; AACR Project GENIE Consortium. Implications of Selection Bias Due to Delayed Study Entry in Clinical Genomic Studies. JAMA Oncol. 2022 Feb 1;8(2):287-291. doi: 10.1001/jamaoncol.2021.5153. PMID: 34734967; PMCID: PMC9190030.

Kehl KL, Riely GJ, Lepisto EM, Lavery JA, Warner JL, LeNoue-Newton ML, Sweeney SM, Rudolph JE, Brown S, Yu C, Bedard PL, Schrag D, Panageas KS; American Association of Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) Consortium. Correlation Between Surrogate End Points and Overall Survival in a Multi-institutional Clinicogenomic Cohort of Patients With Non-Small Cell Lung or Colorectal Cancer. JAMA Netw Open. 2021 Jul 1;4(7):e2117547. doi: 10.1001/jamanetworkopen.2021.17547. PMID: 34309669; PMCID: PMC8314138.

Lavery JA, Brown S, Curry MA, Martin AS, Sjoberg DD, Whiting K. A data processing pipeline for the AACR project GENIE biopharma collaborative data with the {genieBPC} R package. Bioinformatics. 2023 Jan;39(1):btac796.

Kehl KL, Uno H, Gusev A, Groha S, Brown S, Lavery JA, Schrag D, Panageas KS. Elucidating Analytic Bias Due to Informative Cohort Entry in Cancer Clinico-genomic Datasets. Cancer Epidemiol Biomarkers Prev. 2023.

Choudhury NJ, Lavery JA, Brown S, de Bruijn I, Jee J, Tran TN, Rizvi H, Arbour KC, Whiting K, Shen R, Hellmann M, Bedard PL, Yu C, Leighl N, LeNoue-Newton M, Micheel C, Warner JL, Ginsberg MS, Plodkowski A, Girshman J, Sawan P, Pillai S, Sweeney SM, Kehl KL, Panageas KS, Schultz N, Schrag D, Riely GJ. The GENIE BPC NSCLC cohort: a real-world repository integrating standardized clinical and genomic data for 1,846 patients with non-small cell lung cancer. Clin Cancer Res. 2023.