Colorectal Cancer: Mithat Gönen
Colorectal cancer affects over a million people in the US, half of whom develop metastases to the liver and require surgery, which is often accompanied by severe side effects. A team led by biostatistician Mithat Gönen will integrate genomic, radiomic, and clinical data to distinguish the people most likely to benefit from surgery from those expected to experience post-operative liver failure or early recurrences of metastases. The latter groups can then be offered alternative treatment.
Lung Cancer: Matthew Hellman
Some people with non-small cell lung cancer (NSCLC) have greatly benefited from the introduction of immunotherapy, but most do not respond. Medical oncologists Kathryn Arbour and Matthew Hellman and their team will build on ongoing efforts to annotate the clinical, pathologic, and molecular features of NSCLC to design a model that can predict the response to immunotherapy in a pre-treatment setting. They will also develop a large patient database as a foundation for assessing patients with advanced disease who are likely to benefit from a combination of chemotherapy and immunotherapy.
Breast Cancer: Pedram Razavi
Breast tumors vary widely in their molecular characteristics. Current clinical models do not fully take this heterogeneity into account and therefore cannot optimally assign patients to the most suitable treatment groups. Principal investigator Pedram Razavi will develop a machine learning model that integrates whole-tumor imaging data with pathologic and genomic information, as well as clinical variables, to better predict treatment response and recurrence-free and disease-free survival.
Gynecologic Cancer: Yulia Lakhman
High-grade serous ovarian cancer is the most common and also the most lethal gynecologic malignancy. Radiologist Yulia Lakhman and colleagues will use machine learning techniques to outline and annotate tumors in medical images and integrate these annotations with the tumors’ molecular profiles. Their goal is to define multi-modal predictors of tumor progression and to stratify patients into the appropriate treatment groups.
Powered by the radiology archive: Harnessing historical image annotations for automated tumor segmentation: Nathaniel Swinburne and Robert Young
The scalable harmonization of multimodal cancer data requires accurate, automated segmentation of tumors on radiologic images. At MSK, the major impediment to achieving automated tumor segmentation using deep learning is the lack of large annotated tumor image datasets representative of our unique patient population. Our hypothesis is that massive existing image annotation repositories can be harnessed in a hybrid object detection and segmentation framework to enable fully automated tumor segmentation. While our model will be developed for brain tumors, the proposed generalizable data pipeline (extendable to any anatomy, solid tumor type, or radiologic imaging modality) would allow MSK’s existing massive archive of image annotations to be used for training deep learning models to perform tumor computer vision tasks, including but not limited to segmentation, and will be foundational for a multi-modal, multi-omic research platform.
Automated retrieval of clinical data elements for the identification of genomic predictors of outcome and treatment response in cancer: Nikolaus Schultz, John Philip, and Steven Maron
Retrieval of clinical annotation of tumor samples and patients presents a major challenge for data integration, as the current approach of manual abstraction from largely unstructured electronic medical records (EMR) is difficult to scale. However, clinical text classification using natural language processing (NLP) and advanced machine learning methods has the potential to unlock information embedded in clinical narratives. Here, a multidisciplinary team of data scientists and cancer biologists led by Nikolaus Schultz is creating a hybrid NLP system that draws on structured and unstructured EMR data to identify patient- and sample-specific attributes. They hypothesize that this system will enable robust, large-scale integration of clinical data with genomic databases, which can be used to predict outcome and treatment response in individual cancer patients.