Natural Language Processing System for Structured and Unstructured Patient Records



Modern leukemia diagnosis and treatment are highly data-driven. Each disease assessment involves staining slides with multiple antibodies; performing multidimensional flow cytometry; conducting cytogenetic assessment, including karyotype, FISH, and/or SNP arrays; running next-generation sequencing tests for tens to hundreds of gene mutations and/or rearrangements; and running targeted molecular assays. Data from these studies are interpreted by hematopathologists, and summary reports are deposited in the electronic medical record (EMR) alongside physician notes, other lab results, and treatment data.

This invention is a natural language processing (NLP)-based system that extracts relevant data from these reports, processes the findings to provide automated leukemia risk stratification and treatment regimen information, and provides tools to rapidly perform retrospective clinical studies and share these results.
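To make the extract-then-stratify idea concrete, the following is a minimal sketch only, not the invention's actual method. It uses simple regular expressions in place of a full NLP pipeline, and every name in it (`Findings`, `extract_findings`, `stratify`, the sample report text, and the gene list) is a hypothetical assumption introduced for illustration. The risk rule is a toy placeholder and is not clinical guidance.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical patterns; real pathology reports vary widely in wording and layout.
BLAST_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*blasts", re.IGNORECASE)
TRANSLOCATION_RE = re.compile(r"t\(\d+;\d+\)")
MUTATION_RE = re.compile(
    r"\b(NPM1|FLT3-ITD|TP53|IDH1|DNMT3A)\b[^.;]*?"
    r"\b(not detected|detected|positive|negative)\b",
    re.IGNORECASE,
)


@dataclass
class Findings:
    """Structured values pulled out of one free-text report."""
    blast_percent: Optional[float] = None
    translocations: List[str] = field(default_factory=list)
    detected: Set[str] = field(default_factory=set)


def extract_findings(report: str) -> Findings:
    """Extract blast percentage, translocations, and detected mutations."""
    findings = Findings()
    blast = BLAST_RE.search(report)
    if blast:
        findings.blast_percent = float(blast.group(1))
    findings.translocations = TRANSLOCATION_RE.findall(report)
    for match in MUTATION_RE.finditer(report):
        gene, status = match.group(1).upper(), match.group(2).lower()
        if status in ("detected", "positive"):
            findings.detected.add(gene)
    return findings


def stratify(findings: Findings) -> str:
    """Toy placeholder rule for illustration only; NOT a clinical guideline."""
    if "TP53" in findings.detected:
        return "adverse"
    if "NPM1" in findings.detected and "FLT3-ITD" not in findings.detected:
        return "favorable"
    return "intermediate"


# Invented sample text, not taken from any real patient record.
SAMPLE_REPORT = (
    "Bone marrow aspirate: 42% blasts. Karyotype: 46,XX,t(8;21)(q22;q22). "
    "NGS panel: NPM1 mutation detected; FLT3-ITD not detected; TP53 negative."
)

if __name__ == "__main__":
    result = extract_findings(SAMPLE_REPORT)
    print(result, stratify(result))
```

A production system would need to handle negation, report templates, and conflicting results far more robustly than these patterns do; the sketch only shows how unstructured report text can be turned into structured fields that a downstream rule can consume.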


  • Rapid clinical studies: reduces retrospective clinical data processing and reporting lead times from months to minutes
  • Improved risk stratification: quickly and easily identifies high-risk patients to improve care quality and drive increased revenue
  • Frustration-free workflows: automates data extraction, processing, and sharing while avoiding time-consuming manual coordination
  • Demonstrated results: this technology has been applied in multiple studies analyzing large volumes of retrospective patient data to derive clinical insights, including disease prognosis, patient outcomes, and effectiveness of therapeutics


This invention addresses a large unmet need: the U.S. healthcare industry spends $8.7 billion per year employing people primarily to read clinical notes and abstract data for various clinical workflows. Key market verticals include life sciences and biopharma (research data to support drug development and commercialization), healthcare providers (quality and efficiency of care), and payers (risk identification and stratification). Market trends contributing to the growth of these use cases include value-based care and increasing interoperability and data portability in health IT systems.


PCT application PCT/US2021/056687 pending, filed on Oct. 26, 2021.


  • Stahl et al. (2021) Clinical and molecular predictors of response and survival following venetoclax therapy in relapsed/refractory AML. Blood Advances
  • Ahr et al. (2018) AML with Mutations in IDH1 and DNMT3A Exhibits a Distinct Epigenetic Signature with Poorer Overall Survival. Blood
  • Kresch et al. (2019) Acute Leukemia with Lineage Infidelity: Mixed Phenotype AML Exhibits a Distinct Immunophenotype with Clinical Features Overlapping Mixed Phenotype Acute Leukemia. Blood
  • Onyekwere et al. (2020) Immunophenotypic Lineage Assessment By Multiparameter Flow Cytometry Provides More Precise MDS Prognosis. 62nd ASH Annual Meeting and Exposition


Jacob Glass, MD, PhD, Medical Oncologist, Memorial Hospital, and Member of the Ross Levine Lab, Memorial Hospital Research, Human Oncology & Pathogenesis Program, MSK


Rick Peng, Business Development & Licensing Manager, Office of Technology Development, [email protected]

MSK Internal Code: SK2020-071

Stage of Development

Ready to use