Biomedical investigators increasingly face three related obstacles:
- the volume of complex and high-dimensional biological and clinical data is exploding,
- there is a common need to integrate the imaging phenotype with these data, but
- effective analysis tools to integrate these data are lacking.
Although there have been some attempts to design “multiomics” integration tools, they typically take an ad hoc approach. Our BTRC will develop innovative mathematical algorithms and software tools, driven by novel mathematical foundations, to push integrative biomedical science forward. This will result in effective solutions for characterizing and integrating imaging with biological data in a reproducible and rigorous manner. CQIBA will introduce a new generation of integrative analysis solutions, based on groundbreaking optimal mass transport and Wasserstein-distance (OMT-WD) theory on data networks, to increase the insight achievable with high-dimensional, hetero-modal biomedical datasets. CQIBA will also test and improve these tools through collaborations and will further distribute these advanced methods via online wiki resources, short courses, webinars, and GitHub.
The proposed mathematical tools, combined with the team's expertise in quantitative imaging, bioinformatics, and cancer analysis and with feedback from our collaborators, will yield advanced methods applicable to the complex datasets increasingly produced by NIH-funded investigators.
The fundamental innovation underpinning the proposed center is a novel insight that treats data as analogous to a fluid with special properties, and differences between data as rearrangements (flows) of this fluid on networks that encapsulate the relatedness of data elements. This is made rigorous using optimal mass transport (OMT) theory, together with the associated measure of distance on such networks, the Wasserstein-distance (WD). This is a fundamental step toward addressing the common problem of datasets that are wide (many elements for each subject) but not deep (a limited number of subjects).
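As a minimal illustration of this idea, and not the Center's actual software, the sketch below computes a Wasserstein-1 distance between two distributions placed on the nodes of a small weighted network. It assumes the network is connected, uses shortest-path length as the ground cost, and solves the transport problem with a generic linear-program solver. The function name `wasserstein_on_network` and the toy graph are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path


def wasserstein_on_network(adjacency, mu, nu):
    """Wasserstein-1 distance between two distributions on graph nodes.

    adjacency: dense (n x n) matrix of edge weights, 0 meaning "no edge".
    mu, nu: nonnegative node masses with equal totals.
    Assumes a connected graph (otherwise some costs are infinite).
    """
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    n = len(mu)

    # Ground cost: shortest-path length between every pair of nodes.
    cost = shortest_path(np.asarray(adjacency, dtype=float), directed=False)

    # Unknown: transport plan P (n x n), flattened row-major,
    # where P[i, j] is the mass moved from node i to node j.
    row_sums = np.kron(np.eye(n), np.ones((1, n)))  # sum_j P[i, j] = mu[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(n))  # sum_i P[i, j] = nu[j]

    result = linprog(
        c=cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([mu, nu]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun, result.x.reshape(n, n)


# Toy network: a path 0 - 1 - 2 - 3 with unit edge weights.
adjacency = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)
mu = [0.5, 0.5, 0.0, 0.0]  # mass concentrated on the left of the path
nu = [0.0, 0.0, 0.5, 0.5]  # mass concentrated on the right of the path

distance, plan = wasserstein_on_network(adjacency, mu, nu)
print(distance)
```

In this toy case each half unit of mass must travel two edges, so the printed distance should equal 2 up to solver tolerance. A dense linear program like this scales poorly with network size; it is meant only to make the "fluid on a network" picture concrete.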
CQIBA would be the first BTRC to focus on the development of novel tools and procedures to jointly analyze extracted imaging phenotype features together with corresponding biological array data, such as genomic, pathological, blood/urine, and outcome data. A unifying theme is the formulation of cutting-edge methodologies applicable to the complex networks that represent biological data and provide a framework to seamlessly integrate other data of any type. Our techniques have a sound mathematical basis that is built into all stages of analysis from data harmonization and segmentation to integrated mathematical prediction models. This is a unique perspective and will have a major impact across medical research and, ultimately, treatment.
There is a clear demand for these tools, as demonstrated by our very successful “cold call” recruitment of 22 initial candidate collaborators identified through an NIH RePORTER search. All 22 expressed a desire to work with us. We eliminated those whose funding had lapsed or whose work was purely image-analysis driven, and then split the group into collaborative and service projects based on the likelihood of feedback and the customization needed.
Overall, we have identified 7 collaborative projects (CPs) and 9 service projects (SPs), distributed across the U.S. and funded by NIAID, NIA, NIBIB, NHLBI, and NCI. CQIBA brings together diverse backgrounds in imaging physics, informatics, machine learning, and applied mathematics in a closely coordinated team with a strong track record of collaboration. We will develop and iteratively refine the proposed methods in collaboration with our partners. In addition, we will conduct extensive training workshops and freely distribute software tools, integrated into our platform, to achieve the dissemination goals of the Center. In summary, the Center has the potential to advance the basic understanding of many diseases, using new, broadly applicable tools for data science.
The principal investigators of the proposed Center have extensive backgrounds in biomedical analysis. They have pioneered mathematical formulations to solve challenging problems in image processing and in the integration of other data sources, such as genetic data.
Optimal Mass Transport theory is the core methodology used at this center. It has deep roots in pure mathematics, combining complex analysis, Riemannian geometry, and measure theory. Monge first posed the classical Optimal Mass Transport problem, which asks for the way to move a pile of soil from one place to another at minimal transportation cost. Kantorovich later recast the problem as a linear program and established the existence of an optimal transport plan. Monge-Kantorovich optimization has been used in numerous fields from physics and econometrics to computer science, including computer vision, medical imaging, and statistics.
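For reference, the Monge and Kantorovich problems are usually written as follows. The symbols are standard textbook notation rather than notation taken from this document: $\mu$ and $\nu$ are the source and target distributions, $c(x, y)$ is the cost of moving a unit of mass from $x$ to $y$, $T$ is a transport map, and $\Pi(\mu, \nu)$ is the set of joint distributions (transport plans) with marginals $\mu$ and $\nu$.

```latex
% Monge formulation: find a map T that pushes mu forward to nu at minimal cost.
\[
  \inf_{T \,:\, T_{\#}\mu = \nu} \int c\bigl(x, T(x)\bigr)\, d\mu(x)
\]

% Kantorovich relaxation: optimize over transport plans instead of maps.
% The objective and the marginal constraints are linear in \pi.
\[
  \inf_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\, d\pi(x, y)
\]

% Wasserstein-p distance, taking c(x, y) = d(x, y)^p for a ground metric d.
\[
  W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \, d\pi(x, y) \right)^{1/p}
\]
```

On a finite network the integrals become sums over pairs of nodes, and the Kantorovich problem reduces to a finite-dimensional linear program.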