I grew up in Waterloo, Ontario, a university town about an hour’s drive from Toronto. My parents are both academics, so from quite a young age I gravitated toward a career in research. My initial plan was to do an undergraduate degree in mathematics and then head into theoretical physics, but I quickly realized that I loved pure mathematics more. After receiving my undergraduate degree from the University of Waterloo in 1992, I went on to get my doctorate in mathematics at the University of California, Berkeley.
The subject I studied was differential geometry and the representation theory of Lie groups — a very old, deep, and beautiful area. The level of abstraction in mathematics is probably greater than in any other field, and studying mathematics is a wonderfully rigorous training that teaches you to solve problems posed in terms of this abstract machinery.
What all mathematicians have in common is an appreciation for the beauty of the subject. But I also began to find it quite isolating. The audience of a mathematics paper might only be a handful of people in the world.
What I really wanted was to do work that connected with the rest of science and had impact beyond a small collection of peers. Besides, if you try telling someone at a cocktail party that you are a mathematician, they will invariably respond by telling you the last math class they took — which is usually eleventh-grade calculus!
A Leap into Computational Biology
After finishing my dissertation in 1998, I came to Columbia University for a postdoctoral fellowship in the mathematics department, but already I was beginning to look for a new direction. It was lucky timing — the Human Genome Project was well under way and the field of computational biology was starting to gain attention.
At the same time, there was a lot of exciting progress in machine learning, a new area at the interface of computer science and statistics that studies algorithms for learning from data. I was introduced to both fields when I sat in on a class taught by William Stafford Noble, who was a young faculty member in Columbia’s computer science department at the time (he is now in Genome Sciences at the University of Washington).
In 2000, in something of a leap of faith, I left my postdoc early and took a lecturer position in the Department of Computer Science at Columbia. Bill Noble generously invited me to his lab meetings, and we started a collaboration that also initiated my computational biology research career.
Academia operates by certain implicit rules, one of which is that you are not supposed to take a non-tenure-track faculty position as I had just done. Perhaps when you make such a radical change in your scientific direction, you stop worrying about breaking the rules! And actually, I had found a strange pocket of freedom — I now had time off the clock to establish myself in a new area, and I was free to pursue my own research program, start collaborations with wet lab biologists, apply for grants, and even admit PhD students.
Trying to build your own research group without the usual trappings of a tenure-track job, like a start-up package, certainly teaches you to be quite resourceful! After three years, I took a research scientist position at a newly established machine learning research center at Columbia called the Center for Computational Learning Systems (CCLS), which allowed me to continue to lead my lab without teaching duties. At the same time, I was very involved in Columbia’s new Center for Computational Biology and Bioinformatics (C2B2).
Broadly speaking, my research involves developing machine learning methods — algorithms that “train” on noisy and high-dimensional genome-scale data — to study diverse problems in computational and systems biology. The first project I worked on had to do with remote protein homology detection, namely, can you predict the structural class of a protein from its primary sequence of amino acids, even if it is not close in sequence to any protein for which you already know the structure?
To tackle this question, I helped develop support vector machine (SVM) methods for sequence data, which have gone on to be used in many applications both inside and outside computational biology. I also collaborated with Larry Chasin, a splicing expert in the Department of Biological Sciences at Columbia, to apply these SVM methods to find sequence determinants of splicing in humans.
A bit later, I became very interested in modeling transcriptional gene regulation and, together with several computational collaborators, developed a method called MEDUSA for learning regulatory programs from gene expression data and control regions in the noncoding DNA. I started a great collaboration with Li Zhang, an expert on hypoxia who is now at the University of Texas at Dallas, to model the yeast oxygen and heme regulatory network and to confirm our computational predictions with biochemical experiments.
Coming from pure mathematics into this very interdisciplinary field has been a revelation. Even though I do not use most of the technical machinery I acquired in my doctoral studies, I feel that what I do now is much more creative. The real challenge is to formulate a mathematical problem that is tractable from a machine learning and computational point of view but that still sheds light on the underlying biological question.
Best of Both Worlds
I realized that I needed the institutional support of a tenure-track position to get to the next stage in my career, and I started interviewing about a year ago. Initially, I wanted a regular faculty position in a computer science department, and almost all my interviews were in computer science. But by the end of the process, I realized I was much more interested in being at a place where I could immerse myself in biomedical research and be close to emerging biological questions.
I knew of the Computational Biology Program at Sloan Kettering Institute from having worked with Chris Sander, who heads the program, on a collaborative grant on microRNA regulation, and he invited me to apply. It is a special place — at most institutions, computational biologists are dispersed across departments and schools, with only one or two people in any given department, surrounded by colleagues who do not always understand what they do.
Here there is a complete department dedicated solely to computational biology, so there is a much stronger intellectual kinship. But there are also numerous opportunities for collaborations with biological and clinical researchers. It is the best of both worlds for me.
Since I arrived in July 2007, I have been immersed in all the wonderful resources of the tri-institutional campus that Memorial Sloan Kettering Cancer Center shares with Weill Medical College of Cornell University and The Rockefeller University. Across the three institutions, there is so much activity and an amazing density of information in areas that I find particularly compelling, including small regulatory RNAs, stem cells, and, of course, cancer research.
In my laboratory, we have embarked on a number of new research directions. For example, we are investigating ways to model microRNA-mediated regulation and improve target prediction by integrating different data types, including microRNA expression data. And we see a tremendous opportunity for applying machine learning to the immense amount of data generated by cancer genomics initiatives. I am excited to be here and look forward to new scientific collaborations at Memorial Sloan Kettering Cancer Center.