While genes contain the hereditary information, including genetic predisposition to cancer and other diseases, it is their products that confer the actual phenotypes of living organisms and, in the case of disease, distinguish normal from pathological states. Since many post-translational events can modify the structure, function, and degradation of proteins, knowledge of genes alone does not even begin to describe the full complexity of biological systems. From a screening perspective, it is also mostly proteins that are secreted or otherwise released from tissues into the bloodstream. Yet, despite an intensive search during the past decade(s), only a small number of identified cancer biomarkers, all plasma proteins (e.g., PSA, CEA, CA125, …), have proven clinically useful, often in combination with other diagnostic tools, for predicting response to therapy, relapse and survival, for defining the rate of progression, and for monitoring treatment, but less so for broad-based population screening. Those proteins are typically present in plasma or serum at sub-nanomolar concentrations and require individual immunoassays for detection and quantitation.
New and improved cancer biomarkers and facile detection methods are clearly needed but have so far eluded discovery and implementation. Even the most recent approaches of identity-based proteomics, which involve digesting complex protein mixtures into peptides (e.g., with trypsin) for mass spectrometric (MS) analysis, have yet to translate into any practical applications. This failure derives largely from insufficient instrumental dynamic range and from the overly elaborate fractionation procedures, coupled to multiple MS runs, that are needed to detect low-abundance tryptic peptides and that preclude processing statistically relevant sample numbers.
As cancer involves the transformation and proliferation of altered cell types that produce high levels of specific proteins and enzymes such as proteases (for instance, PSA and PSMA), it will modify not only the array of existing serum proteins (the serum ’proteome’) but also their metabolic products, i.e., peptides (the serum ’peptidome’). Yet it has remained unclear until recently whether this complex peptidome may provide a robust correlate of some biological events occurring in the entire organism. Since this represented an attractive yet untested possibility, we chose to investigate it further.
2. MS-based peptide profiling platform
The proteomics group at Sloan Kettering has developed an automated procedure for the simultaneous measurement of peptides in serum that utilizes magnetic, reversed-phase beads for analyte capture and a MALDI-TOF MS read-out. This system is more sensitive than surface capture on chips, as spherical particles have larger combined surface areas, and therefore higher binding capacity, than small-diameter spots. With the system coupled to high-resolution MS and MS/MS, hundreds of peptides have been detected in a single droplet of serum, many of which can be readily identified without further fractionation. Automation facilitates throughput and ensures reproducibility, and we have also developed a minimal entropy-based algorithm that simplifies and improves alignment of spectra and subsequent statistical analysis (Villanueva et al., 2004; 2005).
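As a rough illustration of how an entropy criterion can guide spectral alignment, the following Python sketch shifts one binned spectrum against a reference and keeps the shift that minimizes the Shannon entropy of the summed spectrum. This is not the published algorithm (Villanueva et al., 2004; 2005); the binning, the integer-shift search, and the function names `shannon_entropy`, `shift_bins`, and `best_shift` are all assumptions made for illustration.

```python
import math


def shannon_entropy(intensities):
    """Shannon entropy (natural log) of a spectrum normalized to unit sum."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for x in intensities:
        if x > 0:
            p = x / total
            entropy -= p * math.log(p)
    return entropy


def shift_bins(spectrum, k):
    """Shift a binned spectrum by k bins (positive k moves peaks right).

    Bins pushed past either edge are dropped and vacated bins are zero-filled,
    so peaks should sit away from the edges (an assumption of this sketch).
    """
    n = len(spectrum)
    out = [0.0] * n
    for i, x in enumerate(spectrum):
        j = i + k
        if 0 <= j < n:
            out[j] = x
    return out


def best_shift(reference, spectrum, max_shift=3):
    """Return the integer shift of `spectrum` that minimizes the entropy of
    reference + shifted spectrum. Ties go to the smallest absolute shift."""
    best_k, best_h = 0, float("inf")
    for k in sorted(range(-max_shift, max_shift + 1), key=abs):
        combined = [r + s for r, s in zip(reference, shift_bins(spectrum, k))]
        h = shannon_entropy(combined)
        if h < best_h:
            best_k, best_h = k, h
    return best_k


# Hypothetical toy spectra: the second is the first displaced by one bin.
reference = [0, 0, 5, 0, 0, 0, 3, 0, 0, 0, 0, 0]
observed = [0, 0, 0, 5, 0, 0, 0, 3, 0, 0, 0, 0]
print(best_shift(reference, observed))  # prints -1
```

The intuition is that coinciding peaks concentrate the summed intensity into fewer bins, which lowers the entropy; misaligned peaks spread it out.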
3. Cancer patient sera analysis
Using this highly optimized peptide extraction and MALDI-TOF MS-based approach, we then showed that a limited subset of serum peptides (together forming a qualitative and quantitative pattern; i.e., a ’signature’) provided accurate class discrimination between patients with three types of solid tumors and controls without cancer. To obtain this result, we first sorted through hundreds of features to identify several that were most predictive of outcome and showed that reduction in the number of key peptides to a few (i.e., the ’signatures’) that were easily recognized between samples did not adversely affect class predictions. We then demonstrated that this signature could be used to discriminate between cancer and control in an independent validation set composed of serum samples obtained from patients with advanced prostate cancer (Villanueva et al., 2006).
Interestingly, MS-based sequence characterization had previously indicated that a large part of the human serum ’peptidome’, as detected by MALDI-TOF MS, is produced ex vivo (i.e., after the blood sample collection) by degradation of abundant substrates by endogenous proteases. Polypeptide fragments are generated during the proteolytic cascades that take place in the intrinsic pathway of coagulation and complement activation. Some of these are known bioactive molecules, others represent cleaved propeptides, and still others are seemingly ’random’ internal fragments of the precursor proteins. Once generated, the ’founder peptides’ are pared down by exoproteases into ladder-like clusters. This may have some potentially exploitable outcomes in that, when given enough time and substrate, otherwise ’invisible’ low-concentration enzymes can generate catalytic products that are measurable by MALDI-TOF MS.
Sure enough, by carefully correlating the identified proteolytic patterns with several disease groups and controls, we then went on to show that exoprotease activities, superimposed on the ex vivo coagulation and complement-degradation pathways, contribute to generation of not only cancer-specific but also ’cancer type’-specific serum peptides. None of the signature peptides were in fact derived from cancer cells, which implies that different tumor types secrete and/or shed distinct sets of proteases that, through their catalytic activity, generate unique serum peptide profiles.
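The ladder-like clusters described above can be pictured with a small Python sketch that trims a founder peptide one residue at a time from a single terminus, as an aminopeptidase or carboxypeptidase would. The function name `exoprotease_ladder`, its parameters, and the sequence `PEPTIDEK` are hypothetical placeholders chosen for illustration, not peptides or code from the study.

```python
def exoprotease_ladder(founder, n_terminal=True, min_length=1):
    """Return the founder peptide followed by its successive truncations.

    n_terminal=True removes residues from the N-terminus (aminopeptidase-like);
    n_terminal=False removes them from the C-terminus (carboxypeptidase-like).
    Trimming stops once the peptide reaches min_length residues.
    """
    ladder = [founder]
    current = founder
    while len(current) > min_length:
        current = current[1:] if n_terminal else current[:-1]
        ladder.append(current)
    return ladder


# Placeholder sequence, not a real signature peptide.
for peptide in exoprotease_ladder("PEPTIDEK", n_terminal=True, min_length=5):
    print(peptide)
# prints PEPTIDEK, EPTIDEK, PTIDEK, TIDEK on separate lines
```

Each rung of such a ladder differs from its neighbor by one residue mass, which is what makes these clusters recognizable in a MALDI-TOF spectrum.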
The small number of blood proteins that are the source of nearly all the peptides in the prostate, bladder and breast cancer signatures are therefore not biomarkers in the strict sense but simply serve as an endogenous substrate pool for the real ones, i.e., tumor-derived proteases. There is also no actual relationship between the precursor substrate concentrations and the MS-ion intensities of many of the degradation products. Highly abundant serum proteins such as albumin and immunoglobulins were not represented.
Taken together, our studies provide a direct link between peptide marker profiles of disease and differential protease activities, and the patterns we describe may have clinical utility as surrogate markers for detection and classification of cancer.
Our findings suggest that future work to optimize serum peptidomics for clinical practice should be carried out with the recognition that endogenous proteolytic activities contribute important cancer type-specific information.
Focused mass spectrometric analysis of key peptides, derived from either endogenous or custom-synthetic substrates, and utilizing isotopically labeled standards to absolutely quantify all pattern-contributing peptides, should then facilitate future introduction of this technology into clinical practice.
Alternatively, identification and characterization of the protease panels could lead to direct, immunoassay-based quantitative diagnostic tests that may be better suited to a clinical environment.
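To illustrate the quantification with isotopically labeled standards mentioned above, the following Python sketch computes an absolute peptide concentration from the ratio of the endogenous ion intensity to that of a spiked-in labeled standard. It assumes that the labeled and unlabeled peptides ionize equally and that the detector response is linear; the function name `absolute_concentration` and the numbers are hypothetical.

```python
def absolute_concentration(analyte_intensity, standard_intensity,
                           standard_concentration):
    """Estimate analyte concentration by isotope dilution.

    Assumes equal ionization efficiency for the labeled standard and the
    endogenous peptide, and a linear intensity response (sketch assumptions).
    The result is in the same units as standard_concentration.
    """
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    return analyte_intensity / standard_intensity * standard_concentration


# Hypothetical example: standard spiked at 10.0 (arbitrary units, e.g. nM).
print(absolute_concentration(1500.0, 3000.0, 10.0))  # prints 5.0
```

In practice one such ratio would be computed for every pattern-contributing peptide, each against its own labeled counterpart.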