MicroRNA microarrays possess a number of unique data features that challenge the assumptions underlying existing normalization methods. These methods therefore need to be re-assessed using genuine benchmark datasets that realistically represent the data characteristics of microRNA arrays.
Methods: We developed a blocked randomization design for Agilent microRNA arrays and applied it to generate a benchmark dataset, free of confounding batch effects, comparing endometrial and ovarian tumors. The benchmark dataset was assessed for differential expression, and the results were treated as the gold standard. We used the same tumor samples to generate a test dataset in which batch effects were allowed. After normalization, the test dataset was assessed for differential expression and compared with the gold standard. In addition to this empirical evaluation, we simulated data from the test data to mimic a range of differential expression patterns, varying the amount and the level of asymmetry of differential expression, and used them to further assess the performance of normalization methods.
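The idea of a blocked randomization design can be illustrated with a minimal Python sketch. It is not the design used in this study: the function name `blocked_randomization`, the block size of 8 arrays per slide, the equal split of tumor types within each block, and the sample labels are all assumptions made for illustration.

```python
import random


def blocked_randomization(group_a, group_b, block_size=8, seed=0):
    """Assign samples to blocks (e.g. array slides) so that every block holds
    the same number of samples from each group, placed in random order.

    Hypothetical helper: block_size=8 and the 50/50 split are assumptions.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for a balanced split")
    half = block_size // 2
    if len(group_a) != len(group_b) or len(group_a) % half != 0:
        raise ValueError("groups must be equal in size and fill whole blocks")

    rng = random.Random(seed)
    a, b = list(group_a), list(group_b)
    rng.shuffle(a)  # randomize which samples land in which block
    rng.shuffle(b)

    blocks = []
    for start in range(0, len(a), half):
        block = [(s, "endometrial") for s in a[start:start + half]]
        block += [(s, "ovarian") for s in b[start:start + half]]
        rng.shuffle(block)  # randomize position within the block
        blocks.append(block)
    return blocks


# Illustrative sample identifiers, not taken from the study.
endometrial = [f"E{i}" for i in range(8)]
ovarian = [f"O{i}" for i in range(8)]
blocks = blocked_randomization(endometrial, ovarian, block_size=8, seed=1)
for slide, block in enumerate(blocks, start=1):
    print(slide, block)
```

Because each block contains both tumor types in equal numbers, a block-level (slide-level) effect cannot be confounded with tumor type in this sketch, which is the property that motivates blocking.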
Results: We observed moderate and asymmetric differential expression between endometrial and ovarian tumors in the benchmark dataset. Array effects were observed in the test data and resulted in a true positive rate of 53% and a false discovery rate of 90%. Normalization is useful in increasing the number of true positive markers identified, but the normalized data still contain a large number of false positive markers, with a false discovery rate as high as 55%. We observed similar results in our simulated datasets.
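For reference, the two performance measures reported here can be computed from a gold-standard marker list and a test-data marker list as in the sketch below. The function name `tpr_fdr` and the marker identifiers are hypothetical and chosen only for illustration; the numbers it produces are not results from this study.

```python
def tpr_fdr(gold_positive, test_positive):
    """Return (true positive rate, false discovery rate) for markers called
    in the test data, scored against the gold-standard markers."""
    gold = set(gold_positive)
    called = set(test_positive)
    true_pos = len(gold & called)   # called and in the gold standard
    false_pos = len(called - gold)  # called but not in the gold standard
    tpr = true_pos / len(gold) if gold else float("nan")
    fdr = false_pos / len(called) if called else 0.0
    return tpr, fdr


# Hypothetical marker names, for illustration only.
gold = {"miR-a", "miR-b", "miR-c", "miR-d"}
called = {"miR-a", "miR-b", "miR-x"}
tpr, fdr = tpr_fdr(gold, called)
print(f"TPR = {tpr:.2f}, FDR = {fdr:.2f}")
```

In this toy example, two of the four gold-standard markers are recovered and one of the three called markers is false, so a method can raise the true positive rate while still carrying a substantial false discovery rate.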
Conclusions: Our study demonstrated the utility of randomization and blocking in a large tumor microarray study and underlined their important benefits for the accurate detection of disease-relevant markers. Proper randomization and blocking should be adopted in microarray studies to the extent possible. Our paired array datasets provide an objective and realistic evaluation of normalization methods for miRNA arrays, and they show that current normalization methods are useful in increasing the number of true positive markers identified but still leave a large number of false positive markers. Research is warranted to develop more efficient methods for normalization when it is needed.