Researchers Build Pan-Cancer Biomarker Tool to Categorize Multiple Cancer Types



Researchers at the Sidney Kimmel Cancer Center at Jefferson recently showed that isomiRs (small non-coding RNA molecules produced in a cell) can accurately classify samples belonging to any of 32 cancer types. The research was published in the journal Nucleic Acids Research.

Work in recent years has revealed that microRNAs (miRNAs), small non-coding RNA molecules of about 22 nucleotides, exist as multiple variants in a cell. These variants are called isomiRs and are produced by cells in a regimented manner. In 2014, the lab of Isidore Rigoutsos, PhD, was the first to show that in healthy individuals, the abundance of isomiRs is modulated by a person’s sex, population origin, and race, indicating that isomiR expression is a type of transcriptomic heterogeneity that contributes to the diversity among humans. In 2015, follow-up work showed that isomiR abundance is also modulated by tissue state and disease subtype.

“Our early analyses were hinting that, in addition to all the other dependencies, isomiR production had a tissue-specific aspect to it as well,” said Rigoutsos, Director of the Center for Computational Medicine at Thomas Jefferson University. “This made us think that we might be able to leverage that tissue dependence to distinguish simultaneously among multiple cancers.”

Widely present in animals, plants, and some viruses, miRNAs are short regulatory RNAs. Since their original discovery in 1993, miRNAs have been shown to be important and very potent regulators of gene expression in numerous cellular contexts. Many conditions and diseases have been associated with disruptions of miRNA abundance. For many years, it was believed that each miRNA locus of the genome produced a single miRNA molecule. But as more powerful deep sequencing technologies became commonplace, many research groups started noticing that for each miRNA, multiple variants co-exist in a cell. These variants (isomiRs) generally differ in abundance from one another. At the sequence level, isomiRs of the same miRNA differ very slightly at their 5′ end, their 3′ end, or both. In the early days following the discovery of isomiRs, most researchers continued to focus on the isoform originally reported in the literature for each miRNA and dismissed all its other variants.
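To make the 5′/3′ distinction concrete, here is a minimal Python sketch. It assumes a simplified model in which each isoform is described only by hypothetical start and end coordinates on the miRNA precursor, and it ignores non-templated nucleotide additions and internal sequence differences. The `Isoform` class and `end_variation` function are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    """A mature miRNA isoform, described (hypothetically) by the 1-based,
    inclusive positions of its 5′ and 3′ ends on the precursor."""
    start: int  # position of the 5′ end
    end: int    # position of the 3′ end


def end_variation(reference: Isoform, isomir: Isoform) -> str:
    """Report which end(s) of `isomir` differ from the reference isoform.

    Only end shifts are considered; non-templated nucleotide additions
    and internal substitutions are ignored in this sketch.
    """
    five_prime_differs = isomir.start != reference.start
    three_prime_differs = isomir.end != reference.end
    if five_prime_differs and three_prime_differs:
        return "5' and 3'"
    if five_prime_differs:
        return "5'"
    if three_prime_differs:
        return "3'"
    return "reference"


reference = Isoform(start=1, end=22)             # a ~22-nt reference isoform
print(end_variation(reference, Isoform(1, 23)))  # 3'
print(end_variation(reference, Isoform(2, 22)))  # 5'
print(end_variation(reference, Isoform(2, 23)))  # 5' and 3'
```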

“It helps to think of each miRNA locus on the genome as producing a ‘cloud’ of co-existing isomiRs. Some miRNA loci tend to always produce small clouds (i.e. a small number of different isomiRs). Other miRNA loci tend to produce large clouds that comprise high numbers of co-present isomiRs. When we first compared two different tissues, we noticed that clouds associated with the same miRNA locus differed both in terms of how many and also which isomiRs they contained. We thought that this was a valuable observation that might provide an insight into human disease,” said Rigoutsos.
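The “cloud” picture can be expressed as a comparison of sets. In the sketch below, which illustrates the idea rather than the lab’s actual method, each cloud is assumed to be a set of isomiRs identified by hypothetical (start, end) coordinates. The counts capture “how many” and “which” isomiRs differ between tissues; the Jaccard index is added only as an illustrative overlap summary, since the article does not name a similarity measure.

```python
from __future__ import annotations


def compare_clouds(cloud_a: set[tuple[int, int]],
                   cloud_b: set[tuple[int, int]]) -> dict[str, float]:
    """Compare two isomiR clouds produced by the same miRNA locus.

    Reports how many isomiRs each cloud contains and which ones are shared
    or unique, plus the Jaccard index as an illustrative overlap summary.
    """
    shared = cloud_a & cloud_b
    union = cloud_a | cloud_b
    return {
        "size_a": len(cloud_a),            # how many isomiRs in tissue A
        "size_b": len(cloud_b),            # how many isomiRs in tissue B
        "shared": len(shared),             # isomiRs present in both clouds
        "only_a": len(cloud_a - cloud_b),  # isomiRs unique to tissue A
        "only_b": len(cloud_b - cloud_a),  # isomiRs unique to tissue B
        # Jaccard index: 1.0 means identical clouds, 0.0 means disjoint.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up clouds for one locus in two tissues (not real data).
tissue_a = {(1, 22), (1, 23), (2, 22)}
tissue_b = {(1, 22), (1, 21)}
print(compare_clouds(tissue_a, tissue_b))
# {'size_a': 3, 'size_b': 2, 'shared': 1, 'only_a': 2, 'only_b': 1, 'jaccard': 0.25}
```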

In their new study, the team examined the isomiR profiles of more than 10,000 tumor samples spanning 32 cancer types from The Cancer Genome Atlas (TCGA) repository. The researchers used an iterative approach, carrying out multiple rounds of training and testing a multi-label classifier. In each round, they randomly chose 60% of the samples available for each cancer type to train the classifier, then tested it on the remaining 40% of the samples from each cancer type. This training and testing process was repeated 1,000 times. The team showed that they could classify the unseen test samples with an average sensitivity of 90% and a false discovery rate of 3% or less. They also used their classifier to categorize other, non-TCGA datasets generated with deep sequencing or microarrays, and were able to label them with comparable accuracy. Notably, the researchers’ approach explicitly ignores the actual abundance levels of all isomiRs. Instead, an isomiR is simply called “present” if its abundance places it in the top 20% of the isomiR population; otherwise, it is called “absent.” This seemingly unusual choice has the benefit of making the approach potentially applicable to serum samples, where it has proven difficult to identify molecules that can be used to “normalize” abundance.
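The present/absent encoding and the evaluation protocol described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the team’s code: it assumes the top-20% rule is applied within each sample by ranking isomiRs on abundance, that each test sample receives exactly one predicted cancer type, and that sensitivity and false discovery rate are computed per cancer type and averaged over rounds. Because the classifier is not specified here, it is left as a caller-supplied, hypothetical `fit_predict` hook; all function names are illustrative.

```python
from __future__ import annotations

import math
import random
from collections import defaultdict
from typing import Callable


def binarize(abundances: dict[str, float], top_fraction: float = 0.2) -> dict[str, int]:
    """Call an isomiR present (1) if it ranks in the top `top_fraction` of its
    sample by abundance, absent (0) otherwise. Ties keep input order."""
    ranked = sorted(abundances, key=abundances.__getitem__, reverse=True)
    n_present = math.ceil(top_fraction * len(ranked))
    present = set(ranked[:n_present])
    return {name: int(name in present) for name in abundances}


def stratified_split(labels: list[str], train_fraction: float,
                     rng: random.Random) -> tuple[list[int], list[int]]:
    """Randomly assign `train_fraction` of each cancer type's samples to
    training and the rest to testing; returns sample indices."""
    by_type: dict[str, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_type[label].append(index)
    train: list[int] = []
    test: list[int] = []
    for indices in by_type.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


def per_type_metrics(truth: list[str], predicted: list[str]) -> dict[str, tuple[float, float]]:
    """Per cancer type: sensitivity TP/(TP+FN) and FDR FP/(TP+FP)."""
    pairs = list(zip(truth, predicted))
    metrics = {}
    for cancer_type in set(truth):
        tp = sum(t == cancer_type and p == cancer_type for t, p in pairs)
        fn = sum(t == cancer_type and p != cancer_type for t, p in pairs)
        fp = sum(t != cancer_type and p == cancer_type for t, p in pairs)
        sensitivity = tp / (tp + fn)  # TP + FN > 0: the type occurs in truth
        fdr = fp / (tp + fp) if tp + fp else 0.0
        metrics[cancer_type] = (sensitivity, fdr)
    return metrics


def repeated_evaluation(labels: list[str],
                        fit_predict: Callable[[list[int], list[int]], list[str]],
                        rounds: int = 1000, train_fraction: float = 0.6,
                        seed: int = 0) -> dict[str, tuple[float, float]]:
    """Average per-type sensitivity and FDR over repeated stratified splits.

    `fit_predict(train_idx, test_idx)` is a hypothetical hook that trains a
    classifier on the training samples and returns one label per test sample.
    """
    rng = random.Random(seed)
    totals: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for _ in range(rounds):
        train_idx, test_idx = stratified_split(labels, train_fraction, rng)
        predicted = fit_predict(train_idx, test_idx)
        truth = [labels[i] for i in test_idx]
        for cancer_type, (sens, fdr) in per_type_metrics(truth, predicted).items():
            totals[cancer_type][0] += sens
            totals[cancer_type][1] += fdr
            totals[cancer_type][2] += 1
    return {t: (s / n, f / n) for t, (s, f, n) in totals.items()}


print(binarize({"iso1": 900, "iso2": 40, "iso3": 35, "iso4": 5, "iso5": 1}))
# {'iso1': 1, 'iso2': 0, 'iso3': 0, 'iso4': 0, 'iso5': 0}

# Toy labels and a perfect "oracle" stand in for real data and a real model.
labels = ["BRCA"] * 10 + ["LUAD"] * 5
oracle = lambda train_idx, test_idx: [labels[i] for i in test_idx]
print(repeated_evaluation(labels, oracle, rounds=10))
```

Binarizing within each sample makes the encoding depend only on the rank order of isomiRs, so no between-sample normalization step is needed, which matches the motivation the article gives for ignoring abundance levels.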

A somewhat counterintuitive result that emerged from this analysis is that the isomiRs best able to discriminate among different cancers are not the ones that have been studied most in the literature. The analysis also showed that isomiRs produced by many miRNA loci with proven biological importance in multiple cancers are surprisingly poor cancer biomarkers. Taken together, these results indicate that the pan-cancer biomarker tool offers a new, unbiased approach to classifying cancer samples, one that does not depend on earlier research or on preconceived notions of how a given cancer arises and progresses.

These findings have several important implications. For researchers, they highlight an unexpected level of complexity in post-transcriptional regulation: across the 32 analyzed cancer types, many more regulatory miRNA molecules are at work than previously believed. The vast majority of these molecules have not been studied to date, which makes the pan-cancer biomarker tool valuable for guiding further research. Because the functional roles of these novel miRNA isoforms are unknown, they will need to be included in future cancer investigations. For cancer patients, these findings potentially represent invaluable new knowledge that has until now eluded researchers. In all likelihood, these molecules control key aspects of cancer biology and have a strong cancer-type dependency. The demonstration that different isoforms of the same miRNA should be examined in each cancer type opens the door to novel, more powerful, and less invasive methods for diagnosis, monitoring, and ultimately treatment of various malignancies.