Source Code

Available upon Request
(contact: mdy89@snu.ac.kr)

Total workflow







SpliceHetro: a novel computational method to measure spliceome level intra-tumor heterogeneity (sITH) in breast cancer


Motivation: Measuring intra-tumor heterogeneity (ITH) is important for clinical applications. Cell diversity can be measured effectively in terms of genomic sequence variations, so computational methods for predicting ITH at the DNA level, also known as gITH, have been developed successfully over the years. Recent studies show that ITH at other levels is also important. Accordingly, new ITH concepts such as tITH at the RNA level and mITH at the methylation level have been introduced. Recently, spliceome ITH (sITH) has been reported to exist and to be important for clinical applications. However, there is no computational method to define and measure sITH.

Results: We propose a novel computational method, SpliceHetro, that defines and measures sITH. Since sITH should be defined at the gene level, we used 4,306 genes in the cancer hallmark gene set. First, to compensate for differences in gene expression levels, we used a sampling approach to pool the same number of reads for each gene. The reference isoform templates were computed by running Cufflinks on all available TCGA breast cancer sequencing data. The isoform quantity distribution of each tumor sample was computed using the CEM tool. Then, isoform expression quantities were summarized for each gene using an information theoretic method. Our approach was evaluated in terms of breast cancer subtype classification, prognosis analysis, and relations with other ITH metrics. For breast cancer subtype classification, sITH achieved classification performance similar to the original PAM50 classification and higher performance than tITH. In terms of prognosis prediction, sITH outperformed gITH and tITH.
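The read-pooling and per-gene summarization steps can be illustrated with a minimal Python sketch. It is not the SpliceHetro implementation: it assumes each read is already assigned to a single isoform (the method above quantifies isoforms with CEM instead), and it uses Shannon entropy as one possible information theoretic summary, since the measure is not named here. The function names, the input layout, and the default of 100 reads per gene are hypothetical.

```python
import math
import random
from collections import Counter


def subsample_reads(read_isoforms, n_reads, seed=0):
    """Draw a fixed number of reads (without replacement) from one gene.

    read_isoforms: list with one isoform label per read (hypothetical input).
    Returns None when the gene has fewer than n_reads reads.
    """
    if len(read_isoforms) < n_reads:
        return None
    rng = random.Random(seed)
    return rng.sample(read_isoforms, n_reads)


def isoform_entropy(read_isoforms):
    """Shannon entropy (in bits) of the isoform usage of one gene."""
    counts = Counter(read_isoforms)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def gene_level_scores(sample, n_reads=100, seed=0):
    """Map each gene to the entropy of its depth-equalized isoform usage.

    sample: dict of gene name -> list of per-read isoform labels.
    Genes with too few reads are skipped.
    """
    scores = {}
    for gene, reads in sample.items():
        picked = subsample_reads(reads, n_reads, seed)
        if picked is not None:
            scores[gene] = isoform_entropy(picked)
    return scores
```

Under these assumptions, a gene expressed through a single isoform scores 0 bits and a gene split evenly between two isoforms scores 1 bit, regardless of how highly the gene is expressed, because every gene contributes the same number of reads.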




Fig. 3. Survival analysis. Kaplan-Meier (KM) plots for the two subgroups identified by K-means clustering (K = 2) on each profile: (a) gITH, (b) tITH, and (c) sITH. The log-rank (LR) test results for sITH were significant, with an average p-value of 0.0112. sITH showed a higher level of significance than gITH (p = 0.1918) and tITH (p = 0.0121).
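As a rough illustration of how a p-value like those in the caption can be obtained, the following Python sketch implements a standard two-group log-rank test. It assumes that LR refers to the log-rank test and that the two subgroups have already been assigned (for example by K-means with K = 2). The function name and input layout are hypothetical, and this is not the authors' analysis code.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*: follow-up times; events_*: 1 if the event was observed,
    0 if the observation was censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)              # at risk, total
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, A
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # events at t
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Calling `logrank_test` with the survival times and event indicators of the two clusters from one profile returns the test statistic and p-value for that split. Averaging over repeated clustering runs, as the "average p-value" in the caption suggests, would be a separate step not shown here.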