ICIC2019: Identifying Cancer Biomarkers from High-throughput RNA Sequencing Data by Machine Learning

  • Last updated: May 10, 2019


Welcome to our website! This is the supplementary materials webpage for our paper "Identifying Cancer Biomarkers from High-throughput RNA Sequencing Data by Machine Learning". The source code and data used in this paper are available here. Note to users: you may use and redistribute the data and code under the terms of the GNU General Public License (GPL).

For any questions, please contact me at zpliu AT sibs(dot)ac(dot)cn.

Background: Cancer is a major threat to human life and health. Every year, tens of millions of people are newly diagnosed with cancer and millions die from it. Early diagnosis and treatment are urgently needed to reduce cancer mortality, and effective molecular biomarkers are one of the most efficient ways of achieving early diagnosis. It is therefore important to discover biomarkers for different cancers to enable cancer screening and early detection. In this paper, we explore cancer biomarkers in TCGA transcriptomic RNA-seq data using machine learning methods.

Results: In this paper, we identified biomarker genes for 12 cancer types from RNA-seq data by feature selection and machine learning. Starting from the differentially expressed genes, we combined feature selection with random forest classification to identify the biomarkers. We then evaluated the classification ability of the identified biomarkers with six machine learning algorithms, among which ELM performed best. The high cross-validation accuracy provides further evidence that these biomarkers can distinguish samples in normal and disease states. We also performed functional enrichment analyses on the selected biomarkers, which indicate the pathogenic implications underlying them. In conclusion, we provide a computational method for identifying cancer biomarkers from TCGA RNA-seq data.
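
The Results paragraph describes a filter-then-classify pipeline: rank genes that separate tumor from normal samples, keep a small set, and check how well that set classifies samples under cross-validation. The sketch below illustrates only the general shape of such a pipeline using the Python standard library. It is not the authors' code (that is in the attachment on this page), and it rests on loudly stated assumptions: a Welch t-statistic ranking stands in for the differential-expression and random-forest feature-selection steps, a nearest-centroid classifier stands in for the six algorithms (including ELM), the data are synthetic, and every function name is hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (tumor vs normal)."""
    va, vb = variance(a), variance(b)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def select_genes(X, y, k):
    """Hypothetical stand-in for feature selection: top-k genes by |t|."""
    scores = []
    for g in range(len(X[0])):
        tumor = [row[g] for row, lab in zip(X, y) if lab == 1]
        normal = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(tumor, normal)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def nearest_centroid_fit(X, y, genes):
    """Stand-in classifier: per-class mean expression of the selected genes."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(row[g] for row in rows) for g in genes]
    return centroids


def nearest_centroid_predict(centroids, row, genes):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[g] - cg) ** 2 for g, cg in zip(genes, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def stratified_folds(y, n_folds, seed=0):
    """Split sample indices into folds that preserve the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_validated_accuracy(X, y, k_genes, n_folds=5):
    """Fraction of samples classified correctly across all held-out folds."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Genes are re-selected inside each fold so held-out samples never influence selection.
        genes = select_genes(X_tr, y_tr, k_genes)
        model = nearest_centroid_fit(X_tr, y_tr, genes)
        correct += sum(nearest_centroid_predict(model, X[i], genes) == y[i] for i in test_idx)
    return correct / len(y)


def synthetic_data(n_per_class=30, n_genes=200, n_markers=5, shift=3.0, seed=1):
    """Synthetic log-expression matrix; genes 0..n_markers-1 are shifted in tumors."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = normal, 1 = tumor
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_markers):
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = synthetic_data()
    print("selected genes:", sorted(select_genes(X, y, 5)))
    print("5-fold CV accuracy:", cross_validated_accuracy(X, y, k_genes=5))
```

On this synthetic data the planted marker genes 0 to 4 are the ones selected. The main design point carried over from good practice, rather than from the paper, is that gene selection is repeated inside every training fold; selecting genes on all samples first and then cross-validating would let test samples leak into the selection and inflate the reported accuracy.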


  • Zishuang Zhang, Zhi-Ping Liu: Identifying Cancer Biomarkers from High-throughput RNA Sequencing Data by Machine Learning. ICIC2019, in press.

Data and code:

  • ICIC2019.zip (1,416.5 kB; version 1; last modified 16-May-2019 13:48 by ZhipingLiu)
This page (revision 6) was last changed on 16-May-2019 13:49 by ZhipingLiu.