APORC Document Center: MCDGPA


MCDGPA: Modularized Candidate Disease Genes Prioritization Algorithm

Reference

Chen X, Yan GY, Liao XP (2010) A novel candidate disease genes prioritization method based on module partition and rank fusion. OMICS 14(4): 337-356.

Method

The aim of MCDGPA is to predict potential disease-related genes. We first partition the network into several modules, then rank the candidate genes within each disease-associated module, and finally produce a global ranking of the candidate genes across the entire network to select the most probable disease genes. ENDEAVOUR, Diffusion Kernel (DK) and Random Walk with Restart (RWR) have been shown to be effective in previous research (Aerts et al., 2006; Kohler et al., 2008). Here we put forward three Modularized Candidate Disease Genes Prioritization Algorithms (MCDGPA): Modularized ENDEAVOUR (MENDEAVOUR), Modularized Random Walk with Restart (MRWR) and Modularized Diffusion Kernel (MDK). Each MCDGPA variant consists of three steps: network partition, local ranking within each disease-associated module, and global ranking across the entire network.
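The three steps can be outlined as the following Python sketch. It is not the authors' MATLAB code and rests on several assumptions: the network is a dictionary mapping each gene to its neighbours, the modules come from any partition method, `seed_neighbour_score` is a hypothetical stand-in for a real local ranker such as MRWR or MDK, and the fusion rule (scaling each module's normalized local scores by the share of all seed genes that module contains) is an illustrative choice, not the rank fusion rule of the paper.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]  # gene -> set of neighbouring genes (undirected)
LocalScorer = Callable[[Graph, Set[str], Set[str]], Dict[str, float]]


def seed_neighbour_score(graph: Graph, module: Set[str], seeds: Set[str]) -> Dict[str, float]:
    """Hypothetical stand-in local ranker: counts each candidate's seed neighbours inside the module."""
    return {g: float(len(graph[g] & seeds & module)) for g in module if g not in seeds}


def modular_prioritize(graph: Graph, modules: List[Set[str]], seeds: Set[str],
                       n: int, local_scorer: LocalScorer = seed_neighbour_score) -> List[str]:
    """Step 1 is assumed done (modules given); step 2 ranks locally; step 3 fuses globally."""
    fused: Dict[str, float] = {}
    for module in modules:
        module_seeds = module & seeds
        if not module_seeds:
            continue  # only disease-associated modules (those containing seeds) are ranked
        local = local_scorer(graph, module, seeds)
        if not local:
            continue
        top = max(local.values())
        # Hypothetical fusion weight: share of all seed genes that fall in this module.
        weight = len(module_seeds) / len(seeds)
        for gene, score in local.items():
            normalized = score / top if top > 0 else 0.0
            # A gene in several (overlapping) modules keeps its best fused score.
            fused[gene] = max(fused.get(gene, 0.0), weight * normalized)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(fused, key=lambda g: (-fused[g], g))[:n]


if __name__ == "__main__":
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    modules = [{"A", "B", "C", "D"}, {"E", "F", "G"}]
    print(modular_prioritize(graph, modules, seeds={"A", "B", "E"}, n=3))  # ['C', 'F', 'G']
```

With seeds A, B and E, the module holding two of the three seeds contributes the top candidate. A gene that appears in several overlapping modules (as CFinder can produce) keeps its best fused score.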

Procedure

topnmrwr.m predicts the top n non-seed genes associated with the disease using Modularized Random Walk with Restart. It runs in MATLAB in less than one minute when applied to the prostate and breast cancer network. The input variable "n" is the number of top-ranked genes predicted to be associated with the disease that you want returned, and the variable "r" is the back (restart) probability. Before running the program, prepare three files: (1) adjacency matrix.txt, containing the adjacency matrix of the network; (2) module.txt, listing the identifiers of all the genes in each module; and (3) seedgeneID.txt, listing the identifiers of the seed genes in the network. The example .txt files show the expected format.
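The local ranking step of MRWR can be sketched in Python for a single module, using the random walk with restart iteration described by Kohler et al., 2008: p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized adjacency matrix and p(0) spreads equal probability over the seed genes. Here r plays the role of the back probability. Restricting the walk to edges inside the module, the hypothetical function names `rwr_in_module` and `top_n_rwr`, and the convergence tolerance are assumptions of this sketch, not details taken from topnmrwr.m, which reads its inputs from the three text files instead.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # gene -> set of neighbouring genes (undirected)


def rwr_in_module(graph: Graph, module: Set[str], seeds: Set[str], r: float,
                  tol: float = 1e-10, max_iter: int = 10_000) -> Dict[str, float]:
    """Steady-state RWR probabilities for the genes of `module`.

    Assumption: the walk only follows edges whose endpoints both lie in the module.
    """
    nodes = sorted(module)
    neighbours = {g: [h for h in graph.get(g, ()) if h in module] for g in nodes}
    module_seeds = [g for g in nodes if g in seeds]
    if not module_seeds:
        return {}  # not a disease-associated module
    p0 = {g: (1.0 / len(module_seeds) if g in seeds else 0.0) for g in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term r * p0 ...
        nxt = {g: r * p0[g] for g in nodes}
        # ... plus (1 - r) * W p, with W column-normalized: g sends p[g]/deg(g) to each neighbour.
        for g in nodes:
            if neighbours[g]:  # an isolated gene has no column to normalize; its walk mass is dropped
                share = (1.0 - r) * p[g] / len(neighbours[g])
                for h in neighbours[g]:
                    nxt[h] += share
        delta = sum(abs(nxt[g] - p[g]) for g in nodes)  # L1 change between iterations
        p = nxt
        if delta < tol:
            break
    return p


def top_n_rwr(graph: Graph, module: Set[str], seeds: Set[str], n: int, r: float) -> List[str]:
    """Top n non-seed genes of the module ranked by RWR probability (ties broken by name)."""
    p = rwr_in_module(graph, module, seeds, r)
    candidates = [g for g in p if g not in seeds]
    return sorted(candidates, key=lambda g: (-p[g], g))[:n]


if __name__ == "__main__":
    # Path A-B-C-D inside the module; the edge D-E leaves the module and is ignored.
    graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
    print(top_n_rwr(graph, {"A", "B", "C", "D"}, seeds={"A"}, n=2, r=0.5))  # ['B', 'C']
```

On this path with r = 0.5 the steady state is exactly (26, 14, 4, 1)/45 for A, B, C and D, so probability decays with distance from the seed, as expected for a restart-biased walk.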

topnmdk.m predicts the top n non-seed genes associated with the disease using Modularized Diffusion Kernel. It runs in MATLAB in less than one minute when applied to the prostate and breast cancer network. The input variable "n" is the number of top-ranked genes predicted to be associated with the disease that you want returned. Before running the program, prepare three files: (1) adjacency matrix.txt, containing the adjacency matrix of the network; (2) module.txt, listing the identifiers of all the genes in each module; and (3) seedgeneID.txt, listing the identifiers of the seed genes in the network.
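The MDK local ranking can likewise be sketched in Python with a diffusion kernel of the form used by Kohler et al., 2008, K = exp(-βL) with graph Laplacian L = D - A, scoring each candidate by the sum of its kernel entries with the seed genes. The diffusion parameter `beta` is an assumption of this sketch (the description of topnmdk.m lists only n as an input), as are the intra-module restriction, the hypothetical function names, and the hand-written scaling-and-squaring matrix exponential, which stands in for a library routine such as MATLAB's expm to keep the example dependency-free.

```python
import math
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # gene -> set of neighbouring genes (undirected)
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(m: Matrix, terms: int = 20) -> Matrix:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max((sum(abs(x) for x in row) for row in m), default=0.0)  # infinity norm
    s = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0  # so that norm / 2**s <= 1/2
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]  # scaled**k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):
        result = matmul(result, result)  # exp(m) = exp(m / 2**s) ** (2**s)
    return result


def diffusion_scores(graph: Graph, module: Set[str], seeds: Set[str], beta: float) -> Dict[str, float]:
    """Score each non-seed gene of the module by its summed kernel entries with the module's seeds."""
    nodes = sorted(module)
    idx = {g: i for i, g in enumerate(nodes)}
    seed_cols = [idx[g] for g in nodes if g in seeds]
    if not seed_cols:
        return {}  # not a disease-associated module
    # Build -beta * L, where L = D - A uses only edges inside the module (an assumption).
    m = [[0.0] * len(nodes) for _ in nodes]
    for g in nodes:
        for h in graph.get(g, ()):
            if h in module:
                m[idx[g]][idx[h]] += beta
                m[idx[g]][idx[g]] -= beta
    k = expm(m)  # K = exp(-beta * L)
    return {g: sum(k[idx[g]][c] for c in seed_cols) for g in nodes if g not in seeds}


def top_n_dk(graph: Graph, module: Set[str], seeds: Set[str], n: int, beta: float) -> List[str]:
    """Top n non-seed genes of the module by diffusion kernel score (ties broken by name)."""
    scores = diffusion_scores(graph, module, seeds, beta)
    return sorted(scores, key=lambda g: (-scores[g], g))[:n]


if __name__ == "__main__":
    graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
    print(top_n_dk(graph, {"A", "B", "C", "D"}, seeds={"A"}, n=3, beta=1.0))  # ['B', 'C', 'D']
```

Because every row of L sums to zero, each row of K sums to one, so the scores can be read as the share of "heat" that diffuses from the seeds to each candidate within the module.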

MENDEAVOUR can be run through the ENDEAVOUR web interface once the network has been partitioned into modules. We used three module partition methods in the paper: Rosvall's module partition algorithm, the Markov cluster algorithm (MCL), and CFinder. MCL and CFinder are available online; the code for Rosvall's module partition algorithm can be obtained by emailing its authors (Rosvall et al., 2008).
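For readers without the MCL software at hand, the core of the Markov cluster algorithm (alternating expansion and inflation of a column-stochastic matrix until it stops changing) can be sketched in Python as below. This is a minimal illustration of the standard algorithm, not the MCL implementation used in the paper; the inflation value of 2.0, the self-loop added to every gene, the 1e-6 threshold used to read off clusters, and the convergence tolerance are assumptions. Its output, a list of gene sets, is the kind of module list that the local ranking step works on.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # gene -> set of neighbouring genes; assumed symmetric, every gene a key
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale every column to sum to one (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def mcl(graph: Graph, inflation: float = 2.0, tol: float = 1e-9, max_iter: int = 100) -> List[Set[str]]:
    nodes = sorted(graph)
    idx = {g: i for i, g in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for g in nodes:
        m[idx[g]][idx[g]] = 1.0  # self-loop on every gene (common MCL preprocessing)
        for h in graph[g]:
            m[idx[h]][idx[g]] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along paths of length 2
        # Inflation: raise entries to a power, then renormalize, strengthening strong flows.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow (an attractor) collects the genes whose columns point to it.
    clusters: List[Set[str]] = []
    for i in range(n):
        members = {nodes[j] for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"E"}, "E": {"D"}}
    print([sorted(c) for c in mcl(graph)])  # [['A', 'B', 'C'], ['D', 'E']]
```

Higher inflation values generally produce more, smaller clusters; the dense-matrix loops here are only practical for small networks.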

Our modularized disease gene prioritization methods are suited to disease-specific networks. If you run into problems using the code, please feel free to contact me: xingchen@amss.ac.cn.