MVDA: Multi-View Data Analysis for Patients Sub-typing

Angela Serra, Francesco Bardozzo
May 16, 2017

Many diseases (for example, cancer and neuropsychiatric and autoimmune disorders) are difficult to treat because affected individuals vary remarkably from one another. Precision medicine tries to address this problem by individualising medical practice. It takes into account individual variability in genes, lifestyle and environment, with the goal of predicting disease progression and transitions between disease stages, and of targeting the most appropriate medical treatments.

Patient sub-typing plays a central role in precision medicine. It is the task of identifying sub-populations of similar patients, which can lead to more accurate diagnostic and treatment strategies. The main idea is to identify groups of samples that share relevant molecular characteristics.

To improve the accuracy of patient stratification models, other omics data types can be used in addition to gene expression, such as miRNA (microRNA) expression, methylation or copy number alterations. Data integration approaches that efficiently identify sub-types among existing samples have recently gained attention.

In this project, the MVDA multi-view technique for patient sub-typing will be illustrated and tested on real multi-view genomics datasets (gene expression, microRNA expression, RNA-Seq, miRNA-Seq, protein expression, copy number variation (CNV) and clinical data) downloaded from The Cancer Genome Atlas (TCGA), the Memorial Sloan-Kettering Cancer Center and NCBI GEO.

MVDA consists of four main steps (see project figure): first, prototype extraction, in which the features (genes, proteins, etc.) are clustered to reduce the dimensionality of the data; second, prototype ranking, in which the prototypes obtained in the first step are ranked by their class separability scores; third, single-view patient clustering, performed separately on each view; and finally, integration of the single-view clustering results with a matrix factorisation approach.
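The four steps can be illustrated with a minimal, dependency-free Python sketch. It is not the MVDA authors' implementation, and every name in it (`kmeans`, `fisher_score`, `nmf`, `mvda_sketch`) is hypothetical. It rests on several assumptions: prototypes are taken to be k-means centroids of the feature rows; a Fisher-style ratio of between-class to within-class sum of squares stands in for the class separability score; and the matrix factorisation step is plain multiplicative-update NMF (Lee and Seung) applied to the stacked binary cluster-membership matrices of all views. The two toy views contain made-up numbers.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def fisher_score(values, classes):
    """Between-class over within-class sum of squares (Fisher-style stand-in)."""
    mean = sum(values) / len(values)
    between = within = 0.0
    for c in set(classes):
        group = [v for v, y in zip(values, classes) if y == c]
        g_mean = sum(group) / len(group)
        between += len(group) * (g_mean - mean) ** 2
        within += sum((v - g_mean) ** 2 for v in group)
    return between / (within + 1e-12)

def nmf(x, k, iters=500, seed=0):
    """Lee-Seung multiplicative updates: x (m x n) ~ w (m x k) @ h (k x n)."""
    rng, eps = random.Random(seed), 1e-9
    m, n = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h

def mvda_sketch(views, classes, k, n_prototypes=3, top=2):
    """Hypothetical four-step pipeline; each view is a features x patients matrix."""
    memberships = []
    for view in views:
        # Step 1: cluster the features; each centroid becomes a prototype.
        _, prototypes = kmeans(view, n_prototypes)
        # Step 2: rank prototypes by the stand-in score and keep the top ones.
        prototypes.sort(key=lambda p: fisher_score(p, classes), reverse=True)
        # Step 3: cluster patients described by the selected prototypes.
        labels, _ = kmeans(transpose(prototypes[:top]), k)
        # One binary membership row per single-view cluster.
        for c in range(k):
            memberships.append([1.0 if lab == c else 0.0 for lab in labels])
    # Step 4: factorise the stacked memberships; each patient joins its strongest factor.
    _, h = nmf(memberships, k)
    return [max(range(k), key=lambda f: h[f][j]) for j in range(len(classes))]

classes = [0, 0, 0, 1, 1, 1]
expression = [  # features x patients (made-up values)
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    [5.1, 4.8, 5.0, 1.2, 0.8, 1.0],
    [1.0, 0.9, 1.1, 5.0, 5.2, 4.8],
    [0.8, 1.0, 1.2, 4.9, 5.1, 5.0],
    [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # uninformative feature
    [2.9, 3.0, 3.1, 3.1, 2.9, 3.0],
]
mirna = [  # a second, smaller view (made-up values)
    [2.0, 2.1, 1.9, 0.1, 0.2, 0.0],
    [0.1, 0.0, 0.2, 2.2, 1.9, 2.0],
    [1.0, 1.1, 0.9, 1.0, 1.1, 0.9],
]
print(mvda_sketch([expression, mirna], classes, k=2))
```

On this toy input, patients 0-2 and 3-5 should end up in two separate groups; which integer each group receives is arbitrary. In this sketch, every view contributes exactly `k` membership rows to the factorised matrix, so views with very different numbers of features carry equal weight in the integration step.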

In this project, students will learn the basic principles of classical and multi-view clustering and how to apply them to omics data. They will also learn how to evaluate the quality of a clustering with different criteria, such as clustering purity with respect to the class labels and the Normalised Mutual Information (NMI) between the cluster assignments and the class labels.
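Both evaluation criteria can be computed with the Python standard library alone. The function names below are hypothetical. The project text does not say which NMI normalisation is used, so this sketch assumes the arithmetic-mean variant, which is also scikit-learn's default for `normalized_mutual_info_score`. Other variants divide by the geometric mean or the maximum of the two entropies.

```python
import math
from collections import Counter

def purity(clusters, classes):
    """Share of patients that belong to the majority class of their cluster."""
    best = Counter()
    for (cluster, _cls), count in Counter(zip(clusters, classes)).items():
        best[cluster] = max(best[cluster], count)
    return sum(best.values()) / len(classes)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum(nij / n * math.log(n * nij / (count_a[i] * count_b[j]))
               for (i, j), nij in Counter(zip(a, b)).items())

def nmi(clusters, classes):
    """Mutual information normalised by the arithmetic mean of the two entropies."""
    denom = entropy(clusters) + entropy(classes)
    if denom == 0:  # both labelings put every patient in a single group
        return 1.0
    return 2 * mutual_information(clusters, classes) / denom

classes = ["A", "A", "A", "B", "B", "B"]
clusters = [0, 0, 1, 1, 1, 1]
print(purity(clusters, classes), nmi(clusters, classes))
```

Here purity is 5/6, because one class-A patient is grouped with the three class-B patients, while NMI is about 0.48. Both measures reach 1 for a clustering that reproduces the class labels up to a renaming of the clusters. Unlike NMI, however, purity can be pushed to 1 trivially by putting every patient in its own cluster.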

RMSS 2017
