RMSS 2017
Korbinian Strimmer
A New Look at Omics Data Integration from an Entropy and Network Perspective

(joint work with Takoua Jendoubi)

Probably the most commonly used approach for the joint integrative analysis of omics data is classical canonical correlation analysis (CCA) or one of its modern variants (e.g., sparse CCA). Other popular approaches to data integration include O2PLS, a related projection-based method developed for use in chemometrics, and the RV coefficient, which measures and dissects the total association between groups of genes, metabolites, etc.; both are now in widespread use in omics data analysis. Unfortunately, these approaches have a number of crucial drawbacks, including lack of interpretability of the underlying factors, incoherence with standard multivariate regression, and difficulty in applying them to large-scale data.
As an alternative, we present a simple network-based approach to integrative data analysis that employs relative entropy to characterize the overall association between two (or more) sets of omics data. This approach arises naturally in the setting of latent-variable multivariate regression, and we show that for normally distributed variables it admits a canonical decomposition that additionally allows the underlying association network among the individual constituents to be inferred. Furthermore, our approach is computationally inexpensive and hence can be applied to high-dimensional data sets. It is also easily extended to more than two data sets. We illustrate the approach, which can be interpreted as a network-based extension of CCA, by analyzing metabolomic and transcriptomic data.
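For normally distributed data, one natural reading of "relative entropy between two sets of variables" is their mutual information, i.e. the Kullback-Leibler divergence of the joint distribution from the product of its two marginals. The sketch below assumes that reading. It is an illustration only, not the authors' implementation, and it computes the quantity from a covariance matrix via log-determinants. The helper names `cholesky_logdet`, `submatrix` and `gaussian_mutual_information` are hypothetical, and the network-inference step is not attempted here.

```python
import math


def cholesky_logdet(a):
    """Return log det(a) for a symmetric positive-definite matrix a
    (list of lists), via a plain Cholesky factorisation a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # det(a) = prod(diag(L))^2, hence log det(a) = 2 * sum(log diag(L))
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def submatrix(a, idx):
    """Principal submatrix of a indexed by the list idx."""
    return [[a[i][j] for j in idx] for i in idx]


def gaussian_mutual_information(sigma, p):
    """Mutual information (nats) between the first p variables (X) and the
    remaining variables (Y) of a multivariate normal with covariance sigma:

        MI = 1/2 * [log det(S_xx) + log det(S_yy) - log det(S)]

    i.e. the relative entropy of the joint normal density from the product
    of its X and Y marginals (assumed reading, see lead-in)."""
    n = len(sigma)
    x_idx = list(range(p))
    y_idx = list(range(p, n))
    return 0.5 * (cholesky_logdet(submatrix(sigma, x_idx))
                  + cholesky_logdet(submatrix(sigma, y_idx))
                  - cholesky_logdet(sigma))


if __name__ == "__main__":
    # Toy covariance, variables ordered (x1, x2, y1, y2): x1-y1 correlate
    # at 0.6, x2-y2 at 0.8, all other cross-correlations are zero.
    sigma = [[1.0, 0.0, 0.6, 0.0],
             [0.0, 1.0, 0.0, 0.8],
             [0.6, 0.0, 1.0, 0.0],
             [0.0, 0.8, 0.0, 1.0]]
    mi = gaussian_mutual_information(sigma, p=2)
    # Same quantity written as a sum over the canonical correlations 0.6, 0.8
    canonical = -0.5 * sum(math.log(1.0 - r ** 2) for r in (0.6, 0.8))
    print(mi, canonical)
```

In this toy example the pairs (x1, y1) and (x2, y2) are uncorrelated with each other, so the canonical correlations are simply 0.6 and 0.8, and the total association splits into one term per pair. More generally, for jointly normal variables the mutual information equals -1/2 times the sum over i of log(1 - λ_i^2), where λ_i are the canonical correlations, which is one sense in which relative entropy decomposes along canonical directions.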