Integrative analysis of multiple omics datasets using O2PLS-based methods

Multiple omics datasets are now routinely measured on the same samples, on the premise that each dataset captures a different aspect of the underlying biological system. Integrating these datasets can therefore improve our understanding of that system. In this talk, we consider approaches that incorporate external knowledge and statistical inference into the data integration framework. In the first part, we present our recently developed method GO2PLS, which performs dimension reduction and constructs a small number of latent components representing the relationship between the omics datasets. It incorporates external biological information on groups of features and performs feature selection to obtain more interpretable results. In the second part, we propose a probabilistic approach, supervised PO2PLS, which models the outcome and the omics data jointly. With supervised PO2PLS, we perform statistical inference on the relationship between the omics data and the outcome.

We illustrate these methods on omics datasets from three distinct studies. First, we apply GO2PLS to methylation (p=450k) and glycomics (q=22) data on 405 participants from TwinsUK, a large cohort study. Second, we apply GO2PLS to a small case-control study of HCM (N=23), for which regulomics (p=33k) and transcriptomics (q=15k) data are available. Lastly, we apply supervised PO2PLS to integrate methylation (p=450k) and glycomics (q=10) data on 85 subjects from a family-based study of DS.