RMSS 2017
Krista Fischer
Personalized Prediction of the Risk of Common Complex Diseases: Some Statistical Aspects
The talk will provide an overview of the development and validation of algorithms for personalized prediction of Type 2 Diabetes (T2D) risk in the Estonian Biobank cohort. Methodological challenges arising at different steps of the process will also be discussed.
The results of a large-scale meta-analysis of Genome-Wide Association Studies (GWAS) can be used to rank Single Nucleotide Polymorphisms (SNPs) by the strength of their established association with the phenotype (as indicated, for instance, by the p-value). A certain number of the top-ranked independent SNPs can then be combined into a Genetic Risk Score (GRS) with considerably better predictive ability than any single SNP. Building an efficient GRS requires identifying optimal criteria for selecting the SNPs and choosing their weights. We show that for T2D, a doubly-weighted GRS combining more than 5000 SNPs has the strongest association with both prevalent and incident T2D among the scores compared.
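A minimal sketch of the generic selection and weighting steps, assuming per-SNP meta-analysis summary statistics (chromosome, position, effect size, p-value) and effect-allele dosages coded 0 to 2. Independence is approximated here by a simple physical-distance window rather than proper LD clumping against a reference panel, and each SNP is weighted once by its meta-analysis effect size. This is a plain weighted GRS, not the doubly-weighted score described above. All names (`SnpStat`, `select_top_snps`, `weighted_grs`) and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpStat:
    """Hypothetical per-SNP summary statistics from a GWAS meta-analysis."""
    snp_id: str
    chrom: int
    pos: int       # base-pair position
    beta: float    # effect size (e.g. log odds ratio) of the effect allele
    pvalue: float


def select_top_snps(stats, n_top, window_bp=500_000):
    """Rank SNPs by p-value and greedily keep up to n_top of them, skipping any
    SNP within window_bp of an already selected SNP on the same chromosome.
    The distance rule is a crude stand-in for LD-based clumping."""
    selected = []
    for s in sorted(stats, key=lambda s: s.pvalue):
        if len(selected) == n_top:
            break
        if all(s.chrom != t.chrom or abs(s.pos - t.pos) >= window_bp
               for t in selected):
            selected.append(s)
    return selected


def weighted_grs(dosages, snps):
    """Weighted GRS for one individual: sum of beta * effect-allele dosage.
    dosages maps snp_id -> dosage in [0, 2]."""
    return sum(s.beta * dosages[s.snp_id] for s in snps)


# Illustrative (made-up) summary statistics.
stats = [
    SnpStat("rs1", 10, 1_000_000, 0.30, 1e-20),
    SnpStat("rs2", 10, 1_100_000, 0.25, 1e-15),  # within 500 kb of rs1: skipped
    SnpStat("rs3", 3, 5_000_000, 0.10, 1e-9),
    SnpStat("rs4", 6, 2_000_000, -0.05, 1e-3),
]
top = select_top_snps(stats, n_top=2)
print([s.snp_id for s in top])         # ['rs1', 'rs3']

person = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1}
print(round(weighted_grs(person, top), 2))  # 0.6
```

The number of SNPs (`n_top`, or equivalently a p-value cutoff) and the weighting scheme are the tuning choices; candidate scores would be compared on individuals who did not contribute to the discovery meta-analysis.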
To provide accurate estimates of the risk of a complex disease, a GRS needs to be combined with known environmental and lifestyle-related risk factors. We discuss important steps in the development of a prediction model that combines genetic and non-genetic predictors for T2D. For use in practical risk assessment, the model must be combined with estimates of the baseline (age-specific) risk level to yield estimates of absolute risk. We present the key stages of this process as well as the resulting risk prediction tool. We also discuss statistical challenges related to specific features of population-based biobank data (e.g., left truncation for some outcomes, a mix of retrospective and prospective data for others). Finally, we point out some common mistakes, such as partial overlap of the discovery and validation cohorts, and their consequences.
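A minimal sketch of turning a relative-risk model into an absolute risk, assuming a proportional-hazards (Cox-type) model with age as the time scale and a piecewise-constant baseline hazard by age band. For a person who is disease-free at age a, the risk of onset within t years is 1 - exp(-(H0(a + t) - H0(a)) * exp(lp)), where H0 is the baseline cumulative hazard and lp the linear predictor relative to the reference profile; conditioning on being event-free at a is the same delayed-entry logic used for left-truncated follow-up. Competing risks (e.g. death) are ignored. The model form, function names, age bands, rates and coefficients are all hypothetical placeholders, not estimates from the Estonian Biobank.

```python
import math


def cumulative_hazard(age, band_starts, band_rates):
    """Baseline cumulative hazard H0(age) for a piecewise-constant hazard.
    band_starts: ascending lower age limits (first is 0);
    band_rates: annual baseline hazard within each band."""
    h = 0.0
    for i, (start, rate) in enumerate(zip(band_starts, band_rates)):
        end = band_starts[i + 1] if i + 1 < len(band_starts) else math.inf
        if age <= start:
            break
        h += rate * (min(age, end) - start)
    return h


def linear_predictor(covariates, coefs, reference):
    """Sum of log hazard ratios times deviations from the reference profile
    (the profile to which the baseline hazard refers)."""
    return sum(coefs[k] * (covariates[k] - reference[k]) for k in coefs)


def absolute_risk(current_age, horizon, lp, band_starts, band_rates):
    """P(onset before current_age + horizon | disease-free at current_age)
    under proportional hazards; competing risks are ignored."""
    dh = (cumulative_hazard(current_age + horizon, band_starts, band_rates)
          - cumulative_hazard(current_age, band_starts, band_rates))
    return 1.0 - math.exp(-dh * math.exp(lp))


bands = [0, 40, 60]                # hypothetical age-band lower limits
rates = [0.0005, 0.005, 0.01]      # hypothetical annual baseline hazards
coefs = {"grs": 0.5, "bmi": 0.08}  # hypothetical log hazard ratios
ref = {"grs": 0.0, "bmi": 25.0}    # hypothetical reference profile

person = {"grs": 1.2, "bmi": 31.0}
lp = linear_predictor(person, coefs, ref)
print(round(absolute_risk(50, 10, 0.0, bands, rates), 4))  # 0.0488 (reference profile)
print(absolute_risk(50, 10, lp, bands, rates))             # higher-risk profile
```

The split between a relative part (`lp`, estimated in the cohort) and an absolute part (the age-specific baseline hazard) means the baseline can be recalibrated, for example to population incidence rates, without refitting the predictor effects.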