Stop Meta-Analyzing, Start Instrumenting: Maximizing the Predictive Power of Polygenic Scores


Polygenic scores have become the workhorse for empirical analyses in social-science genetics. Because a polygenic score is constructed from the results of finite-sample Genome-Wide Association Studies (GWASs), it is a noisy approximation of the true latent genetic predisposition to a given trait. The conventional way of boosting the predictive power of polygenic scores is to increase the GWAS sample size by meta-analyzing the GWAS results of multiple cohorts. In this paper we challenge this convention. Through simulations, we show that Instrumental Variable (IV) regression using two polygenic scores from independent GWAS samples outperforms the typical Ordinary Least Squares (OLS) model employing a single meta-analysis-based polygenic score in terms of bias, root mean squared error, and statistical power. We verify the empirical validity of these simulations by predicting educational attainment (EA) and height in a sample of siblings from the UK Biobank. We show that between-family IV regression yields estimates that approach the SNP-based heritabilities, while within-family IV regression provides a tighter lower bound on the direct genetic effect than meta-analysis does. IV estimation improves the predictive power of polygenic scores by 12% (height) to 22% (EA). Our findings suggest that measurement error is a key explanation for hidden heritability (i.e., the difference between SNP-based and GWAS-based heritability), and that it can be overcome using IV regression.
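
The intuition behind the comparison of IV and OLS can be illustrated with a minimal simulation sketch. It is not the paper's simulation design: the function name `simulate`, the parameter values, the classical measurement-error model (each score equals the true genetic factor plus independent noise), and the use of the average of the two scores as a stand-in for a meta-analysis-based score are all assumptions made for illustration only.

```python
import numpy as np


def simulate(n=200_000, h2=0.25, noise_var=1.0, seed=0):
    """Compare OLS on a pooled score with IV using two independent scores.

    Hypothetical setup: g is the true latent genetic factor (variance 1),
    and each polygenic score measures g with independent classical error.
    """
    rng = np.random.default_rng(seed)

    # True latent genetic factor and an outcome with heritability h2.
    g = rng.standard_normal(n)
    y = np.sqrt(h2) * g + np.sqrt(1.0 - h2) * rng.standard_normal(n)

    # Two noisy scores, standing in for scores from independent GWAS samples.
    pgs1 = g + np.sqrt(noise_var) * rng.standard_normal(n)
    pgs2 = g + np.sqrt(noise_var) * rng.standard_normal(n)

    # Assumed stand-in for a meta-analysis score: pooling halves the noise
    # variance but does not remove it.
    pgs_meta = (pgs1 + pgs2) / 2.0

    # OLS slope of y on the pooled score (attenuated by measurement error).
    ols = np.cov(y, pgs_meta)[0, 1] / np.var(pgs_meta, ddof=1)

    # Simple IV (Wald) estimator: pgs2 instruments for pgs1. The errors are
    # independent, so cov(pgs1, pgs2) isolates the variance of g.
    iv = np.cov(y, pgs2)[0, 1] / np.cov(pgs1, pgs2)[0, 1]

    return {"true": float(np.sqrt(h2)), "ols_meta": float(ols), "iv": float(iv)}


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the OLS slope on the pooled score is attenuated by the factor 1 / (1 + noise_var / 2), so with the defaults it converges to roughly two thirds of the true coefficient, whereas the IV estimate converges to the true coefficient itself. Pooling shrinks the measurement error; instrumenting removes its effect on the slope.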