Performance of deep-learning-based approaches to improve polygenic scores

Fri, 27/06/2025 - 08:12

Dr Martin Kelemen, Professor Mike Inouye and Professor Adam Butterworth, Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care

Deep-learning approaches have become popular, with many successful applications across a variety of fields. Neural networks achieve their impressive performance by exploiting the complex, non-linear patterns inherent in many prediction problems. Given the hype surrounding these approaches, there is a widespread expectation that they will become the state-of-the-art method everywhere, including for generating polygenic scores with applications in genomic healthcare.

However, it has remained an open question to what extent non-linear effects, such as gene-gene (epistasis) or gene-environment interactions, can be exploited to improve genetic prediction of human complex traits. Kelemen and colleagues demonstrate that naive attempts to adapt deep-learning approaches to build polygenic scores are likely to be confounded by the correlation structure of the genome (linkage disequilibrium) and by ungenotyped causal variants. Failing to account for this would lead to the erroneous conclusion that non-linear effects play an important role in genetic prediction.
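The confounding mechanism can be illustrated with a toy simulation. This is a minimal sketch and not the authors' simulation design. It assumes haploid 0/1 genotypes, two independent genotyped SNPs `a` and `b`, and a hypothetical ungenotyped causal variant `c` that sits on the haplotype carrying the `1` allele at both SNPs (so `c = a * b`). The trait is purely additive in `c`. Because only `a` and `b` are observed, however, a model with an `a*b` interaction term explains more variance than an additive model on the same SNPs, and an analyst who saw only the genotyped SNPs could mistake this for epistasis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two genotyped SNPs, coded 0/1 (haploid for simplicity), independent of each other.
a = rng.binomial(1, 0.5, n)
b = rng.binomial(1, 0.5, n)

# Hypothetical ungenotyped causal variant: it lies on the haplotype carrying the
# '1' allele at both genotyped SNPs, so it is in LD with each of them.
c = a * b

# The true trait is purely additive in the causal variant, plus noise.
y = 0.5 * c + rng.normal(0.0, 1.0, n)


def r_squared(X, y):
    """Fraction of trait variance explained by an ordinary least-squares fit."""
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()


# Additive model on the genotyped SNPs only.
r2_additive = r_squared(np.column_stack([a, b]), y)
# Same SNPs plus a pairwise interaction term ("apparent epistasis").
r2_interaction = r_squared(np.column_stack([a, b, a * b]), y)
# Oracle model that sees the ungenotyped causal variant directly.
r2_causal = r_squared(c.reshape(-1, 1), y)

print(f"additive on tag SNPs:      R^2 = {r2_additive:.4f}")
print(f"with a*b interaction term: R^2 = {r2_interaction:.4f}")
print(f"true causal variant:       R^2 = {r2_causal:.4f}")
```

With these allele frequencies, the additive model on `a` and `b` can capture only about two thirds of the causal variant's signal in expectation, while the interaction model recovers essentially all of it. The "epistatic" gain is therefore entirely an artefact of LD with an ungenotyped variant, not a genuine non-linear effect.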

Through simulations and analyses of 28 real phenotypes in the UK Biobank, the authors find that neural networks are outperformed by additive models, with only limited evidence for genuine epistatic effects.
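To make the contrast concrete, an additive polygenic score and a score extended with pairwise (epistatic) terms can be written schematically as follows. This is a generic textbook formulation assumed here for illustration, not the exact models used in the study. $x_{ij}$ is the genotype of individual $i$ at variant $j$, $\beta_j$ are per-variant effect sizes, and $\gamma_{jk}$ are interaction effects. Neural networks can represent more general non-linearities than the pairwise terms shown.

```latex
\hat{y}_i^{\mathrm{additive}} = \sum_{j=1}^{M} \beta_j \, x_{ij},
\qquad
\hat{y}_i^{\mathrm{epistatic}} = \sum_{j=1}^{M} \beta_j \, x_{ij}
  + \sum_{j<k} \gamma_{jk} \, x_{ij} \, x_{ik}
```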

In conclusion, for genetic prediction, more scalable linear approaches may remain preferable for now, and expectations placed on deep learning need to be carefully managed.
