SomaLogic plasma protein GWAS summary statistics
In 2018, a team at CEU led by Ben Sun and Adam Butterworth published a paper in Nature describing a genome-wide association study of 3,301 participants from the INTERVAL study in whom a SomaLogic aptamer-based plasma protein assay measuring ~3,600 proteins had been run (see Sun et al., Nature, 2018). In total, we identified 1,927 associations (“pQTLs”) with 1,478 proteins, greatly enhancing our understanding of the genetic determinants of human plasma protein levels.
In addition to making the individual-level genetic and proteomic data available on request via the European Genome-Phenome Archive (https://ega-archive.org/studies/EGAS00001002555), we are also making the full genetic association summary statistics publicly available.
Files can be downloaded from Box.com: https://app.box.com/s/u3flbp13zjydegrxjb2uepagp1vb6bj2
Users can either download files directly from the website or connect to the site using rclone or lftp. For instructions on how to use rclone or lftp, please contact Adam Butterworth (asb38@medschl.cam.ac.uk).
There are 3,283 folders on the site, one for each of the 3,283 SOMAmers used to assay the 2,995 proteins that passed quality control in the analyses. Each folder is named according to the ID of the SOMAmer and contains 22 gzipped text files, one for each autosomal chromosome analysed. (The same set of variants was analysed for each of the 3,283 SOMAmers.)
Within each results file, the columns are in the order:
VARIANT_ID chromosome position Allele1 Allele2 Effect StdErr log(P)
where the columns include:
VARIANT_ID = variant identifier in the format <chromosome>_<position>_<index>
chromosome, position = chromosome and position in build 37
Effect = effect size with respect to Allele1 (the effect allele)
StdErr = standard error of the effect size
log(P) = log10(p-value)
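As an illustration of the column layout above, the following is a minimal Python sketch for reading one per-chromosome results file and converting log(P) back to a p-value. It uses only the standard library. The whitespace delimiter, the possible presence of a header row, and the function names parse_line and read_results are assumptions made for this example and are not stated in the file description.

```python
import gzip

# Column order as documented above. The delimiter (whitespace) and the
# presence of a header row are assumptions, not documented facts.
COLUMNS = ["VARIANT_ID", "chromosome", "position",
           "Allele1", "Allele2", "Effect", "StdErr", "log(P)"]


def parse_line(line):
    """Parse one results line into a dict; return None if it is not a data row."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        return None  # blank or malformed line
    try:
        position = int(fields[2])
        effect, stderr, log10_p = (float(x) for x in fields[5:8])
    except ValueError:
        return None  # non-numeric fields, e.g. a header row
    return {
        "variant_id": fields[0],
        "chromosome": fields[1],
        "position": position,       # build 37 coordinate
        "allele1": fields[3],       # effect allele
        "allele2": fields[4],
        "effect": effect,           # with respect to allele1
        "stderr": stderr,
        "log10_p": log10_p,
        # log(P) is log10(p-value); very negative values underflow to 0.0,
        # so prefer log10_p when filtering or ranking.
        "p_value": 10.0 ** log10_p,
    }


def read_results(path):
    """Yield parsed rows from one gzipped per-chromosome results file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            row = parse_line(line)
            if row is not None:
                yield row
```

Calling read_results on the path of a downloaded file yields one dictionary per variant. Filtering on log10_p rather than p_value avoids losing resolution for very strong associations.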
The Box.com site also contains a .csv file that lists the SOMAmer ID, target protein, full name of the target protein, and UniProt ID for each of the 3,283 SOMAmers.
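Because the results folders are named by SOMAmer ID, the annotation file can serve as a lookup from folder name to protein. The sketch below shows one way to build such a lookup in Python. The header name SOMAMER_ID and the function name index_annotations are hypothetical, and the sketch assumes the .csv file has a header row; check the actual file and pass the real column name via id_column.

```python
import csv


def index_annotations(lines, id_column="SOMAMER_ID"):
    """Map SOMAmer ID -> annotation row (a dict keyed by the csv header).

    `lines` is any iterable of text lines, e.g. an open file handle.
    `id_column` is a hypothetical header name; replace it with the
    actual SOMAmer ID column name found in the .csv file.
    """
    reader = csv.DictReader(lines)
    return {row[id_column]: row for row in reader}
```

With the file opened via open(path, newline=""), the returned dictionary can be indexed by a folder name to retrieve the target protein, its full name, and its UniProt ID.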