
Department of Public Health and Primary Care (PHPC)

 

Links on this page require updating. Please contact Dr Stephen Kaptoge with any enquiries.

 

Stata programs:

The Stata programs below can be installed from within a Stata session by typing:

 

net from http://ceu.phpc.cam.ac.uk/software/erfc/

net describe <package>

net install <package>

 

or, more simply, in a single step:

 

net install <package>, from(http://ceu.phpc.cam.ac.uk/software/erfc/) replace

 

It is also highly recommended that you get the ancillary Stata do-file provided in the –proglinks– package, which lists links to other non-official SJ and user-written programs used by some of the programs below. These should be installed for the programs to work correctly. The –proglinks– package also contains a second ancillary do-file with example Stata code used to analyse the examples in the ERFC statistical methods paper [Int J Epidemiol 2010], which can also be downloaded as a PDF file from here <erfc_stats_methods_example.pdf>.
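As an illustration of the installation steps above, the following sketch installs –proglinks– and then fetches its ancillary do-files using Stata's official net get command. It assumes that the download URL quoted on this page is still valid (the note at the top of the page suggests links may have changed) and that the ancillary files are packaged so that net get can retrieve them. Neither assumption is confirmed here. The local macro name erfc is introduced purely for illustration.

```stata
* Sketch only: assumes the ERFC download URL quoted on this page is still valid.
* Run as a do-file so that the local macro is defined for every command.
local erfc "http://ceu.phpc.cam.ac.uk/software/erfc/"

* Inspect the package description before installing
net describe proglinks, from(`erfc')

* Install (or update) the package's program and help files
net install proglinks, from(`erfc') replace

* Download the package's ancillary files (e.g. do-files) to the current working directory
net get proglinks, from(`erfc') replace
```

Ancillary files are written to the current working directory, so it may be worth changing to a project folder with cd before running net get.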

 

Stata programs list

 

Exploratory analysis

checkdis           Generates plots of the distribution of each of a list of continuous variables, optionally by a grouping variable

checkpr            Generates bar charts for a list of categorical variables, optionally by a grouping variable

overlay             A wrapper program that simplifies the task of overlaying twoway graphs over a third stratifying variable

summstat         Generates overall or pooled within-group summary statistics for a specified varlist

 

Literature-based meta-analysis

riskconv           Converts risk ratios measured on a specified scale to a desired scale

i2ci                   Calculates confidence intervals for the I² and H statistics for measuring heterogeneity

 

IPD Meta-analysis – cross-sectional correlates

partcorr            Calculates partial correlation coefficients, including by subgroup

cscorr               Fits linear mixed models using data from multiple studies to assess cross-sectional associations with response

cscorrst            Generates detailed descriptive summary tables for data available for cross-sectional correlates analysis

 

IPD Meta-analysis – aetiological associations

adjmeta            A program for large-scale meta-analysis of an exposure-disease association using individual participant data (IPD)

mvshape          Calculates risk ratios for exposure-disease association using user-specified categories by multivariate meta-analysis

mvmetaipd       Program for multivariate meta-analysis of an exposure-disease association using individual participant data (IPD)

mvmetai2          Calculates heterogeneity statistics after fitting a multivariate meta-analysis model

fpshape            Program for meta-analysis of non-linear exposure-outcome associations using fractional polynomials (FPs)

checkhaz          Generates plots of cumulative hazard and survival functions for multiple groups in time-to-event data

phtest               Implements up to 6 methods for testing the proportional hazards (PH) assumption for survival data from multiple studies

stsetage           Sets up survival-time data in a format suitable for estimating age-at-risk specific hazard ratios

stsetcco           Converts case-cohort design dataset into a format suitable for estimating hazard ratios using weighted Cox regression

rdrcalc              Calculates and plots adjusted regression dilution ratios (RDRs) over time

genusual           Generates usual levels for an exposure variable and confounders

 

IPD Meta-analysis – risk prediction

predaddc          Meta-analysis of the predictive ability of risk prediction models using C-index measure in individual participant data (IPD)

predaddd         Meta-analysis of the predictive ability of risk prediction models using D measure in individual participant data (IPD)

predmeta          Meta-analysis of the predictive ability of risk prediction models using C-index or D measures (post-estimation)

predsubg          Assessment of subgroup effects in meta-analysis of the predictive ability of risk prediction models using C-index or D

predstat            Assessment of the predictive ability of a risk marker using reclassification statistics in IPD

preddstat          Post-estimation assessment of model predictive ability using D statistic adapted for multiple studies

predsurv           Predicts survival and failure probabilities after a stcox model, including with staggered entry

predcalib          Calibration of survival or failure probabilities predicted from prognostic survival models

 

Risk prediction algorithms (see notes below*)

whocvdrisk       Calculates WHO 10-year CVD risk with recalibration to relevant country or GBD region

score2risk         Calculates SCORE2 10-year CVD risk with recalibration to relevant European risk region

 

 

Other useful programs

xtilew                Creates within-group quantiles using an approach that is efficient for large datasets

farcalc              Calculates standard errors and confidence intervals based on floating variances after fitting a regression model

estplot              Flexible plotting of point estimates and pointwise CIs after modelling

submat             Extracts coefficient estimates and the variance-covariance matrix corresponding to a specified namelist of effects

corrbeta            Solves for linear regression coefficient given correlation coefficient, sample size, and SDs

inplink              Stata code to input plink format output text files

proglinks          Provides links to other user-written Stata packages that some of the programs above might utilise
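To make the corrbeta description above concrete, the sketch below applies the standard simple linear regression identities: the slope equals the correlation multiplied by the ratio of SDs, and its standard error follows from the sample size. This is a hand calculation under assumed inputs, not the corrbeta program itself, whose syntax and exact method are not documented on this page. All scalar names and input values are hypothetical.

```stata
* Hypothetical inputs (illustrative values only, not taken from any study)
scalar r_xy  = 0.30    // correlation between exposure x and outcome y
scalar n_obs = 500     // sample size
scalar sd_x  = 1.2     // SD of x
scalar sd_y  = 15      // SD of y

* Slope of y on x implied by the correlation and the two SDs
scalar beta = r_xy * sd_y / sd_x

* Standard error of the slope in simple linear regression
scalar se_beta = (sd_y / sd_x) * sqrt((1 - r_xy^2) / (n_obs - 2))

display "beta = " %9.4f beta "   SE = " %9.4f se_beta
```

The same identities apply only to an unadjusted regression of y on a single x. Adjusted coefficients cannot be recovered from a simple correlation in this way.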

 

* Notes on risk prediction algorithms

Please note that the WHO and SCORE2 CVD risk prediction algorithms were derived using cohort data and then recalibrated to relevant risk regions (21 global regions for the WHO algorithm and 4 European risk regions for the SCORE2 algorithm). Recalibration used contemporary population data and incidence estimates that aim to represent the CVD incidence expected in the broad target population for screening. This prevents the risk predictions from simply reflecting the absolute risk seen in cohort data, which can be misleading because cohort studies commonly represent a subset of the population and/or a past period of time. For this reason, we advise caution in any analysis that attempts to assess calibration or the public health impact of screening by applying the algorithms to observed study-specific data without reweighting to make the results applicable to the broader, representative contemporary population.

Related Tools and Resources

Additional tools and resources are listed below.

PhenoScanner

PhenoScanner is a curated database holding publicly available results from large-scale genetic association studies. The motivation for creating this tool is to facilitate “phenome scans”, the cross-referencing of genetic variants with a broad range of phenotypes, to aid the understanding of disease pathways and biology. The catalogue currently contains nearly 3 billion association results and over 10 million unique single nucleotide polymorphisms (SNPs). It is accompanied by a web-based tool that searches the database for associations with user-specified SNPs. The tool provides the option of searching for trait associations with proxies of the SNPs of interest, calculated using 1000 Genomes phase 3 and HapMap2. Importantly, all association results, including any associations with proxy SNPs, are aligned according to the alleles of each input SNP.

PhenoScanner is available at www.phenoscanner.medschl.cam.ac.uk

Past Projects

In addition to the Unit’s portfolio of active and nascent research studies and consortia, we have a number of studies and consortia that are not currently being actively used for research. If you would like to find out more about these initiatives, click on the links below. Contact details for each study or consortium can be found on these pages.