ERFC methods

The Emerging Risk Factors Collaboration (ERFC) is a collation of primary data from over 1 million participants in over 100 prospective studies, focused on risk markers for major cardiovascular morbidity and mortality. Meta-analysis of individual participant data from multiple prospective epidemiological studies provides scope for detailed investigation of exposure-risk relationships, but involves a number of statistical challenges. Analyses are principally based on Cox proportional hazards regression models stratified by sex, fitted in each study separately. Estimates of unadjusted and adjusted exposure-risk relationships, and of interactions, are then combined across studies using random-effects meta-analysis. Methods for assessing the shape of exposure-risk associations and the proportional hazards assumption have been developed or extended. Measurement error in exposures and confounders is addressed by analysing repeat exposure measurements to estimate corrected regression coefficients.
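As a rough illustration of the second stage of this approach, the sketch below pools study-specific log hazard ratios with a random-effects model. It assumes the DerSimonian-Laird moment estimator for the between-study variance, which the summary above does not specify. The function name and the input values are hypothetical and are not ERFC results.

```python
import math

def dersimonian_laird(estimates, std_errors):
    """Pool per-study estimates (e.g. log hazard ratios) by random-effects
    meta-analysis, assuming the DerSimonian-Laird estimator of tau^2."""
    k = len(estimates)
    # Inverse-variance (fixed-effect) weights
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q statistic for heterogeneity
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    # Moment estimate of the between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, tau2, q

# Hypothetical log hazard ratios and standard errors from three studies
log_hr = [0.1, 0.3, 0.5]
se = [0.1, 0.1, 0.1]
pooled, se_pooled, tau2, q = dersimonian_laird(log_hr, se)
print("pooled HR:", math.exp(pooled))
print("95% CI:", math.exp(pooled - 1.96 * se_pooled),
      math.exp(pooled + 1.96 * se_pooled))
```

In the first stage, each study would supply its own log hazard ratio and standard error from a Cox model stratified by sex. Only the pooling step is sketched here.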

Further details regarding the statistical analyses undertaken in the ERFC can be found in:

Statistical methods for the time-to-event analysis of individual participant data from multiple epidemiological studies. Int J Epidemiol. 2010;39:1345-1359. [PubMed]

ERFC methods used in assessment of the predictive ability of risk models and the incremental predictive ability of added risk predictors are detailed in:

Assessing risk prediction models using individual participant data from multiple studies. Am J Epidemiol. 2014 Mar 1;179(5):621-32. [PubMed]

Stata programs:

The Stata programs below are available to install within a Stata session by typing:

net from

net describe <package>

net install <package>

It is highly recommended that you also get the ancillary Stata do-file provided in the –proglinks– package, which lists links to other non-official SJ and user-written programs used by some of the programs below. These should be installed for the programs to work correctly. The –proglinks– package also contains a second ancillary do-file giving the Stata code used to analyse the examples in the ERFC statistical methods paper [Int J Epidemiol 2010], which can also be downloaded as a PDF file from here <erfc_stats_methods_example.pdf>.

Stata programs list


Exploratory analysis

checkdis          Generates plots of distribution of a list of continuous variables, optionally by a grouping variable

checkpr           Generates bar charts for a list of categorical variables, optionally by a grouping variable

overlay            A wrapper program that simplifies the task of overlaying twoway graphs over a third stratifying variable

summstat         Generates overall or pooled within group summary statistics for a specified varlist


Literature-based meta-analysis

riskconv           Converts risk ratios measured on a specified scale to a desired scale

i2ci                  Calculates confidence intervals for the I² and H statistics for measuring heterogeneity
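To make the heterogeneity statistics named in the i2ci entry concrete, the sketch below computes H and I² from Cochran's Q and the number of studies, with an approximate confidence interval. It assumes the test-based interval for ln(H) described by Higgins and Thompson (Stat Med 2002). Whether i2ci uses this method is not stated on this page, and the function name and inputs are hypothetical.

```python
import math

def heterogeneity(q, k, z=1.96):
    """Return (H, I2, H interval, I2 interval) from Cochran's Q and the
    number of studies k (k >= 3), assuming a test-based interval for ln(H)."""
    df = k - 1
    h = math.sqrt(q / df)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Standard error of ln(H); the formula depends on whether Q exceeds k
    if q > k:
        se_ln_h = 0.5 * (math.log(q) - math.log(df)) / (
            math.sqrt(2.0 * q) - math.sqrt(2.0 * k - 3.0))
    else:
        se_ln_h = math.sqrt(
            (1.0 / (2.0 * (k - 2))) * (1.0 - 1.0 / (3.0 * (k - 2) ** 2)))
    # An undefined ln(H) (Q = 0) is treated as H = 1 for the interval only
    ln_h = math.log(h) if h > 0 else 0.0
    h_lo = math.exp(ln_h - z * se_ln_h)
    h_hi = math.exp(ln_h + z * se_ln_h)

    def to_i2(hv):
        # I2 = (H^2 - 1) / H^2, truncated at zero
        return max(0.0, (hv ** 2 - 1.0) / hv ** 2)

    return h, i2, (h_lo, h_hi), (to_i2(h_lo), to_i2(h_hi))

# Hypothetical example: Q = 8 from k = 3 studies
h, i2, h_ci, i2_ci = heterogeneity(8.0, 3)
print("H:", h, "interval:", h_ci)
print("I2:", i2, "interval:", i2_ci)
```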


IPD Meta-analysis – cross-sectional correlates

partcorr            Calculates partial correlation coefficients, including by subgroup

cscorr              Fits linear mixed models using data from multiple studies to assess cross-sectional associations with response

cscorrst            Generates detailed descriptive summary tables for data available for cross-sectional correlates analysis


IPD Meta-analysis – aetiological associations

adjmeta           A program for large-scale meta-analysis of an exposure-disease association using individual participant data (IPD)

mvshape          Calculates risk ratios for exposure-disease association using user-specified categories by multivariate meta-analysis

mvmetaipd      Program for multivariate meta-analysis of an exposure-disease association using individual participant data (IPD)

mvmetai2        Calculates heterogeneity statistics after fitting a multivariate meta-analysis model

checkhaz         Generates plots of cumulative hazard and survival functions for multiple groups in time-to-event data

phtest              Implements up to 6 methods for testing the proportional hazards (PH) assumption for survival data from multiple studies

stsetage           Sets up survival-time data in a format suitable for estimating age-at-risk specific hazard ratios

stsetcco           Converts case-cohort design dataset into a format suitable for estimating hazard ratios using weighted Cox regression

rdrcalc             Calculates and plots adjusted regression dilution ratios (RDRs) over time
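The idea behind a regression dilution ratio (RDR), as used by rdrcalc and in the measurement-error correction mentioned in the summary, can be sketched in its simplest form: regress a repeat measurement on the baseline measurement and divide the observed coefficient by the resulting slope. This is a minimal, unadjusted, single-time-point illustration with hypothetical function names and data. rdrcalc itself computes adjusted RDRs over time, which this sketch does not attempt.

```python
from statistics import mean

def regression_dilution_ratio(baseline, repeat):
    """Slope from a simple linear regression of repeat on baseline values."""
    mb, mr = mean(baseline), mean(repeat)
    sxy = sum((b - mb) * (r - mr) for b, r in zip(baseline, repeat))
    sxx = sum((b - mb) ** 2 for b in baseline)
    return sxy / sxx

def correct_coefficient(beta_observed, rdr):
    """Univariate correction: divide the observed log hazard ratio
    (per unit of baseline exposure) by the RDR."""
    return beta_observed / rdr

# Hypothetical baseline and repeat measurements on four participants
baseline = [1.0, 2.0, 3.0, 4.0]
repeat = [1.5, 2.0, 2.5, 3.0]
rdr = regression_dilution_ratio(baseline, repeat)
print("RDR:", rdr)
print("corrected coefficient:", correct_coefficient(0.2, rdr))
```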


IPD Meta-analysis – risk prediction

predaddc         Meta-analysis of the predictive ability of risk prediction models using C-index measure in individual participant data (IPD)

predaddd         Meta-analysis of the predictive ability of risk prediction models using D measure in individual participant data (IPD)

predmeta         Meta-analysis of the predictive ability of risk prediction models using C-index or D measures (post-estimation)

predsubg         Assessment of subgroup effects in meta-analysis of the predictive ability of risk prediction models using C-index or D measures

predstat           Assessment of the predictive ability of a risk marker using reclassification statistics in IPD

preddstat         Post-estimation assessment of model predictive ability using D statistic adapted for multiple studies

predsurv           Predicts survival and failure probabilities after a stcox model, including with staggered entry

predcalib         Calibration of survival or failure probabilities predicted from prognostic survival models


Other useful programs

xtilew               Creates within group quantiles using an approach that is efficient for large datasets

farcalc             Calculates standard errors and confidence intervals based on floating variances after fitting a regression model

estplot             Flexible plotting of point estimates and pointwise CIs after modelling

submat            Extracts coefficient estimates and variance covariance matrix corresponding to a specified namelist of effects

corrbeta           Solves for linear regression coefficient given correlation coefficient, sample size, and SDs

inplink             Stata code to input plink format output text files

proglinks          Provides links to other user-written Stata packages that some of the programs above might use
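As an illustration of the calculation described in the corrbeta entry above, the sketch below recovers a simple linear regression slope and its standard error from a correlation coefficient, the sample size, and the two standard deviations. It uses the standard identities for a regression of y on a single x. The function name and inputs are hypothetical, and corrbeta's own options and output are not shown on this page.

```python
import math

def beta_from_correlation(r, n, sd_x, sd_y):
    """Slope and standard error for a simple linear regression of y on x,
    given the Pearson correlation r, sample size n (n > 2), and the SDs."""
    ratio = sd_y / sd_x
    beta = r * ratio
    # SE(beta) = (sd_y / sd_x) * sqrt((1 - r^2) / (n - 2))
    se = ratio * math.sqrt((1.0 - r ** 2) / (n - 2))
    return beta, se

# Hypothetical inputs
beta, se = beta_from_correlation(r=0.5, n=102, sd_x=1.0, sd_y=2.0)
print("beta:", beta, "SE:", se)
print("95% CI:", beta - 1.96 * se, beta + 1.96 * se)
```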