Links on this page require updating. Please contact Dr Stephen Kaptoge with any enquiries.
Stata programs:
The Stata programs below are available to install within a Stata session by typing:
net from http://ceu.phpc.cam.ac.uk/software/erfc/
net describe <package>
net install <package>
or, more simply, in a single step:
net install <package>, from(http://ceu.phpc.cam.ac.uk/software/erfc/) replace
It is also highly recommended that you get the ancillary Stata do-file provided in the –proglinks– package, which lists links to other non-official SJ and user-written programs used by some of the programs below. These should be installed for the programs to work correctly. The –proglinks– package also contains a second ancillary do-file with the example Stata code used to analyse the examples in the ERFC statistical methods paper [Int J Epidemiol 2010], which can also be downloaded as a PDF file from here <erfc_stats_methods_example.pdf>.
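As a minimal sketch of the installation steps described above, the do-file fragment below installs –proglinks– first and then loops over a few packages. The local macro name erfc and the choice of three packages are illustrative assumptions; the URL is the one quoted above and may no longer resolve, given that links on this page require updating.

```stata
* Sketch only: the macro name and the package selection are illustrative.
local erfc "http://ceu.phpc.cam.ac.uk/software/erfc/"

* Install proglinks first, since it lists the other programs required
net install proglinks, from(`erfc') replace

* Install a selection of the ERFC packages in one loop
foreach pkg in adjmeta mvmetaipd fpshape {
    net install `pkg', from(`erfc') replace
}

* Show what the proglinks package contains, then copy its ancillary
* files (the do-files) into the current working directory
net describe proglinks, from(`erfc')
net get proglinks, from(`erfc') replace
```

Note that net install does not copy ancillary files; net get is the official Stata command for downloading them.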
Stata programs list
Exploratory analysis
checkdis Generates plots of distribution of a list of continuous variables, optionally by a grouping variable
checkpr Generates bar charts for a list of categorical variables, optionally by a grouping variable
overlay A wrapper program that simplifies the task of overlaying twoway graphs over a third stratifying variable
summstat Generates overall or pooled within group summary statistics for a specified varlist
Literature-based meta-analysis
riskconv Converts risk ratios measured on a specified scale to a desired scale
i2ci Calculates confidence intervals for the I² and H statistics used to measure heterogeneity
IPD Meta-analysis – cross-sectional correlates
partcorr Calculates partial correlation coefficients, including by subgroup
cscorr Fits linear mixed models using data from multiple studies to assess cross-sectional associations with response
cscorrst Generates detailed descriptive summary tables for data available for cross-sectional correlates analysis
IPD Meta-analysis – aetiological associations
adjmeta A program for large-scale meta-analysis of an exposure-disease association using individual participant data (IPD)
mvshape Calculates risk ratios for exposure-disease association using user-specified categories by multivariate meta-analysis
mvmetaipd Program for multivariate meta-analysis of an exposure-disease association using individual participant data (IPD)
mvmetai2 Calculates heterogeneity statistics after fitting a multivariate meta-analysis model
fpshape Program for meta-analysis of non-linear exposure-outcome associations using fractional polynomials (FPs)
checkhaz Generates plots of cumulative hazard and survival functions for multiple groups in time-to-event data
phtest Implements up to 6 methods for testing the proportional hazards (PH) assumption for survival data from multiple studies
stsetage Sets up survival-time data in a format suitable for estimating age-at-risk specific hazard ratios
stsetcco Converts a case-cohort design dataset into a format suitable for estimating hazard ratios using weighted Cox regression
rdrcalc Calculates and plots adjusted regression dilution ratios (RDRs) over time
genusual Generates usual levels for an exposure variable and confounders
IPD Meta-analysis – risk prediction
predaddc Meta-analysis of the predictive ability of risk prediction models using C-index measure in individual participant data (IPD)
predaddd Meta-analysis of the predictive ability of risk prediction models using D measure in individual participant data (IPD)
predmeta Meta-analysis of the predictive ability of risk prediction models using C-index or D measures (post-estimation)
predsubg Assessment of subgroup effects in meta-analysis of the predictive ability of risk prediction models using C-index or D
predstat Assessment of the predictive ability of a risk marker using reclassification statistics in IPD
preddstat Post-estimation assessment of model predictive ability using D statistic adapted for multiple studies
predsurv Predicts survival and failure probabilities after a stcox model, including with staggered entry
predcalib Calibration of survival or failure probabilities predicted from prognostic survival models
Risk prediction algorithms (see notes below*)
whocvdrisk Calculates WHO 10-year CVD risk with recalibration to relevant country or GBD region
score2risk Calculates SCORE2 10-year CVD risk with recalibration to relevant European risk region
Other useful programs
xtilew Creates within group quantiles using an approach that is efficient for large datasets
farcalc Calculates standard errors and confidence intervals based on floating variances after fitting a regression model
estplot Flexible plotting of point estimates and pointwise CIs after modelling
submat Extracts coefficient estimates and variance covariance matrix corresponding to a specified namelist of effects
corrbeta Solves for linear regression coefficient given correlation coefficient, sample size, and SDs
inplink Reads PLINK-format output text files into Stata
proglinks Provides links to other user-written Stata packages that some of the programs above might utilise
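To make the idea of converting a risk ratio between scales (as in the riskconv entry above) more concrete, here is a hand calculation for one simple case: rescaling a risk ratio from one unit increment to another under an assumed log-linear association. This is only an illustration of the arithmetic. It does not use riskconv or show its syntax, and all input values and scalar names are hypothetical.

```stata
* Hypothetical inputs: RR of 1.20 (95% CI 1.10 to 1.31) per 10 units of exposure
scalar rr_old = 1.20
scalar lb_old = 1.10
scalar ub_old = 1.31
scalar units_old = 10    // increment used in the published estimate
scalar units_new = 25    // increment wanted for the meta-analysis

* Under a log-linear association, log(RR) scales in proportion to the
* increment, so RR_new = RR_old^(units_new/units_old); CI limits likewise
scalar k = units_new/units_old
scalar rr_new = rr_old^k
scalar lb_new = lb_old^k
scalar ub_new = ub_old^k

display "Rescaled RR: " %5.3f rr_new " (95% CI " %5.3f lb_new " to " %5.3f ub_new ")"
```

Conversions between other kinds of comparison (for example, top versus bottom third of a distribution) need additional distributional assumptions that this sketch does not cover.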
* Notes on risk prediction algorithms
Please note that the WHO and SCORE2 CVD risk prediction algorithms were derived using cohort data and then recalibrated to relevant risk regions (21 global regions for the WHO algorithm and 4 European risk regions for the SCORE2 algorithm). Recalibration used contemporary population data and incidence estimates, which aim to represent the CVD incidence expected in the broad target population for screening. As a result, the risk predictions do not represent the absolute risk seen in cohort data, which can be misleading because cohort studies commonly represent a subset of the population and/or a past period of time. For this reason, we advise caution in any analysis that attempts to assess calibration or the public health impact of screening by applying the algorithms to observed study-specific data without reweighting to make the results applicable to the broader, representative contemporary population.