Meta-analysis and Meta-regression with R

Posted on Tue 13 October 2020 in R

Introduction

On December 2019, reports from severe acute respiratory syndrome in Wuhan, China, were linked to a novel coronavirus, now known as SARS-CoV-2, and the disease it causes was termed coronavirus disease 2019 (COVID-19).

The World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International Concern on 30 January 2020, and a pandemic on 11 March 2020.

SARS-CoV-2 causes an interstitial and alveolar pneumonia, but in nearly 26% of patients it can lead to a severe disease, with multiple organ failure (references: 1, 2, 3, 4).

Thus, infection by SARS-CoV-2 can cause kidney injury, perhaps by direct impact of virulence, or via other renal insults such as volume depletion, hypoxia and cytokine storm. This has posed pressure to healthcare systems due to a shortage of dialysis staff, equipment and consumables throughout the world.

I recently co-authored an article regarding acute kidney injury (AKI) among patients infected with COVID-19 patients. The objective of the study was to systematically review the incidence of COVID-19-associated AKI, its related risks factors, need for renal replacement therapy (RRT) among patients with AKI and mortality according to current illness severity.

In this post I will show the code I used to meta-analyze the AKI incidence data among COVID-19 patients. As usual, the codes presented here are available on the corresponding folder on my GitHub portfolio.

Methods

Systematic Review methodology

My colleagues systematically searched PubMed, SCOPUS and Web of Science databases with the search terms “COVID-19” OR “SARS- CoV-2” OR “Coronavirus 2019” OR “2019- nCoV” AND “Acute Kidney Injury” OR “Kidney” OR “Nephrology” OR “Renal Disease” OR “Clinical Characteristics” OR “Clinical Features”.

Meta-analysis and meta-regression

The heterogeneity between studies sample sizes was expressed through I² measure and τ² statistic (estimated via maximum- likelihood). A Cochran”s Q test with k–1 df (in which k is the number of studies included) and with significance level α=0.10 (exclusively for this test) was conducted to assess if heterogeneity was significantly different from zero. If so, the heterogeneity was classified according to the observed I² measure: ≤25%, between 25% and 50%, between 50% and 75% and between 75% and 100% were considered as low, moderate, high and very high heterogeneity, respectively. In case of significant heterogeneity, the meta-analyses were conducted assuming a random- effects model with the Hartung-Knapp adjustment (References: 6, 7).

The independent variables included in a multivariate meta-regression were: study location (China, Poland or the USA), study design (case series, prospective or retrospective cohorts), study setting (single or multicenter), AKI definition criteria, median age of patients in years, proportion of males patients in the sample size and proportion of patients with certain comorbidities, namely hypertension, other cardio-vascular diseases, diabetes, chronic pulmonary disease, chronic kidney disease (CKD) and cancer.

Results

Results of the systematic review

My colleagues found 21 studies and I was responsible for perform the meta-analysis. Overall, 15,536 patients with COVID-19 was the total sample size. Most studies (18) were from China, two were from the USA and one from Poland.

Incidence of AKI among patients with COVID-19

The meta-analysis showed an incidence of AKI associated with COVID-19 of 12.3% (95% CI=7.3% to 20.0%; Cochran’s Q=839.6 with 20 degrees of freedom, p<0.001; I² = 97.6% and τ² = 1.42).

Below I show the code that generated this result.

Loading packages

I used R software with RStudio in Windows 10 to perform the meta-analysis (the same steps can be followed in Unix systems). I opened a R session through RStudio, created a project, named it aki_demo and loaded all packages needed for the meta-analysis. The command to install them is:

# Run only once, if you do not have the packages installed yet
install.packages(c("meta", "here", "openxlsx"))

The packages and their dependencies will be downloaded from the internet and installed. After install, I must load them in the R session:

library(meta)
library(here)
library(openxlsx)

Load and inspect data

Now I can load the data. I prepared an Excel spreadsheet with an excerpt of the data of our paper and put inside a subfolder named data on my working directory. The current script I saved on a folder named src (for “source”). My project folder now look like this:

.
├── data
│   └── aki_demo.xlsx
├── output
├── src
│   └── aki_demo.R
├── .Rhistory
└── aki_demo.RProj

I load the spreadsheet into the R session memory with the command:

aki <- read.xlsx(here("data","aki_demo.xlsx"))

The <-, which is simply the “less than” < symbol followed by a dash symbol -, means “assigning”. I am assigning the data to the object named aki. The read.xlsx() function from openxlsx package reads the content of the spreadsheet and loads into a data frame, a tabular R format that holds data.

Note the usage of the here() function. here is a nice R package that helps file paths management. It ensures that my code will run smoothly on any computer, provided the directory structure is maintained. Thus, there is no need to hardcode file paths, improving code portability/reproducibility. The Dr. Jennifer Bryan’s post is a nice introduction to the here package.

To see the contents of the spreadsheet, type aki in the RStudio console window (bottom left area of RStudio window) and press Enter/Return or select the or put the cursor at the start of the line in the script window (top left) and press CTRL + Enter (⌘ Command + Enter on MacOS).

RStudio Script and Console windows

The first five columns of the output are below (I manually omitted the remaining of the table):

> aki
                          authors location sample_size control_aki case_aki
1           Tao Chen et al., 2020    China         274         245       29
2            W. Guan et al., 2020    China        1099        1093        6
3        Xiaobo Yang et al., 2020    China          52          37       15
4           Fei Zhou et al., 2020    China         191         163       28
5        Yingzhen Du et al., 2020    China          85          37       48
6      Chaolin Huang et al., 2020    China          41          38        3
7       Qingxian Cai et al., 2020    China         298         281       17
8       Yichun Cheng et al., 2020    China         701         665       36
9       Shaoqing Lei et al., 2020    China          34          32        2
10        Dawei Wang et al., 2020    China         138         133        5
11         Lang Wang et al., 2020    China         339         312       27
12   Jamie S. Hirsch et al., 2020      USA        5449        3456     1993
13      Blazej Nowak et al., 2020   Poland         169         152       17
14          Yi Zheng et al., 2020    China          34          27        7
15         Dawei Wang et al.,2020    China         107          93       14
16           Yuan Yu et al., 2020    China         226         169       57
17 Safiya Richardson et al., 2020      USA        5700        4330     1370
18    Guangchang Pei et al., 2020    China         333         298       35
19         Yanlei Li et al., 2020    China          54          25       29
20      Qingchun Yao et al., 2020    China         108          92       16
21     Chengfeng Qiu et al., 2020    China         104         102        2

Check that are 21 studies (rows) and the first five columns are: authors, location, sample_size (the number of COVID-19 patients), control_aki (the number of COVID-19 patients without AKI) and case_aki (the number of AKI cases among the COVID-19 patients).

To check the name of the other columns, type:

names(aki)

# Output:
 [1] "authors"        "location"       "sample_size"    "control_aki"   
 [5] "case_aki"       "age"            "design"         "setting"       
 [9] "aki_criteria"   "males_p"        "hypertension_p" "all_cardio_p"  
[13] "diabetes_p"     "copd_p"         "ckd_p"          "cancer_p"

Then, it is possible to see that there are 16 columns in total. The bracketed number in the left side of the line is the index of the first element in the line (the count starts with [1] authors; the fourth element is [5] case_aki column, and so on).

To check the shape (dimensions) of the table I type:

dim(aki)

# Output:
[1] 21 16

The output tells that there are 21 rows and 16 columns indeed.

Calculating meta-analysis and inspecting results

Now I will use the metaprop() function from meta package to calculate the pooled incidence (a proportion) of AKI among COVID-19 patients and assign the results to aki_incidence_meta object:

aki_incidence_meta <- metaprop(event = case_aki, # numerator
                   n = sample_size, # denominator
                   studlab = paste0(authors,", ", location), # makes label by joining authors and location
                   data = aki, # dataframe name
                   sm = "PLO", # pooling calculation method
                   predict = TRUE, # provides prediction confidence interval
                   hakn = TRUE, # Hartung-Knapp correction
                   comb.fixed = TRUE, # displays fixed-effect results
                   comb.random = TRUE, # displays random-effect results
                   level.comb = 0.95, # confidence interval
                   method.tau = "ML", # heterogeneity calculation method
                   method.bias = "linreg",
                   warn = TRUE
)

Note that I type the command in this “list” format to allow easier reading of the code, it is not mandatory. The event argument is the numerator (thus aki_case column) and n is the denominator (sample_size column). I summarized above what the principal arguments do. Check metaprop()‘s documentation page here for more details. The meta package has other functions to perform meta-analysis with other kind of data. Check the documentation here.

Retrieving a summary of the aki_incidence_meta object produces this output:

# Command
summary(aki_incidence_meta)

# Output:
Number of studies combined: k = 21

                     proportion           95%-CI
Fixed effect model       0.2801 [0.2724; 0.2879]
Random effects model     0.1227 [0.0725; 0.2003]
Prediction interval             [0.0107; 0.6451]

Quantifying heterogeneity:
 tau^2 = 1.4231 [0.8378; 3.3344]; tau = 1.1929 [0.9153; 1.8260];
 I^2 = 97.6% [97.1%; 98.1%]; H = 6.48 [5.83; 7.20]

Test of heterogeneity:
      Q d.f.  p-value
 839.62   20 < 0.0001

Details on meta-analytical method:
- Inverse variance method
- Maximum-likelihood estimator for tau^2
- Q-profile method for confidence interval of tau^2 and tau
- Hartung-Knapp adjustment for random effects model
- Logit transformation

Since there was high heterogeneity, I chose to interpret the results through a random-effects model, yielding the numbers I quoted above. Below the results are the heterogeneity reports, results of Cochran’s Q test statistic, degrees-of-freedom and p-value confirming that the heterogeneity is not negligible and a summary of statistical methods used in the meta-analysis.

If you want to save this output directly to file, you could use the sink() method:

sink(here("output", "aki_incidence.txt"))
summary(aki_incidence_meta)
sink()

It will save the raw output in text format in a file named aki_incidence.txt into the output folder.

Producing a Forest plot

Now I will generate a Forest plot with forest() function (also from meta package) to visually represent the meta-analysis results and save it to a TIFF file. The commands below open the TIFF graphical device, produce the plot and then closes the device:

# Open the image device
tiff(filename = here("output", "plots", "aki_incidence.tiff"),
     compression = "lzw",
     res = 300,
     width = 3050,
     height = 1750)

# Produce the plot
forest(aki_incidence_meta,
             xlim = c(0, 1), # set axis limits
             comb.fixed =  FALSE, # omit fixed-effect model results
             leftlabs = c("Study, Location", "AKI cases", "Sample size"), # set left-side column names
             rightlabs = c("AKI incidence", "95% CI", "Weight"), # set right-side column names
             pooled.events = TRUE,
             col.predict = "black")

# Close the device and save the plot
dev.off()

Check other R graphical devices commonly used to save plots as images here.

Calculating a meta-regression

To calculate the meta-analysis, I needed just the sample size and number of events. I can use the remaining variables in the data frame to calculate a meta-regression to assess if aggregate measures of patients characteristics would be associated with increased risk of AKI occurrence. Below is the command I used:

aki_incidence_metareg <- metareg(aki_incidence_meta, ~ design + setting + location + age + males_p + aki_criteria + hypertension_p + all_cardio_p + diabetes_p + copd_p + ckd_p + cancer_p)

I used the aki_incidence_meta meta-analysis object as input to the metareg function. After a comma, I write a tilde ~ and the list all covariates separated by plus signs +. In R notation, the tilde separates the dependent covariate from the independent variables. Remember they are all columns from the spreadsheet I imported in the beginning. I check the results using the summary() function:

summary(aki_incidence_metareg)

# Output:
Mixed-Effects Model (k = 21; tau^2 estimator: ML)

  logLik  deviance       AIC       BIC      AICc 
-20.3487   62.5815   76.6975   95.4989  418.6975   

tau^2 (estimated amount of residual heterogeneity):     0.2864 (SE = 0.1130)
tau (square root of estimated tau^2 value):             0.5352
I^2 (residual heterogeneity / unaccounted variability): 77.00%
H^2 (unaccounted variability / sampling variability):   4.35
R^2 (amount of heterogeneity accounted for):            79.88%

Test for Residual Heterogeneity:
QE(df = 4) = 90.3774, p-val < .0001

Test of Moderators (coefficients 2:17):
F(df1 = 16, df2 = 4) = 0.6824, p-val = 0.7411

Model Results:

                                     estimate       se     tval    pval      ci.lb     ci.ub 
intrcpt                               -7.1298   7.4538  -0.9565  0.3930   -27.8250   13.5654    
designprospective                      3.9234   4.6568   0.8425  0.4469    -9.0058   16.8527    
designretrospective                    4.8177   4.3793   1.1001  0.3330    -7.3411   16.9764    
settingsingle_center                  -0.0597   1.0959  -0.0544  0.9592    -3.1023    2.9830    
locationPoland                        -0.7740   1.7608  -0.4396  0.6829    -5.6628    4.1147    
locationUSA                            1.5444   2.4229   0.6374  0.5585    -5.1826    8.2714    
age                                    0.1119   0.1032   1.0844  0.3392    -0.1747    0.3985    
males_p                               -5.7066   8.9694  -0.6362  0.5592   -30.6097   19.1964    
aki_criteriaKDIGO                     -3.4725   2.6281  -1.3213  0.2569   -10.7691    3.8242    
aki_criteriaKDIGO Expanded Criteria   -5.6275   4.2522  -1.3235  0.2563   -17.4334    6.1783    
aki_criteriaSerum creatinine (Scr)    -1.5829   3.9282  -0.4030  0.7076   -12.4894    9.3236    
hypertension_p                       -12.7624  10.2703  -1.2426  0.2819   -41.2774   15.7526    
all_cardio_p                         -12.5110  25.5305  -0.4900  0.6498   -83.3950   58.3729    
diabetes_p                            29.8351  25.7688   1.1578  0.3114   -41.7105  101.3807    
copd_p                                 7.8958  38.3726   0.2058  0.8470   -98.6438  114.4353    
ckd_p                                 77.4441  78.1549   0.9909  0.3778  -139.5486  294.4367    
cancer_p                              -5.5262  20.2780  -0.2725  0.7987   -61.8269   50.7746    

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The estimate column is the magnitude of the effect of the variable upon the dependent variable (AKI incidence among COVID-19 patients). The bigger this number, the higher is the influence. Positive numbers mean higher risk. Negative numbers mean less risk. However, we must check the tval and its corresponding pval (p-value) columns. The tval test assumes a null hypothesis that estimate = 0. Therefore if pval is less than a pre-specified level of confidence (say, 5%), we can assume the estimate is significantly different than zero. Thus, we would assume that the variable would have influence over the outcome.

However, as you can see in the output, no variables were associated with AKI incidence with statistical significance.

Conclusion

With this I conclude this demonstration. To summarize I:

Demonstrated how to import data into a R session;
Introduced the meta package, a widely-used R package to meta-analysis calculation;
Demonstrated the rationale, execution and interpretation of meta-regression.

Check my published paper to see the results from a complete analysis of the systematically-reviwed data here.

Subscribe to my RSS feed, Atom feed or Telegram channel to keep you updated whenever I post new content.