This code performs longitudinal analysis using both naive and TMLE analyses on the first wave of simulated data created in the data creation code. The code performs a series of analyses:
Firstly, we define SuperLearner libraries to be used by SuperLearner:
SLlib <- c("SL.glm")
SLlib2 <- c("SL.glm","SL.glm.interaction","SL.stepAIC","SL.ranger")
SLlib3 <- list(Q=c("SL.glm","SL.glm.interaction","SL.stepAIC"),
g=c("SL.glm","SL.glm.interaction","SL.stepAIC","SL.ranger"))
Next, we define the models to be used by ‘ltmle’ when manually specifying models. ‘ltmle’ requires models to be defined for each exposure and censoring variable in ‘gform’ and the first in each block of confounders and each outcome in ‘qform’. ‘ltmle’ can also produce a series of q and g models automatically based on the data - if qform and gform are not specified in the command, ‘ltmle’ will produce a set of all required models, using all predictor variables that preceed that variable in the data. In many cases this is perfectly acceptable (in this case, the correctly specified models are the same as the automatically produce models); however, the models produced automatically can be incorrect when variables should not be included in some of the component models - for example, when predictors of censoring are not the same as predictors of exposure.
# Correctly specified confounder/outcome models
qformb <- c(l_0="Q.kplus1 ~ ba + bb + bc",
y_0="Q.kplus1 ~ ba + bb + bc + l_0 + a_0",
l_1="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0",
y_1="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1",
l_2="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1",
y_2="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2",
l_3="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2",
y_3="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2 + l_3 + a_3",
l_4="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2 + l_3 + a_3 + y_3",
y_4="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2 + l_3 + a_3 + y_3 + l_4 + a_4")
# Incorrectly specified confounder/outcome models
mqformb <- c(l_0="Q.kplus1 ~ ba + bb + bc",
y_0="Q.kplus1 ~ ba + bb + bc + a_0",
l_1="Q.kplus1 ~ ba + bb + bc + a_0 + y_0",
y_1="Q.kplus1 ~ ba + bb + bc + a_0 + y_0 + a_1",
l_2="Q.kplus1 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1",
y_2="Q.kplus1 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2",
l_3="Q.kplus1 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2",
y_3="Q.kplus1 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2 + a_3",
l_4="Q.kplus1 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2 + a_3 + y_3",
y_4="Q.kplus1 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2 + a_3 + y_3 + a_4")
# Correctly specified exposure/censoring models
gformb <- c(c_0="c_0 ~ ba + bb + bc",
a_0="a_0 ~ ba + bb + bc + l_0",
c_1="c_1 ~ ba + bb + bc + l_0 + a_0 + y_0",
a_1="a_1 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1",
c_2="c_2 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1",
a_2="a_2 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2",
c_3="c_3 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2",
a_3="a_3 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2 + l_3",
c_4="c_4 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2 + l_3 + a_3 + y_3",
a_4="a_4 ~ ba + bb + bc + l_0 + a_0 + y_0 + l_1 + a_1 + y_1 + l_2 + a_2 + y_2 + l_3 + a_3 + y_3 + l_4")
# Incorrectly specified exposure/censoring models
mgformb <- c(c_0="c_0 ~ ba + bb + bc",
a_0="a_0 ~ ba + bb + bc",
c_1="c_1 ~ ba + bb + bc + a_0 + y_0",
a_1="a_1 ~ ba + bb + bc + a_0 + y_0",
c_2="c_2 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1",
a_2="a_2 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1",
c_3="c_3 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2",
a_3="a_3 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2",
c_4="c_4 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2 + a_3 + y_3",
a_4="a_4 ~ ba + bb + bc + a_0 + y_0 + a_1 + y_1 + a_2 + y_2 + a_3 + y_3")
Now we can begin analysis. Firstly, we run TMLE using the ‘ltmle’ package, but manually specifying the models to be used (as defined above), and with estimation conducted using only generalised linear models. Note that, because ‘ltmle’ checks and transforms continuous outcomes, and checks that data is always missing after censoring. For continuous outcomes, the variable is truncated to a quasibinomial distribution (continous but bounded in 0/1); for missing data, and observations after a censoring event are ignored. Because of these checks, the command potentially produces a number of messages. These are not an issue, and have been left enabled for the first analysis to show what they look like, but have been disabled in subsequent analyses to simplify this markdown document.
# Correctly specified
rltmle2 <- suppressWarnings(ltmle(ldata[,-1],
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes=c("y_0","y_1","y_2","y_3","y_4"),
survivalOutcome=FALSE,
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib,
Qform=qformb,gform=gformb,
estimate.time=FALSE))
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Some Ynodes are not in [0, 1], and Yrange was NULL, so all Y nodes are being
transformed to (Y-min.of.all.Ys)/range.of.all.Ys
summary(rltmle2)
Estimator: tmle
Call:
ltmle(data = ldata[, -1], Anodes = c("a_0", "a_1", "a_2", "a_3",
"a_4"), Cnodes = c("c_0", "c_1", "c_2", "c_3", "c_4"), Lnodes = c("l_0",
"l_1", "l_2", "l_3", "l_4"), Ynodes = c("y_0", "y_1", "y_2",
"y_3", "y_4"), survivalOutcome = FALSE, Qform = qformb, gform = gformb,
abar = list(c(1, 1, 1, 1, 1), c(0, 0, 0, 0, 0)), SL.library = SLlib,
estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 3.7704
Estimated Std Err: 0.38452
p-value: <2e-16
95% Conf Interval: (3.0168, 4.5241)
Control Estimate:
Parameter Estimate: 0.71147
Estimated Std Err: 0.16201
p-value: <2e-16
95% Conf Interval: (0.39394, 1.029)
Additive Treatment Effect:
Parameter Estimate: 3.059
Estimated Std Err: 0.4127
p-value: 1.2437e-13
95% Conf Interval: (2.2501, 3.8679)
# Outcome model misspecified
rltmle2m1 <- suppressMessages(suppressWarnings(ltmle(ldata[,-1],
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes=c("y_0","y_1","y_2","y_3","y_4"),
survivalOutcome=FALSE,
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib,
Qform=mqformb,gform=gformb,
estimate.time=FALSE)))
summary(rltmle2m1)
Estimator: tmle
Call:
ltmle(data = ldata[, -1], Anodes = c("a_0", "a_1", "a_2", "a_3",
"a_4"), Cnodes = c("c_0", "c_1", "c_2", "c_3", "c_4"), Lnodes = c("l_0",
"l_1", "l_2", "l_3", "l_4"), Ynodes = c("y_0", "y_1", "y_2",
"y_3", "y_4"), survivalOutcome = FALSE, Qform = mqformb,
gform = gformb, abar = list(c(1, 1, 1, 1, 1), c(0, 0, 0,
0, 0)), SL.library = SLlib, estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 4.5913
Estimated Std Err: 0.37322
p-value: <2e-16
95% Conf Interval: (3.8598, 5.3228)
Control Estimate:
Parameter Estimate: 0.81117
Estimated Std Err: 0.18107
p-value: <2e-16
95% Conf Interval: (0.45628, 1.1661)
Additive Treatment Effect:
Parameter Estimate: 3.7802
Estimated Std Err: 0.4142
p-value: <2e-16
95% Conf Interval: (2.9683, 4.592)
# Both models misspecified
rltmle2m2 <- suppressMessages(suppressWarnings(ltmle(ldata[,-1],
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes=c("y_0","y_1","y_2","y_3","y_4"),
survivalOutcome=FALSE,
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib,
Qform=mqformb,gform=mgformb,
estimate.time=FALSE)))
summary(rltmle2m2)
Estimator: tmle
Call:
ltmle(data = ldata[, -1], Anodes = c("a_0", "a_1", "a_2", "a_3",
"a_4"), Cnodes = c("c_0", "c_1", "c_2", "c_3", "c_4"), Lnodes = c("l_0",
"l_1", "l_2", "l_3", "l_4"), Ynodes = c("y_0", "y_1", "y_2",
"y_3", "y_4"), survivalOutcome = FALSE, Qform = mqformb,
gform = mgformb, abar = list(c(1, 1, 1, 1, 1), c(0, 0, 0,
0, 0)), SL.library = SLlib, estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 5.2992
Estimated Std Err: 0.42277
p-value: <2e-16
95% Conf Interval: (4.4706, 6.1278)
Control Estimate:
Parameter Estimate: 0.024658
Estimated Std Err: 0.13997
p-value: <2e-16
95% Conf Interval: (-0.24969, 0.299)
Additive Treatment Effect:
Parameter Estimate: 5.2745
Estimated Std Err: 0.44486
p-value: <2e-16
95% Conf Interval: (4.4026, 6.1464)
Next, we carry out analysis using SuperLearner, allowing ‘ltmle’ to define the internal models:
slltmle2 <- suppressMessages(suppressWarnings(ltmle(ldata[,-1],
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes=c("y_0","y_1","y_2","y_3","y_4"),
survivalOutcome=FALSE,
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib3,
estimate.time=FALSE)))
Finally, for comparison purposes, we conduct naive analysis. Because this analysis involves multiple observations per id, we conduct the naive analysis using generalised linear mixed models instead of GLMs. Also, unlike analysis with a single outcome, this analysis requires pooling across multiple observations, so some data manipulation is required first.
# Create cumulative exposure and confounder variables for naive analysis
ldata$acum_0 <- ldata$a_0
ldata$acum_1 <- ldata$acum_0 + ldata$a_1
ldata$acum_2 <- ldata$acum_1 + ldata$a_2
ldata$acum_3 <- ldata$acum_2 + ldata$a_3
ldata$acum_4 <- ldata$acum_3 + ldata$a_4
ldata$lcum_0 <- ldata$l_0
ldata$lcum_1 <- ldata$lcum_0 + ldata$l_1
ldata$lcum_2 <- ldata$lcum_1 + ldata$l_2
ldata$lcum_3 <- ldata$lcum_2 + ldata$l_3
ldata$lcum_4 <- ldata$lcum_3 + ldata$l_4
# Create long-form data for GLM analysis
longdata <- reshape(data=ldata,varying=list(c(8,12,16,20,24),c(7,11,15,19,23),c(25:29),c(6,10,14,18,22),c(30:34),c(5,9,13,17,21)),
v.names=c("y","a","acum","l","lcum","c"),
idvar="ID",timevar="obs",direction="long", sep = "_")
# Correctly specified cumulative exposure model using random intercept model
LRI <- lmer("y ~ acum + lcum + ba + bb + bc + (1|ID)",data=longdata)
# Incorrectly specified cumulative exposure model
LRIm <- lmer("y ~ acum + ba + bb + bc + (1|ID)",data=longdata)
Lets see a summary of the results produced by each of the methods, so we can compare them:
lresults2 <- matrix(c(5*coef(summary(LRI))[2,1],
5*coef(summary(LRI))[2,2],
5*coef(summary(LRIm))[2,1],
5*coef(summary(LRIm))[2,2],
summary(rltmle2)$effect.measures$ATE$estimate,
summary(rltmle2)$effect.measures$ATE$std.dev,
summary(rltmle2m1)$effect.measures$ATE$estimate,
summary(rltmle2m1)$effect.measures$ATE$std.dev,
summary(rltmle2m2)$effect.measures$ATE$estimate,
summary(rltmle2m2)$effect.measures$ATE$std.dev,
summary(slltmle2)$effect.measures$ATE$estimate,
summary(slltmle2)$effect.measures$ATE$std.dev),nrow=6,ncol=2,byrow=TRUE)
rownames(lresults2) <- c("Pooled GLMM - correctly specified","Pooled GLMM - incorrectly specified","'ltmle' package - correctly specified","'ltmle' package - outcome misspecified","'ltmle' package - doubly misspecified","SuperLearner LTMLE")
colnames(lresults2) <- c("Coef","SE")
lresults2
Coef SE
Pooled GLMM - correctly specified 2.011135 0.1587504
Pooled GLMM - incorrectly specified 5.740583 0.1628222
'ltmle' package - correctly specified 3.058976 0.4127035
'ltmle' package - outcome misspecified 3.780152 0.4142040
'ltmle' package - doubly misspecified 5.274519 0.4448588
SuperLearner LTMLE 3.299986 0.2823550