This code performs longitudinal analysis using both naive and TMLE analyses on the first wave of simulated data created in the data creation code. The code performs a series of analyses:
Firstly, we define SuperLearner libraries to be used by SuperLearner:
SLlib <- c("SL.glm")
SLlib2 <- c("SL.glm","SL.glm.interaction","SL.stepAIC","SL.ranger")
SLlib3 <- list(Q=c("SL.glm","SL.glm.interaction","SL.stepAIC"),
g=c("SL.glm","SL.glm.interaction","SL.stepAIC","SL.ranger"))
Next, we define the models to be used by ‘ltmle’ when manually specifying models. ‘ltmle’ requires models to be defined for each exposure and censoring variable in ‘gform’ and the first in each block of confounders and each outcome in ‘qform’. In this case, there is only one outcome, so qform only contains one outcome model. ‘ltmle’ can also produce a series of q and g models automatically based on the data - if qform and gform are not specified in the command, ‘ltmle’ will produce a set of all required models, using all predictor variables that preceed that variable in the data. In many cases this is perfectly acceptable (in this case, the correctly specified models are the same as the automatically produce models); however, the models produced automatically can be incorrect when variables should not be included in some of the component models - for example, when predictors of censoring are not the same as predictors of exposure.
# Correctly specified confounder/outcome models
qforma <- c(l_0="Q.kplus1 ~ ba + bb + bc",
l_1="Q.kplus1 ~ ba + bb + bc + l_0 + a_0",
l_2="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1",
l_3="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2 + a_2",
l_4="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2 + a_2 + l_3 + a_3",
y_4="Q.kplus1 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2 + a_2 + l_3 + a_3 + l_4 + a_4")
# Incorrectly specified confounder/outcome models
mqforma <- c(l_0="Q.kplus1 ~ ba + bb + bc",
l_1="Q.kplus1 ~ ba + bb + bc + a_0",
l_2="Q.kplus1 ~ ba + bb + bc + a_0 + a_1",
l_3="Q.kplus1 ~ ba + bb + bc + a_0 + a_1 + a_2",
l_4="Q.kplus1 ~ ba + bb + bc + a_0 + a_1 + a_2 + a_3",
y_4="Q.kplus1 ~ ba + bb + bc + a_0 + a_1 + a_2 + a_3 + a_4")
# Correctly specified exposure/censoring models
gforma <- c(c_0="c_0 ~ ba + bb + bc",
a_0="a_0 ~ ba + bb + bc + l_0",
c_1="c_1 ~ ba + bb + bc + l_0 + a_0 ",
a_1="a_1 ~ ba + bb + bc + l_0 + a_0 + l_1",
c_2="c_2 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1",
a_2="a_2 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2",
c_3="c_3 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2 + a_2",
a_3="a_3 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2 + a_2 + l_3",
c_4="c_4 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2 + a_2 + l_3 + a_3",
a_4="a_4 ~ ba + bb + bc + l_0 + a_0 + l_1 + a_1 + l_2 + a_2 + l_3 + a_3 + l_4")
# Incorrectly specified exposure/censoring models
mgforma <- c(c_0="c_0 ~ ba + bb + bc",
a_0="a_0 ~ ba + bb + bc",
c_1="c_1 ~ ba + bb + bc + a_0",
a_1="a_1 ~ ba + bb + bc + a_0",
c_2="c_2 ~ ba + bb + bc + a_0 + a_1",
a_2="a_2 ~ ba + bb + bc + a_0 + a_1",
c_3="c_3 ~ ba + bb + bc + a_0 + a_1 + a_2",
a_3="a_3 ~ ba + bb + bc + a_0 + a_1 + a_2",
c_4="c_4 ~ ba + bb + bc + a_0 + a_1 + a_2 + a_3",
a_4="a_4 ~ ba + bb + bc + a_0 + a_1 + a_2 + a_3")
Now, we create a data subset with all observations of exposure and confounders, but only final outcome Y4:
ldata2 <- ldata[,c(-1,-8,-12,-16,-20)]
Now we can begin analysis. Firstly, we run TMLE using the ‘ltmle’ package, but manually specifying the models to be used (as defined above), and with estimation conducted using only generalised linear models. Note that, because ‘ltmle’ checks and transforms continuous outcomes, and checks that data is always missing after censoring. For continuous outcomes, the variable is truncated to a quasibinomial distribution (continous but bounded in 0/1); for missing data, and observations after a censoring event are ignored. Because of these checks, the command potentially produces a number of messages. These are not an issue, and have been left enabled for the first analysis to show what they look like, but have been disabled in subsequent analyses to simplify this markdown document.
# Correctly specified
# Correctly specified
rltmle1 <- suppressWarnings(ltmle(ldata2,
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes="y_4",
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib,
Qform=qforma,gform=gforma,
estimate.time=FALSE,
survivalOutcome=FALSE))
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Note: for internal purposes, all nodes after a censoring event are set to NA and
all nodes (except Ynodes) are set to NA after Y=1 if survivalFunction is TRUE (or if specified by deterministic.Q.function).
Your data did not conform and has been adjusted. This may be relevant if you are
writing your own deterministic function(s) or debugging ltmle.
Some Ynodes are not in [0, 1], and Yrange was NULL, so all Y nodes are being
transformed to (Y-min.of.all.Ys)/range.of.all.Ys
summary(rltmle1)
Estimator: tmle
Call:
ltmle(data = ldata2, Anodes = c("a_0", "a_1", "a_2", "a_3", "a_4"),
Cnodes = c("c_0", "c_1", "c_2", "c_3", "c_4"), Lnodes = c("l_0",
"l_1", "l_2", "l_3", "l_4"), Ynodes = "y_4", survivalOutcome = FALSE,
Qform = qforma, gform = gforma, abar = list(c(1, 1, 1, 1,
1), c(0, 0, 0, 0, 0)), SL.library = SLlib, estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 3.8172
Estimated Std Err: 0.39059
p-value: <2e-16
95% Conf Interval: (3.0517, 4.5828)
Control Estimate:
Parameter Estimate: 0.70144
Estimated Std Err: 0.19974
p-value: <2e-16
95% Conf Interval: (0.30997, 1.0929)
Additive Treatment Effect:
Parameter Estimate: 3.1158
Estimated Std Err: 0.4358
p-value: 8.7083e-13
95% Conf Interval: (2.2616, 3.9699)
# Outcome model misspecified
rltmle1m1 <- suppressMessages(suppressWarnings(ltmle(ldata2,
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes="y_4",
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib,
Qform=mqforma,gform=gforma,
estimate.time=FALSE,
survivalOutcome=FALSE)))
summary(rltmle1m1)
Estimator: tmle
Call:
ltmle(data = ldata2, Anodes = c("a_0", "a_1", "a_2", "a_3", "a_4"),
Cnodes = c("c_0", "c_1", "c_2", "c_3", "c_4"), Lnodes = c("l_0",
"l_1", "l_2", "l_3", "l_4"), Ynodes = "y_4", survivalOutcome = FALSE,
Qform = mqforma, gform = gforma, abar = list(c(1, 1, 1, 1,
1), c(0, 0, 0, 0, 0)), SL.library = SLlib, estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 4.2545
Estimated Std Err: 0.37362
p-value: <2e-16
95% Conf Interval: (3.5222, 4.9867)
Control Estimate:
Parameter Estimate: 0.8879
Estimated Std Err: 0.28458
p-value: <2e-16
95% Conf Interval: (0.33013, 1.4457)
Additive Treatment Effect:
Parameter Estimate: 3.3666
Estimated Std Err: 0.4693
p-value: 7.3066e-13
95% Conf Interval: (2.4467, 4.2864)
# Both models misspecified
rltmle1m2 <- suppressMessages(suppressWarnings(ltmle(ldata2,
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes="y_4",
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib,
Qform=mqforma,gform=mgforma,
estimate.time=FALSE,
survivalOutcome=FALSE)))
summary(rltmle1m2)
Estimator: tmle
Call:
ltmle(data = ldata2, Anodes = c("a_0", "a_1", "a_2", "a_3", "a_4"),
Cnodes = c("c_0", "c_1", "c_2", "c_3", "c_4"), Lnodes = c("l_0",
"l_1", "l_2", "l_3", "l_4"), Ynodes = "y_4", survivalOutcome = FALSE,
Qform = mqforma, gform = mgforma, abar = list(c(1, 1, 1,
1, 1), c(0, 0, 0, 0, 0)), SL.library = SLlib, estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 4.7824
Estimated Std Err: 0.40755
p-value: <2e-16
95% Conf Interval: (3.9836, 5.5812)
Control Estimate:
Parameter Estimate: -0.31222
Estimated Std Err: 0.17067
p-value: <2e-16
95% Conf Interval: (-0.64672, 0.022287)
Additive Treatment Effect:
Parameter Estimate: 5.0946
Estimated Std Err: 0.44143
p-value: <2e-16
95% Conf Interval: (4.2294, 5.9598)
Next, we carry out analysis using SuperLearner, allowing ‘ltmle’ to define the internal models:
slltmle1 <- suppressMessages(suppressWarnings(ltmle(ldata2,
Anodes=c("a_0","a_1","a_2","a_3","a_4"),
Lnodes=c("l_0","l_1","l_2","l_3","l_4"),
Cnodes=c("c_0","c_1","c_2","c_3","c_4"),
Ynodes="y_4",
abar=list(c(1,1,1,1,1),c(0,0,0,0,0)),
SL.library=SLlib3,
estimate.time=FALSE,
survivalOutcome=FALSE)))
summary(slltmle1)
Estimator: tmle
Call:
ltmle(data = ldata2, Anodes = c("a_0", "a_1", "a_2", "a_3", "a_4"),
Cnodes = c("c_0", "c_1", "c_2", "c_3", "c_4"), Lnodes = c("l_0",
"l_1", "l_2", "l_3", "l_4"), Ynodes = "y_4", survivalOutcome = FALSE,
abar = list(c(1, 1, 1, 1, 1), c(0, 0, 0, 0, 0)), SL.library = SLlib3,
estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 3.9612
Estimated Std Err: 0.28563
p-value: <2e-16
95% Conf Interval: (3.4014, 4.5211)
Control Estimate:
Parameter Estimate: 0.72589
Estimated Std Err: 0.17506
p-value: <2e-16
95% Conf Interval: (0.38277, 1.069)
Additive Treatment Effect:
Parameter Estimate: 3.2353
Estimated Std Err: 0.33118
p-value: <2e-16
95% Conf Interval: (2.5862, 3.8844)
Finally, for comparison purposes, we conduct naive analysis using generalised linear models:
# Correctly specified
LGLM <- glm(data=ldata,"y_4 ~ a_0 + a_1 + a_2 + a_3 + a_4 + l_0 + l_1 + l_2 + l_3 + l_4 + ba + bb + bc")
V1<-vcov(LGLM) # Save variance-covariance matrix to calculate joint standard error
# Incorrectly specified
LGLMm <- glm(data=ldata,"y_4 ~ a_0 + a_1 + a_2 + a_3 + a_4 + ba + bb + bc")
V2<-vcov(LGLMm) # Save variance-covariance matrix to calculate joint standard error
Lets see a summary of the results produced by each of the methods, so we can compare them:
lresults1 <- matrix(c(coef(LGLM)[2]+coef(LGLM)[3]+coef(LGLM)[4]+coef(LGLM)[5]+coef(LGLM)[6],
V1[2,2] + V1[3,3] + V1[4,4] + V1[5,5] + V1[6,6]
+ 2*V1[2,3]+ 2*V1[2,4] + 2*V1[2,5] + 2*V1[2,6]
+ 2*V1[3,4] + 2*V1[3,5] + 2*V1[3,6]
+ 2*V1[4,5] + 2*V1[4,6]
+ 2*V1[5,6],
coef(LGLMm)[2]+coef(LGLMm)[3]+coef(LGLMm)[4]+coef(LGLMm)[5]+coef(LGLMm)[6],
V2[2,2] + V2[3,3] + V2[4,4] + V2[5,5] + V2[6,6]
+ 2*V2[2,3]+ 2*V2[2,4] + 2*V2[2,5] + 2*V2[2,6]
+ 2*V2[3,4] + 2*V2[3,5] + 2*V2[3,6]
+ 2*V2[4,5] + 2*V2[4,6]
+ 2*V2[5,6],
summary(rltmle1)$effect.measures$ATE$estimate,
summary(rltmle1)$effect.measures$ATE$std.dev,
summary(rltmle1m1)$effect.measures$ATE$estimate,
summary(rltmle1m1)$effect.measures$ATE$std.dev,
summary(rltmle1m2)$effect.measures$ATE$estimate,
summary(rltmle1m2)$effect.measures$ATE$std.dev,
summary(slltmle1)$effect.measures$ATE$estimate,
summary(slltmle1)$effect.measures$ATE$std.dev),nrow=6,ncol=2,byrow=TRUE)
rownames(lresults1) <- c("GLM - correctly specified","GLM - incorrectly specified","'ltmle' package - correctly specified","'ltmle' package - outcome misspecified","'ltmle' package - doubly misspecified","SuperLearner LTMLE")
colnames(lresults1) <- c("Coef","SE")
lresults1
Coef SE
GLM - correctly specified 2.410081 0.1136235
GLM - incorrectly specified 5.670451 0.2198963
'ltmle' package - correctly specified 3.115787 0.4358038
'ltmle' package - outcome misspecified 3.366560 0.4693005
'ltmle' package - doubly misspecified 5.094645 0.4414343
SuperLearner LTMLE 3.235332 0.3311814