Section 4: Spatio-temporal models for disease mapping

Author

Aritz Adin

Published

Aug 13, 2024

In this lab session we describe how to use the bigDM package to fit spatio-temporal models with conditional autoregressive (CAR) priors for space and random walk (RW) priors for time including space-time interactions (Knorr-Held, 2000; Ugarte et al., 2014) by extending our scalable model’s proposal to deal with massive spatio-temporal data (Orozco-Acosta et al., 2023).

The STCAR_INLA() function

This function allows fitting (scalable) spatio-temporal Poisson mixed models to areal count data, where the linear predictor is modelled as

\[\begin{equation} \log r_{it} = \beta_0 + {\bf x}_{it}^{'}\mathbf{\beta} + \xi_i + \gamma_t + \delta_{it}, \quad \mbox{for} \quad i=1,\ldots,I; \; t=1,\ldots,T \end{equation}\]

where \(\beta_0\) is a global intercept, \({\bf x}_{it}^{'}=(x_{it1},\ldots,x_{itp})\) is a p-vector of standardized covariates in the \(i\)-th area and time period \(t\), \(\mathbf{\beta}=(\beta_1,\ldots,\beta_p)^{'}\) is the \(p\)-vector of fixed effect coefficients, \(\xi_i\) is a spatially structured random effect with a CAR prior distribution, \(\gamma_t\) is a temporally structured random effect with a RW prior distribution, and \(\delta_{it}\) is a space-time interaction effect.

These models are flexible enough to describe many real situations, and their interpretation is simple and attractive. However, the models are typically not identifiable and appropriate sum-to-zero constraints must be imposed over the random effects (Goicoa et al., 2018).

Prior distributions for the random effects

Several priors distributions for spatial and temporal random effect are implemented in the STCAR_INLA() function, which are specified through the spatial=... and temporal=... arguments.

For the spatial random effect, the same values as the prior=... argument in the CAR_INLA()function are defined. For the temporally structured random effect, random walks of first (RW1) or second order (RW2) prior distributions can be assumed as follow:

\[\begin{equation} \label{eq:temporal} \gamma \sim N(0,[\tau_{\gamma}R_{\gamma}]^{-}), \end{equation}\]

where \(\tau_{\gamma}\) is a precision parameter, \(R_{\gamma}\) is the \(T \times T\) structure matrix of a RW1/RW2, and \(^{-}\) denotes the Moore-Penrose generalized inverse.

Finally, the following prior distribution is assumed for the space-time interaction random effect \(\delta=(\delta_{11},\ldots,\delta_{I1},\ldots,\delta_{1T},\ldots,\delta_{IT})^{'}\)

\[\begin{equation} \label{eq:space-time} \delta \sim N(0,[\tau_{\delta}R_{\delta}]^{-}). \end{equation}\]

Here, \(\tau_{\delta}\) is a precision parameter and \(R_{\delta}\) is the \(IT \times IT\) matrix obtained as the Kronecker product of the corresponding spatial and temporal structure matrices (recall that \(R_{\xi}=\textbf{D}_{W}-\textbf{W}\)), where four types of interactions can be considered.

Interaction \(R_{\delta}\) Spatial correlation Temporal correlation
Type I \(I_{T} \otimes I_{I}\) - -
Type II \(R_{\gamma} \otimes I_{I}\) - \(\checkmark\)
Type III \(I_{T} \otimes R_{\xi}\) \(\checkmark\) -
Type IV \(R_{\gamma} \otimes R_{\xi}\) \(\checkmark\) \(\checkmark\)

Table 1: Specification for the different types of space-time interactions

Interaction \(R_{\delta}\) Constraints
Type I \(I_{T} \otimes I_{I}\) \(\sum\limits_{i=1}^I \xi_i=0, \, \sum\limits_{t=1}^T \gamma_t=0, \, \mbox{ and } \, \sum\limits_{i=1}^I \sum\limits_{t=1}^T \delta_{it}=0.\)
Type II \(R_{\gamma} \otimes I_{I}\) \(\sum\limits_{i=1}^I \xi_i=0, \, \sum\limits_{t=1}^T \gamma_t=0, \, \mbox{ and } \, \sum\limits_{t=1}^T \delta_{it}=0, \, \mbox{for } \, i=1,\ldots,I.\)
Type III \(I_{T} \otimes R_{\xi}\) \(\sum\limits_{i=1}^I \xi_i=0, \, \sum\limits_{t=1}^T \gamma_t=0, \, \mbox{ and } \, \sum\limits_{i=1}^I \delta_{it}=0, \, \mbox{for } \, t=1,\ldots,T.\)
Type IV \(R_{\gamma} \otimes R_{\xi}\) \(\sum\limits_{i=1}^I \xi_i=0, \, \sum\limits_{t=1}^T \gamma_t=0, \, \mbox{ and } \, \begin{array}{l} \sum\limits_{t=1}^T \delta_{it}=0, \, \mbox{for } \, i=1,\ldots,I, \\ \sum\limits_{i=1}^I \delta_{it}=0, \, \mbox{for } \, t=1,\ldots,T. \\ \end{array}\)

Table 2: Identifiability constraints for the different types of space-time interaction effects in CAR models

Main input arguments

A brief description of the main input arguments and functionalities of the STCAR_INLA() function are described below:

  • carto: object of class sf or SpatialPolygonsDataFrame. This object must contain at least the variable with the identifiers of the spatial areal units specified in the argument ID.area.

  • data: object of class data.frema that must contain the target variables of interest specified in the arguments ID.area, ID.year, O and E.

  • ID.area: character; name of the variable that contains the IDs of spatial areal units.

  • ID.year: character; name of the variable that contains the IDs of time points.

  • ID.group: character; name of the variable that contains the IDs of the spatial partition (grouping variable). Only required if model="partition".

  • O: character; name of the variable that contains the observed number of cases for each areal unit and disease.

  • E: character; name of the variable that contains either the expected number of cases for each areal unit and disease.

  • X: a character vector containing the names of the covariates within the carto object to be included in the model as fixed effects, or a matrix object playing the role of the fixed effects design matrix. If X=NULL (default), only a global intercept is included in the model as fixed effect.

  • W: optional argument with the binary adjacency matrix of the spatial areal units. If NULL (default), this object is computed from the carto argument (two areas are considered as neighbours if they share a common border).

  • spatial: one of either "Leroux" (default), "intrinsic", BYM or BYM2, which specifies the prior distribution considered for the spatial random effect.

  • temporal: one of either "rw1" (default) or `rw2``, which specifies the prior distribution considered for the temporal random effect.

  • interaction: one of either "none", "TypeI", "TypeII", "TypeIII" or "TypeIV" (default), which specifies the prior distribution considered for the space-time interaction random effect.

  • model: one of either "global" or "partition" (default), which specifies the Global model or one of the scalable model proposal’s (Disjoint model and k-order neighbourhood model, respectively).

  • k: numeric value with the neighbourhood order used for the partition model. Usually k=2 or 3 is enough to get good results. If k=0 (default) the Disjoint model is considered. Only required if model="partition".

  • compute.DIC: logical value; if TRUE (default) then approximate values of the Deviance Information Criterion (DIC) and Watanabe-Akaike Information Criterion (WAIC) are computed.

  • compute.fitted.values: logical value (default FALSE); if TRUE transforms the posterior marginal distribution of the linear predictor to the exponential scale (risks or rates).

  • inla.mode: one of either "classic" (default) or "compact", which specifies the approximation method used by INLA. See help(inla) for further details.

For further details, please refer to the reference manual and the vignettes accompanying this package.

Example: colorectal cancer mortality data during the period 1991-2015

As with multivariate models, both the carto=... and data=... arguments must be included in the STCAR_INLA() function when fitting spatio-temporal models.

In this lab session, the simulated data of lung cancer mortality during the period 1991-2015 included in the Data_LungCancer object will be used as illustration.

library(bigDM)
library(INLA)
Cargando paquete requerido: Matrix
Cargando paquete requerido: sp
This is INLA_24.06.27 built 2024-06-27 02:36:04 UTC.
 - See www.r-inla.org/contact-us for how to get help.
 - List available models/likelihoods/etc with inla.list.models()
 - Use inla.doc(<NAME>) to access documentation
library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(tmap)

Adjuntando el paquete: 'tmap'
The following object is masked from 'package:datasets':

    rivers
tmap4 <- packageVersion("tmap") >= "3.99"

data(Data_MultiCancer)
str(Data_MultiCancer)
'data.frame':   23721 obs. of  5 variables:
 $ ID     : chr  "01001" "01002" "01003" "01004" ...
 $ disease: int  1 1 1 1 1 1 1 1 1 1 ...
 $ obs    : int  3 41 4 6 0 2 4 10 2 7 ...
 $ exp    : num  6.615 42.634 7.431 6.355 0.934 ...
 $ SMR    : num  0.454 0.962 0.538 0.944 0 ...

The data has a common identification variable (ID) to link it with the Carto_SpainMUN object, containing the cartography of Spanish municipalities:

data("Carto_SpainMUN")
Carto_SpainMUN$obs <- NULL
Carto_SpainMUN$exp <- NULL
Carto_SpainMUN$SMR <- NULL

head(Carto_SpainMUN)
Simple feature collection with 6 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 485318 ymin: 4727428 xmax: 543317 ymax: 4779153
Projected CRS: ETRS89 / UTM zone 30N
     ID                                    name           area    perimeter
1 01001                        Alegria-Dulantzi 19913794 [m^2] 34372.11 [m]
2 01002                                 Amurrio 96145595 [m^2] 63352.32 [m]
3 01003                                 Aramaio 73338806 [m^2] 41430.46 [m]
4 01004                              Artziniega 27506468 [m^2] 22605.22 [m]
5 01006                                 Arminon 10559721 [m^2] 17847.35 [m]
6 01008 Arrazua-Ubarrundia (San Martin de Ania) 57502811 [m^2] 64968.81 [m]
      region                       geometry
1 Pais Vasco MULTIPOLYGON (((538259 4737...
2 Pais Vasco MULTIPOLYGON (((503520 4760...
3 Pais Vasco MULTIPOLYGON (((533286 4759...
4 Pais Vasco MULTIPOLYGON (((491260 4776...
5 Pais Vasco MULTIPOLYGON (((509851 4727...
6 Pais Vasco MULTIPOLYGON (((534678 4746...

Global model

The global model, featuring a BYM2 spatial random effect, an RW1 temporal random effect, and a Type IV interaction random effect, is fitted using the STCAR_INLA() function as follows:

## NOT RUN!
# Global <- STCAR_INLA(carto=Carto_SpainMUN, data=Data_LungCancer,
#                      ID.area="ID", ID.year="year", O="obs", E="exp",
#                      spatial="BYM2", temporal="rw1", interaction="TypeIV",
#                      model="global", inla.mode="compact")

NOTE: When the number of small areas and time periods increases considerably (as is the case when analyzing count data at the municipality level), fitting spatio-temporal global models with Type II/Type IV interactions becomes computationally very demanding or even unfeasible.

Partition models

For our analysis, we propose to divide the data into the \(D=47\) provinces of continental Spain. To classify the areas into provinces, the first two digits of the ID.area variable is used.

Carto_SpainMUN$ID.prov <- substr(Carto_SpainMUN$ID,1,2)

In the code below, we show how to fit the disjoint and 1st-order neigbourhood model with Type II space-time interaction random effect using 4 local clusters (in parallel):

options(future.globals.maxSize = 1.5*1e9) ## 1.5 GB

Model.k0 <- STCAR_INLA(carto=Carto_SpainMUN, data=Data_LungCancer,
                       ID.area="ID", ID.year="year", O="obs", E="exp",
                       model="partition", k=0, ID.group="ID.prov",
                       spatial="intrinsic", temporal="rw1", interaction="TypeII",
                       plan="cluster", workers=rep("localhost",4),
                       inla.mode="compact")
gc()
options(future.globals.maxSize = 1.5*1e9) ## 1.5 GB

Model.k1 <- STCAR_INLA(carto=Carto_SpainMUN, data=Data_LungCancer,
                       ID.area="ID", ID.year="year", O="obs", E="exp",
                       model="partition", k=1, ID.group="ID.prov",
                       spatial="intrinsic", temporal="rw1", interaction="TypeII",
                       plan="cluster", workers=rep("localhost",4),
                       inla.mode="compact")
gc()
## Computational time
Model.k0$cpu.used
Running Merging   Total 
1798.11  430.30 2228.41 
Model.k1$cpu.used
Running Merging   Total 
3253.36  837.74 4091.10 
## Model comparison
compare.DIC <- function(x){
  res <- data.frame(mean.deviance=x$dic$mean.deviance, p.eff=x$dic$p.eff,
                    DIC=x$dic$dic, WAIC=x$waic$waic)
  round(res,2)
}
MODELS <- list("k=0"=Model.k0,"k=1"=Model.k1)
do.call(rbind,lapply(MODELS, compare.DIC))
    mean.deviance   p.eff      DIC     WAIC
k=0      336089.6 2890.52 338980.1 338996.4
k=1      336204.2 2677.01 338881.2 338900.4

Computations are made in personal computer with a 3.41 GHz Intel Core i5-7500 processor and 32GB RAM using R-INLA stable version INLA_24.06.27

Plot the results

Maps of posterior median estimates of \(\log{r_{it}}\)

library(RColorBrewer)

## Results for 1st-order neighbourhood model ##
Model <- Model.k1

S <- length(unique(Data_LungCancer$ID))
T <- length(unique(Data_LungCancer$year))
t.from <- min(Data_LungCancer$year)
t.to <- max(Data_LungCancer$year)

log.risks <- matrix(Model$summary.linear.predictor$`0.5quant`, nrow=S, ncol=T, byrow=F)
colnames(log.risks) <- paste("Year", seq(t.from,t.to), sep=".")

carto <- cbind(Carto_SpainMUN,log.risks)

paleta <- brewer.pal(8,"RdYlGn")[8:1]
values <- c(0,0.67,0.77,0.83,1,1.20,1.30,1.50,Inf)

if(tmap4){
  Map.risks <- tm_shape(carto) +
    tm_polygons(fill=paste("Year",round(seq(t.from,t.to,length.out=9)),sep= "."),
                fill.free=FALSE, col_alpha=0,
                fill.scale=tm_scale(values=paleta, breaks=log(values), interval.closure="left", midpoint=0),
                fill.legend=tm_legend("log-risks", show=TRUE, reverse=TRUE,
                                      position=tm_pos_out("right","center"),
                                      frame=FALSE)) +
    tm_grid(n.x=5, n.y=5, alpha=0.2, labels.format=list(scientific=T),
            labels.inside.frame=F, labels.col="white") +
    tm_layout(panel.labels=as.character(round(seq(t.from,t.to,length.out=9)))) +
    tm_facets(nrow=3, ncol=3)
}else{
  Map.risks <- tm_shape(carto) +
    tm_polygons(col=paste("Year",round(seq(t.from,t.to,length.out=9)),sep= "."),
                palette=paleta, title="log-risks", legend.show=T, border.col="transparent",
                legend.reverse=T, style="fixed", breaks=log(values), midpoint=0, interval.closure="left") +
    tm_grid(n.x=5, n.y=5, alpha=0.2, labels.format=list(scientific=T),
            labels.inside.frame=F, labels.col="white") +
    tm_layout(main.title="", main.title.position="center", panel.label.size=1.5,
              legend.outside=T, legend.outside.position="right", legend.frame=F,
              legend.outside.size=0.2, outer.margins=c(0.02,0.01,0.02,0.01),
              panel.labels=as.character(round(seq(t.from,t.to,length.out=9)))) +
    tm_facets(nrow=3, ncol=3)
}

tmap_mode("plot")
print(Map.risks)

Maps of posterior exceedence probabilities \(Pr(\log{r_{it}}>0 | {\bf 0})\)

probs <- matrix(1-Model$summary.linear.predictor$`0cdf`, nrow=S, ncol=T, byrow=F)
colnames(probs) <- paste("Year", seq(t.from,t.to), sep=".")

carto <- cbind(Carto_SpainMUN,probs)

paleta <- brewer.pal(6,"Blues")[-1]
values <- c(0,0.1,0.2,0.8,0.9,1)

if(tmap4){
  Map.probs <- tm_shape(carto) +
    tm_polygons(fill=paste("Year",round(seq(t.from,t.to,length.out=9)),sep="."),
                fill.free=FALSE, col_alpha=0,
                fill.scale=tm_scale(values=paleta, breaks=values, interval.closure="left",
                                    labels=c("[0-0.1)","[0.1-0.2)","[0.2-0.8)","[0.8-0.9)","[0.9-1]")),
                fill.legend=tm_legend("Probs", show=TRUE, reverse=TRUE,
                                      position=tm_pos_out("right","center"),
                                      frame=FALSE)) +
    tm_grid(n.x=5, n.y=5, alpha=0.2, labels.format=list(scientific=T),
            labels.inside.frame=F, labels.col="white") +
    tm_layout(panel.labels=as.character(round(seq(t.from,t.to,length.out=9)))) +
    tm_facets(nrow=3, ncol=3)
  
}else{
  Map.probs <- tm_shape(carto) +
    tm_polygons(col=paste("Year",round(seq(t.from,t.to,length.out=9)),sep="."),
                palette=paleta, title="", legend.show=T, border.col="transparent",
                legend.reverse=T, style="fixed", breaks=values, interval.closure="left",
                labels=c("[0-0.1)","[0.1-0.2)","[0.2-0.8)","[0.8-0.9)","[0.9-1]")) +
    tm_grid(n.x=5, n.y=5, alpha=0.2, labels.format=list(scientific=T),
            labels.inside.frame=F, labels.col="white") +
    tm_layout(main.title="Probs", main.title.position="center", panel.label.size=1.5,
              legend.outside=T, legend.outside.position="right", legend.frame=F,
              legend.outside.size=0.2, outer.margins=c(0.02,0.01,0.02,0.01),
              panel.labels=as.character(round(seq(t.from,t.to,length.out=9)))) +
    tm_facets(nrow=3, ncol=3)
}

tmap_mode("plot")
print(Map.probs)

Temporal evolution of mortality risks for some selected municipalities (Pamplona, Valencia and Toledo) and its corresponding 95% credible intervals. The colors used in the bands are associated with the posterior exceedence probabilities represented in the previous maps.