# mvtboost example 2: Well-being

#### 2016-12-05

This document describes the analysis of psychological well-being by multivariate tree boosting from the paper (Miller, Lubke, McArtor, & Bergeman, 2015).

## Introduction

In our exploratory analysis, we investigated the factors that influence six aspects of well-being: autonomy, environmental mastery, personal growth, positive relationships with others, purpose in life, and self-acceptance (Ryff & Keyes, 1995).

## Predictors

Gender, age, income, and education were included as demographic predictors. The primary predictors of interest were chronic, somatic, and self-reported health, depression (positive and negative affect), perceived social control, control of internal states, sub-scales of dispositional resilience (commitment, control, and challenge), ego resilience, social support (friends and family), self-reported stress (problems, emotions), and loneliness. In total, 20 predictors were included in the analysis.

## Fitting the model

#install.packages("mvtboost")
library(mvtboost)
data(wellbeing)

Y <- wellbeing[,21:26]
X <- wellbeing[,1:20]
Ys <- scale(Y)
ynames <- c("Autonomy","Environmental Mastery","Personal Growth","Positive Relationships","Purpose in Life","Self Acceptance")
xnames <- c("Gender","Age","Income","Education","Chronic Health","Somatic Health","Self Report Health","Positive Affect","Negative Affect","Perceived Social Control","Control Internal States","Commitment","Control","Challenge","Ego Resilience","Social Support - Friends","Social Support - Family","Stress-Problems","Stress-Emotions","Loneliness")
cont.id <- unlist(lapply(X,is.numeric))
Xs <- X
Xs[,cont.id] <- scale(X[,cont.id])
colnames(Xs) <- xnames
colnames(Ys) <- ynames
res <- mvtb(Y=Ys,X=Xs)

## Tuning the model by 5-Fold CV

In gradient boosting, the number of trees, the shrinkage, and the tree depth are meta-parameters that are important to tune to improve the fit of the model. Typically, the shrinkage is fixed to a small value, and the optimal number of trees is chosen by cross-validation. This is illustrated below:

res5 <- mvtb(Y=Ys,X=Xs,n.trees=1000,shrinkage=.05,cv.folds=5,compress=FALSE)

A set of observations can be explicitly specified as the training set by passing a vector of ids trainset to the argument s. Cross-validation will only occur within the training set.

set.seed(104)
trainset <- sample(1:nrow(Ys),size = 784,replace=FALSE)
res5train <- mvtb(Y=Ys,X=Xs,n.trees=1000,shrinkage=.05,cv.folds=5,compress=FALSE,s=trainset)

## Computing the best number of trees

As with univariate gradient boosted trees, the number of trees can be chosen to minimize a test or cross-validation estimate of the prediction error. mvtb uses the multivariate MSE in a test set or (kth) fold as the estimate of multivariate prediction error.

res5$best.trees summary(res5) ##$best.testerr
## integer(0)
##
## $best.cv ## [1] 251 ## ##$last
## [1] 1000

Most procedures in the mvtboost package will by default automatically select the lowest number of trees provided by training, CV, or test error, which corresponds to a minimally complex model.

## Showing training and cv error

Below we show the importance of choosing the model complexity to minimize the CV rather than training error.

Increasing the depth of trees to 5 or 10 may also improve performance.

## Interpret the model

One of the challenges of using multivariate decision tree ensembles is that the model is more difficult to interpret than a single tree. While tree boosting can be used to build a very accurate predictive model, it is potentially more important for researchers to interpret the effects of predictors. Below, we describe approaches that have been developed to

• identify predictors with effects on individual outcome variables
• identify groups of predictors that jointly influence one or more outcome variables
• visualize the functional form of the effect of important predictors
• detect predictors with possible interaction non-linear effects.

### 1. Relative influence

The influence (or variable importance) of each predictor can be used to identify ‘important’ predictors. It is defined as the reduction in sums of squared error due to any split on that predictor, summed over all trees in the model (Friedman, 2001). Usually the score is relative, expressed as a percent of the total sums of squared error reductions from all predictors.

Below, we compute the relative influences for the well-being data, and plot the influences using a heat map. Using the mvtb.ri function, the influence scores can sum to 100 for each outcome (relative='col') or across outcomes (relative='tot'). By default, importances are relative to the column.

summary(res5)
round(mvtb.ri(res5,relative = "tot"),2) 
numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.1f", val))}

par(mar=c(8,10,1,1))
mvtb.heat(t(mvtb.ri(res5)),clust.method = NULL,cexRow=1,cexCol=1,numformat=numformat)

For example, control of internal states affects all aspects of psychological well being except positive relationships with others. Like control of internal states, perceived stress-problems affects three aspects of well-being: self acceptance, purpose in life, and environmental mastery. Other patterns in the influences can be interpreted similarly.

### 2. Fit

As a check of the overall fit of the model, the $$R^2$$ in the test set can be computed for each dependent variable.

testset <- (1:nrow(Ys))[!(1:nrow(Ys) %in% trainset)]
yhat <- predict(res5train,newdata=Xs[testset,])
diag(var(yhat)/var(Ys[testset,]))
##               Autonomy  Environmental Mastery        Personal Growth
##              0.2272033              0.7459942              0.3862064
## Positive Relationships        Purpose in Life        Self Acceptance
##              0.5079466              0.5351979              0.4969813

### 3. Covariance explained

It may also be informative to select a set of outcome variables that are associated with groups of predictors. One criterion for selecting outcome variables is to choose the outcome variables whose covariance can be explained by a function of a common set of predictors. The covariance explained in each pair of outcomes by predictors is estimated directly in mvtb.

A covariance-explained matrix be organized as a $$Q(Q+1)/2 \times p$$ table, where $$Q$$ is the number of outcomes, and $$p$$ is the number of predictors. Each element is the covariance explained in any pair of outcomes by a predictor. When the outcomes are standardized to unit variance, each element can be interpreted as the correlation explained in any pair of outcomes by a predictor.

This decomposition is similar to decomposing $$R^2$$ in multiple regression. When the trees of the ensemble are limited to a single split and the predictors are independent, this decomposition is exact, otherwise it is approximate.

For the well-being data, the covariance explained matrix is obtained directly from the fitted model: mvtb.covex(res5, Y=Ys, X=Xs). It can also be plotted as a heatmap, which we illustrate below:

par(mar=c(8,15,1,1),mfrow=c(1,1))
numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.2f", val))}
covex <- mvtb.covex(res5, Y=Ys, X=Xs)
mvtb.heat(covex[,-c(1:7)],cexRow=.9,numformat=numformat,clust.method = NULL)

Negative affect and stress problems have widespread effects on well-being. Control of internal states explains correlations across all dimensions, and is the primary explanatory predictor for autonomy.

### Clustering covariance explained.

If the number of predictors/outcomes is large, interpreting the matrix is challenging. The covariance explained matrix can be clustered by grouping the predictors that explain covariance in similar pairs of outcomes. This is done by hierarchical clustering of the distance between columns (predictors) and the rows (pairs of outcomes).

Clustering the matrix can be done with mvtb.cluster. Below we plot the solution as a heatmap with mvtb.heat.

mvtb.cluster(covex)
par(mar=c(8,12,1,1),mfrow=c(1,1))
mvtb.heat(covex[,-c(1:7)],cexRow=.9)

### 4. Partial Dependence Plots

Partial dependence plots complement interpretations of relative influence by showing the direction and functional form of the effect of the predictor. A partial dependence plot shows the effect of the predictor averaging over (or integrating out) the effects of other predictors.

Here we show the univariate and multivariate perspective plots. The first plot shows that above-average control of internal states is associated with larger personal growth. The second shows the non-additive effect of control of internal states and perceived stress problems on self-acceptance.

par(mfcol=c(1,2),mar=c(5,5,4,1))
plot(res5,predictor.no=11,response.no=3,ylim=c(-1,1.5)) # persgrwth on cis
text(-4,1.825,labels="A",xpd=TRUE)
mvtb.perspec(res5,predictor.no=c(11,18),response.no=6)
text(-.5,.5,labels="B",xpd=TRUE)

Though decision trees are models of interactions, it is difficult to detect and interpret interaction effects from a decision tree ensemble. To address this issue, we can analyze the fitted values of the model. Following Elith et al. (2008), possible 2-way interactions can be detected by checking whether the fitted values of the model as a function of any pair of predictors deviates from a linear combination of the two predictors. Such departures indicate that the joint effect of the predictors is not additive, and indicate a non-linear effect or a possible interaction.

In greater detail: the fitted values of the model are computed for any pair of predictors, over a grid of all possible levels for the two variables. For continuous predictors, 100 sample values are taken. The fitted values are then regressed onto the grid. Large residuals from this model indicate the fitted values are not a linear combination of the predictors, demonstrating non-linearity or a possible interaction. For computational simplicity with many predictors, this might be done only for pairs of important variables.

We note that this approach is primarily a heuristic for interpreting the model. A variable with a non-additive effect (e.g. a non-linear effect like control of internal states) can produce bivariate departures from additivity which are not necessarily interactions.

There are several other methods for detecting departures, including ‘grid’, ‘influence’, and ‘lm’, following different suggestions. Generally these are still heuristics that give somewhat different results.

res.nl <- mvtb.nonlin(res5,Y=Ys,X=Xs)

Below, we compute the top 5 departures for each DV.

lapply(res.nl,function(r){head(r[[1]])})
## $Autonomy ## var1.index var1.names var2.index var2.names ## 1 14 Challenge 11 Control Internal States ## 2 13 Control 11 Control Internal States ## 3 19 Stress-Emotions 11 Control Internal States ## 4 11 Control Internal States 2 Age ## 5 11 Control Internal States 8 Positive Affect ## 6 17 Social Support - Family 11 Control Internal States ## nonlin.size ## 1 42.27263 ## 2 41.45230 ## 3 39.38546 ## 4 38.56666 ## 5 38.03321 ## 6 37.11201 ## ##$Environmental Mastery
##   var1.index              var1.names var2.index               var2.names
## 1         18         Stress-Problems         11  Control Internal States
## 2         19         Stress-Emotions         11  Control Internal States
## 3         11 Control Internal States         10 Perceived Social Control
## 4         11 Control Internal States          9          Negative Affect
## 5         20              Loneliness         11  Control Internal States
## 6         11 Control Internal States          7       Self Report Health
##   nonlin.size
## 1    18.39823
## 2    17.74754
## 3    17.68514
## 4    17.51438
## 5    16.15804
## 6    15.61149
##
## $Personal Growth ## var1.index var1.names var2.index var2.names ## 1 14 Challenge 11 Control Internal States ## 2 15 Ego Resilience 11 Control Internal States ## 3 13 Control 11 Control Internal States ## 4 11 Control Internal States 2 Age ## 5 16 Social Support - Friends 11 Control Internal States ## 6 12 Commitment 11 Control Internal States ## nonlin.size ## 1 69.40942 ## 2 69.10555 ## 3 63.69093 ## 4 63.66075 ## 5 63.47319 ## 6 62.45037 ## ##$Positive Relationships
##   var1.index               var1.names var2.index               var2.names
## 1         16 Social Support - Friends         11  Control Internal States
## 2         12               Commitment         11  Control Internal States
## 3         16 Social Support - Friends         12               Commitment
## 4         20               Loneliness         11  Control Internal States
## 5         20               Loneliness         16 Social Support - Friends
## 6         11  Control Internal States         10 Perceived Social Control
##   nonlin.size
## 1    19.22962
## 2    17.52028
## 3    17.38097
## 4    14.78180
## 5    14.64248
## 6    14.56942
##
## $Purpose in Life ## var1.index var1.names var2.index var2.names ## 1 18 Stress-Problems 11 Control Internal States ## 2 11 Control Internal States 10 Perceived Social Control ## 3 12 Commitment 11 Control Internal States ## 4 14 Challenge 11 Control Internal States ## 5 11 Control Internal States 9 Negative Affect ## 6 13 Control 11 Control Internal States ## nonlin.size ## 1 21.79651 ## 2 21.25287 ## 3 21.04444 ## 4 21.02452 ## 5 20.25459 ## 6 19.60006 ## ##$Self Acceptance
##   var1.index              var1.names var2.index               var2.names
## 1         19         Stress-Emotions         11  Control Internal States
## 2         18         Stress-Problems         11  Control Internal States
## 3         12              Commitment         11  Control Internal States
## 4         11 Control Internal States          9          Negative Affect
## 5         20              Loneliness         11  Control Internal States
## 6         11 Control Internal States         10 Perceived Social Control
##   nonlin.size
## 1    16.14521
## 2    14.98212
## 3    13.42367
## 4    13.26525
## 5    13.13256
## 6    13.04356

## References

Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.

Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of statistics, 1189-1232.

Miller P.J., Lubke G.H, McArtor D.B., Bergeman C.S. (Submitted) Finding structure in data with multivariate tree boosting.

Ridgeway, G., Southworth, M. H., & RUnit, S. (2013). Package ‘gbm’. Viitattu, 10, 2013.

Wallace, K. A., Bergeman, C. S., & Maxwell, S. E. (2002). Predicting well-being outcomes in later life: An application of classification and regression tree (CART) analysis.