# Lesson 4: Going further with linear mixed models

## Day 2 Schedule

Time Activity
9:00-9:15 AM Introductions, troubleshooting
9:15-10:15 AM Lesson 4: Going further with mixed models
10:15-10:45 AM break
10:45-11:45 AM Lesson 5: Generalized linear mixed models
11:45 AM-12:00 PM break
12:00-1:00 PM Lesson 6: Estimating and comparing treatment means
1:00-2:00 PM lunch break
2:00-4:00 PM office hours

## Lesson 4 learning objectives

At the end of this lesson, students will …

• Fit a linear mixed model with crossed and nested random effects.
• Fit a linear mixed model with a transformed response variable.

## Crossed and nested random effects

• We are finally going to use some real data!
• Dataset of results from a study of Precision Zonal Management (PZM) conservation agriculture practices
• Two experimental treatments
• two different tillage systems
• crossed with presence or absence of rye cover crop
• Measured maize yield

library(tidyverse)
library(easystats)
library(lme4)
library(lmerTest)
maize <- read_csv('datasets/pzm_maize.csv')

## Examine the data

glimpse(maize)
• Multiple locations (US states) and multiple years
• plot and block columns have integers identifying the plots and blocks
• tillage column has two different values, "cp" (chisel plow) and "rt" (ridge tillage)
• cover.trt column also has two different values, "none" and "rye"

## Create factor columns

• Convert the yr, plot, and block columns to factor type
• They are integers and we want to treat them as categorical for the model
• Use mutate() with the pipe %>%
maize <- maize %>%
mutate(
yr = factor(yr),
plot = factor(plot),
block = factor(block)
)

## Boxplot of the data

Use ggplot (don’t worry about this code for now)

ggplot(maize, aes(y = yield, x = tillage, fill = cover.trt)) +
facet_grid(location ~ yr) +
geom_boxplot(position = position_dodge()) +
theme_bw()
• In some location-year combinations, the tillage treatments seem to result in different yields
• Mostly minimal differences between the cover treatments
• The majority of the variation is between locations and years.

## Setting up the model

• Fixed effect for tillage treatment
• Fixed effect for cover treatment
• Interaction between the two treatments

yield ~ tillage + cover.trt + tillage:cover.trt

• Should location and year be random or fixed?
• Depends on the statistical inference we want to draw
• If we are interested in change over time, year should be a fixed effect
• If are not interested in comparing means of the different years, year should be a random effect
• Treating years as a random population of environments
• The same goes for location

## Building model formula

• Let’s say year and location will both be random effects
• Fit them as crossed random effects
• An intercept will be fit for each year and each location
• This means we assume the effect of year is constant across locations, and vice versa

yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location)

• No random slopes (baseline yield differs across environments but not the treatment effect)

## Building model formula some more

• There are many blocks at each location, each of which has many plots
• Block treated as a random effect nested within location
• Block 1 in Illinois is not the same as Block 1 in Minnesota
• Separate intercept fitted for each block in each location
• We do not need to include plot explicitly, because the plots are individual data points
• Use the : operator to separate nesting terms.

yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location) + (1 | block:location)

## Full model formula

fit_maize <- lmer(yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location) + (1 | block:location), data = maize)

### Alternative formula 1

• Give each location-year combination its own intercept
• This would mean location is nested within year as well
• Three levels of nesting for block
• Use this if the locations are far enough apart that the random variation of years isn’t consistent across locations

yield ~ tillage + cover.trt + tillage:cover.trt + (1 | location:yr) + (1 | block:location:yr)

### Alternative formula 2

• Random slopes
• Must specify which effects are random in the design component
• Specify which grouping factors get separate slopes fitted to them
• For example, fit random slopes only to each main effect and only for the year-location grouping factor:

yield ~ tillage + cover.trt + tillage:cover.trt + (tillage + cover.trt | location:yr) + (1 | block:location:yr)

## Data transformations

• Not just to “make the data normal” (actually the residuals)
• Transformations should be guided by theory and science
• For example, response variables that are ratios or concentrations
• Difference between two values on a logarithmic scale is multiplicative difference (ratio) on an arithmetic scale
• Example: Difference between a ratio of 0.01 and a ratio of 1 is 100 times
• Difference etween a ratio of 1 and 100 is 100 times
• On a log scale those two differences are treated the same (which is good)

## Log transformation example

• Small subset of an ARS scientist’s data (noise added)
• Different varieties of barley crossed with different types of fungicide
• Multiple plots of each combination of variety and fungicide in each experimental block (column Rep)
• Amount of DON (a mycotoxin) produced measured, units of parts per million (column DON_ppm)
barley_don <- read_csv('datasets/barley_don.csv')

## Examine the data

glimpse(barley_don)
ggplot(barley_don, aes(x = Variety, y = DON_ppm, fill = Fungicide)) +
geom_boxplot(position = position_dodge()) +
theme_bw()

## Fit a model

• Variety, fungicide, and interaction are the fixed effects
• Each Rep gets a random intercept
fit_don <- lmer(DON_ppm ~ Variety + Fungicide + Variety:Fungicide + (1|Rep), data = barley_don)

## Log transformed response

• Same as before but the response variable is log-transformed
• This can be done directly in the model formula
• Concentration is on a ratio scale so we are looking at multiplicative changes
fit_don_log <- lmer(log(DON_ppm) ~ Variety + Fungicide + Variety:Fungicide + (1|Rep), data = barley_don)

## Improvement to residuals

• Transformations should be determined by the science, but it’s still good to look at the residuals
• Residuals in untransformed model have non-homogeneous variance and bad match to normal quantile line
• Both diagnostics better for the log-transformed response
• Note: check argument to check_model() lets you select specific diagnostic plots to make
• See all options by looking at help documentation: ?check_model
check_model(fit_don, check = c('homogeneity', 'qq'))
check_model(fit_don_log, check = c('homogeneity', 'qq'))