Time | Activity |
---|---|

9:00-9:15 AM | Introductions, troubleshooting |

9:15-10:15 AM | Lesson 4: Going further with mixed models |

10:15-10:45 AM | break |

10:45-11:45 AM | Lesson 5: Generalized linear mixed models |

11:45 AM-12:00 PM | break |

12:00-1:00 PM | Lesson 6: Estimating and comparing treatment means |

1:00-2:00 PM | lunch break |

2:00-4:00 PM | office hours |

At the end of this lesson, students will …

- Fit a linear mixed model with crossed and nested random effects.
- Fit a linear mixed model with a transformed response variable.

- We are finally going to use some real data!
- Dataset of results from a study of Precision Zonal Management (PZM) conservation agriculture practices
- Two experimental treatments
- two different tillage systems
- crossed with presence or absence of rye cover crop

- Measured maize yield

Load needed packages and read the data.

- Multiple locations (US states) and multiple years
`plot`

and`block`

columns have integers identifying the plots and blocks`tillage`

column has two different values,`"cp"`

(chisel plow) and`"rt"`

(ridge tillage)`cover.trt`

column also has two different values,`"none"`

and`"rye"`

- Convert the
`yr`

,`plot`

, and`block`

columns to factor type - They are integers and we want to treat them as categorical for the model
- Use
`mutate()`

with the pipe`%>%`

Use `ggplot`

(don’t worry about this code for now)

- In some location-year combinations, the tillage treatments seem to result in different yields
- Mostly minimal differences between the cover treatments
- The majority of the variation is between locations and years.

- Fixed effect for tillage treatment
- Fixed effect for cover treatment
- Interaction between the two treatments

`yield ~ tillage + cover.trt + tillage:cover.trt`

- Should location and year be random or fixed?
- Depends on the statistical inference we want to draw
- If we are interested in change over time, year should be a fixed effect
- If are not interested in comparing means of the different years, year should be a random effect
- Treating years as a random population of environments
- The same goes for location

- Let’s say year and location will both be random effects
- Fit them as
*crossed*random effects - An intercept will be fit for each year and each location
- This means we assume the effect of year is constant across locations, and vice versa

`yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location)`

- No random slopes (baseline yield differs across environments but not the treatment effect)

- There are many blocks at each location, each of which has many plots
- Block treated as a random effect
*nested*within location - Block 1 in Illinois is not the same as Block 1 in Minnesota
- Separate intercept fitted for each block in each location
- We do not need to include plot explicitly, because the plots are individual data points
- Use the
`:`

operator to separate nesting terms.

`yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location) + (1 | block:location)`

- Give each location-year combination its own intercept
- This would mean location is
*nested*within year as well - Three levels of nesting for block
- Use this if the locations are far enough apart that the random variation of years isn’t consistent across locations

`yield ~ tillage + cover.trt + tillage:cover.trt + (1 | location:yr) + (1 | block:location:yr)`

- Random slopes
- Must specify which effects are random in the design component
- Specify which grouping factors get separate slopes fitted to them
- For example, fit random slopes
*only*to each main effect and*only*for the year-location grouping factor:

`yield ~ tillage + cover.trt + tillage:cover.trt + (tillage + cover.trt | location:yr) + (1 | block:location:yr)`

- Not just to “make the data normal” (actually the residuals)
- Transformations should be guided by theory and science
- For example, response variables that are ratios or concentrations
- Difference between two values on a logarithmic scale is multiplicative difference (ratio) on an arithmetic scale
- Example: Difference between a ratio of 0.01 and a ratio of 1 is 100 times
- Difference etween a ratio of 1 and 100 is 100 times
- On a log scale those two differences are treated the same (which is good)

- Small subset of an ARS scientist’s data (noise added)
- Different varieties of barley crossed with different types of fungicide
- Multiple plots of each combination of variety and fungicide in each experimental block (column
`Rep`

) - Amount of DON (a mycotoxin) produced measured, units of parts per million (column
`DON_ppm`

)

- Variety, fungicide, and interaction are the fixed effects
- Each
`Rep`

gets a random intercept

- Same as before but the response variable is log-transformed
- This can be done directly in the model formula
- Concentration is on a ratio scale so we are looking at multiplicative changes

- Transformations should be determined by the science, but it’s still good to look at the residuals
- Residuals in untransformed model have non-homogeneous variance and bad match to normal quantile line
- Both diagnostics better for the log-transformed response
- Note:
`check`

argument to`check_model()`

lets you select specific diagnostic plots to make- See all options by looking at help documentation:
`?check_model`

- See all options by looking at help documentation: