At the end of this lesson, students will …

- Generate predictions from a mixed model.
- Test specific hypotheses with contrasts.
- Compare multiple group means and correct for multiple comparisons.

- You might know the term “least squares means” or “lsmeans” if you are a SAS user
- Estimated marginal means (EMMs) are the same thing as lsmeans in most cases
- The least squares method is not always used to calculate them, so we should call them EMMs

- EMMs are predictions estimated from a model, **not data**
- They are functions of model parameters
- Sensitive to all assumptions made in constructing the model.
- (All models are wrong and only some of them are useful!)
- “Marginal” because they’re averaged across the margins of all other fixed and random effects in the model

- Confidence interval or uncertainty around an estimated marginal mean is not the same thing as the range of variation in the population
- Example: population-level mean of the height of an adult female in the United States is 161.3 cm (5’ 3.5”)
- 90% confidence interval goes from 160.9 cm to 161.6 cm (5’ 3.38” to 5’ 3.63”).
- Mean was estimated from a sample of 5,510 women
- We are very confident we know the mean within a tiny range

- But if we took a random adult female from the United States and measured her height, does that mean we are 90% confident it would be between 160.9 cm and 161.6 cm? No!
- In fact, the 5th and 95th percentile are 149.8 cm (4’11”) and 172.5 cm (5’8”)
- 90% of females’ heights are between those much more widely separated values
- Estimated marginal means tell us the most likely expected value of a random individual from the population, not about the variation in the population itself
- Especially keep this in mind when averaging the estimated means across random effects and other fixed effects

- Developed by Russ Lenth, an emeritus statistics professor from the University of Iowa (Thanks Russ!)
- We are only going to cover the basics here
- It has a lot of vignettes and documentation you can check out
- It works with lots of different modeling packages including Bayesian ones
- Also consider the **marginaleffects** package

- Revisit the fake biomass versus fertilization dataset that we worked with earlier
- We need to reload the data and refit the model
- Load needed packages, read data back in, and refit model

- Also set the global **ggplot2** theme to black on white because the default white on gray is so, so ugly
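A minimal sketch of that setup, assuming the model from the earlier lesson was fit with **lme4** (the file name, column names, and random-effect grouping variable here are hypothetical placeholders, not taken from the lesson code):

```r
# Reload packages, read the data back in, and refit the model.
library(lme4)     # assuming lmer() was used in the earlier lesson
library(emmeans)
library(ggplot2)

theme_set(theme_bw())  # black-on-white ggplot2 theme

# Hypothetical file and variable names
biomass_data <- read.csv('biomass_fertilization.csv')
fit <- lmer(biomass ~ treatment + (1 | field), data = biomass_data)
```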

- `emmeans()` estimates marginal means by fixed effect
- `emmeans()` can take many arguments, but it needs at least two
- The first argument is a fitted model object
- The second argument is a one-sided formula beginning with `~`
  - Variable or combination of variables for which we want to estimate the marginal means (fixed effects)
  - Here the only fixed effect is `treatment`
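A call with those two arguments might look like this (the fitted model object name `fit` is a placeholder):

```r
library(emmeans)

# Fitted model first, then a one-sided formula naming the fixed effect
emm_trt <- emmeans(fit, ~ treatment)
emm_trt  # prints the EMMs with approximate df and 95% confidence limits
```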

- Object with the estimated marginal means averaged across all other fixed effects and all random effects
- We now have estimated marginal means and a confidence interval (default 95%) for each one
- Degrees of freedom are approximated, as must be the case with mixed models
- We can use the `plot()` function to show the means and 95% confidence intervals
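For example (assuming the EMM object from the previous step is stored as `emm_trt`):

```r
# plot() on an emmeans object draws the means with 95% confidence intervals
plot(emm_trt)
```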

- The plot shows the 95% confidence intervals of the means overlap
- This does not mean they are not significantly different from one another
- We have to take the difference between each pair of means and test if it is different from zero

- `contrast()` does comparisons between means
- First argument is an `emmeans` object
- Second argument, `method`, is the type of contrast
- Using `method = 'pairwise'` compares all pairs of treatments
- Takes the difference between each pair of means and calculates the associated *t*-statistic and *p*-value
- Automatically adjusts *p*-values for multiple comparisons using the Tukey adjustment
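In code (again assuming the EMM object is called `emm_trt`):

```r
# All pairwise differences between treatment EMMs;
# p-values are Tukey-adjusted by default
contrast(emm_trt, method = 'pairwise')
```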

- Choose other methods of *p*-value adjustment using the `adjust` argument
- I usually prefer the Sidak adjustment
- In this case it makes very little difference
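For instance (with the placeholder EMM object `emm_trt`):

```r
# Sidak adjustment instead of the default Tukey adjustment
contrast(emm_trt, method = 'pairwise', adjust = 'sidak')
```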

- `contrast` objects also have a built-in plotting method using the `plot()` function

- We can add a dashed line at zero to highlight where 95% CI does not contain zero
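Because `plot()` on an emmeans/contrast object returns a **ggplot2** object, we can add the dashed line with a regular geom (object names are placeholders):

```r
library(ggplot2)

# Dashed reference line at zero highlights which 95% CIs exclude zero
plot(contrast(emm_trt, method = 'pairwise')) +
  geom_vline(xintercept = 0, linetype = 'dashed')
```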

- Used to summarize many pairwise comparisons
- Use the `cld()` function from the **multcomp** package
- CLD = Compact Letter Display
- First argument is an `emmeans` object
- Specify the multiple comparisons adjustment method

- Default labels are `1`, `2`, `3` instead of `a`, `b`, `c`
- Specify `letters` as the labels to get the more familiar letters:
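Putting those pieces together (EMM object name is a placeholder; the adjustment choice here is just an example):

```r
library(multcomp)

# Compact letter display; Letters = letters gives labels a, b, c
# instead of the default 1, 2, 3
cld(emm_trt, adjust = 'sidak', Letters = letters)
```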

- Example: comparing the control with each other treatment, but not comparing treatments with each other
- Specify `method = 'trt.vs.ctrl'` when you call `contrast()`
- This will apply Dunnett’s adjustment to the *p*-values
- Higher power to detect differences from the control as we are ignoring differences between the non-control levels
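A sketch of that call (assuming the control is the first factor level, which is what `trt.vs.ctrl` compares against by default):

```r
# Compare each treatment only against the control level;
# Dunnett's adjustment is applied to the p-values automatically.
# Use the ref argument if the control is not the first level.
contrast(emm_trt, method = 'trt.vs.ctrl')
```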

- If the model has a transformed response variable or link function, EMMs will be on the transformed scale
- Change this default by adding the argument `type = 'response'` to `emmeans()`
- This auto-detects the transformation
- Presents EMMs transformed back to the scale of the data (the response scale)
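For example, assuming the corn borer count model from the previous lesson is stored as `fit_borers` (a hypothetical object name):

```r
# Back-transform EMMs from the log-link scale to the response (count) scale
emmeans(fit_borers, ~ treatment, type = 'response')
```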

- Revisit the Stirret corn borers dataset from previous lesson

- Original scale

- Response scale

- Notice that the second column is now called `ratio`
- Subtracting on a log scale = dividing on an arithmetic scale

\[\log\frac{a}{b} = \log a - \log b\]

- Count data model uses a log link function, so comparisons are on a ratio scale
- For example, `None / Late` = 2.72 (model estimate: 2.72 times as many corn borers expected on a plot with no fungal spores applied, versus one with the late fungal spore application treatment)
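Those ratio comparisons can be produced like this (`fit_borers` is a placeholder name for the count model):

```r
# Pairwise comparisons back-transformed to the response scale are
# reported as ratios, because differences on the log-link scale
# back-transform to ratios
emm_borers <- emmeans(fit_borers, ~ treatment)
pairs(emm_borers, type = 'response')  # pairs() is shorthand for pairwise contrast()
```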

- Revisit fish fillets model from Lesson 3 exercise
- Fixed effect of species (channel catfish vs. hybrid catfish)
- Fixed effect of preparation method (fresh, frozen, and raw)
- Fixed effect of species by prep method interaction
- Random intercept for fish ID (multiple thickness measurements per fish)

- Estimate marginal means for species, averaged across all the preparation methods
- Similarly, estimate marginal means for preparation method, averaged across both species
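In code, assuming the fillet model is stored as `fit_fillets` (a hypothetical name):

```r
# EMMs for each fixed effect, averaged over the other one
emmeans(fit_fillets, ~ species)  # averaged across preparation methods
emmeans(fit_fillets, ~ prep)     # averaged across species
```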

- Notice the message that the `species` and `prep` variables are involved in an interaction

- Specify more than one variable, separating them with a `+`
- `cld()` will do comparisons for all six estimates
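For example (object names are placeholders):

```r
library(multcomp)

# EMMs for all six species-by-prep combinations, compared jointly
emm_cells <- emmeans(fit_fillets, ~ species + prep)
cld(emm_cells, adjust = 'sidak', Letters = letters)
```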

- Use the `|` symbol to show which fixed effect should be used to group the comparisons
- For example, `prep | species` gets estimated marginal means for `prep` grouped by `species`
- The `cld()` comparison is done within each species separately
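A sketch of the grouped comparison (again with the hypothetical model name `fit_fillets`):

```r
# Letters are assigned separately within each species
cld(emmeans(fit_fillets, ~ prep | species),
    adjust = 'sidak', Letters = letters)
```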

- Here, relative ordering of the prep methods is the same for both species

- Some people use a “two-step” process to compare means
- The first step is to examine the ANOVA table
- If the F-test has *p* < 0.05, proceed to post hoc comparison of means; otherwise, conclude the means are not different

- But if you properly account for multiple comparisons you do not need to worry about the F-test
- F-test does not really give a useful inference in most cases
- It only says there is *some* difference between the groups, not which groups or by how much

- Model comparison is another big part of hypothesis testing
- Different models are different hypotheses about how the world works
- We want to evaluate the relative amount of evidence for each one
- Likelihood ratio tests and information criteria are used to compare mixed models
- We will cover them in a future workshop!

This concludes the mixed models in R workshop. We have learned a lot in the past two days!

- The basics of R, including variables, functions, scripts, and packages.
- How to work with data frames in R, including tidying, reshaping, and summarizing.
- How to fit linear mixed models with random intercepts and random slopes.
- How to fit mixed models with crossed and nested effects and transformed responses.
- How to fit generalized linear mixed models.
- How to compare estimated marginal means from linear mixed models.

- Link to post workshop feedback form: https://forms.gle/HkvfkUejzg5T18xcA
- Or feel free to email me your thoughts
- I’m looking forward to seeing folks at the next workshop!