Lesson 6: Estimating and comparing treatment means

Lesson 6 learning objectives

At the end of this lesson, students will …

• Generate predictions from a mixed model.
• Test specific hypotheses with contrasts.
• Compare multiple group means and correct for multiple comparisons.

Estimated marginal means

• You might know the term “least square means” or “lsmeans” if you are a SAS user
• Estimated marginal means (EMM) are the same thing as lsmeans in most cases.
• Least squares method is not always used to calculate them, so we should call them EMM

Estimated marginal means are estimates

• EMMs are predictions estimated from a model, not data
• They are functions of model parameters
• Sensitive to all assumptions made in constructing the model.
• (All models are wrong and only some of them are useful!)
• “Marginal” because they’re averaged across the margins of all other fixed and random effects in the model

Estimated means are population-level predictions

• Confidence interval or uncertainty around an estimated marginal mean is not the same thing as the range of variation in the population
• Example: population-level mean of the height of an adult female in the United States is 161.3 cm (5’ 3.5”)
• 90% confidence interval goes from 160.9 cm to 161.6 cm (5’ 3.38” to 5’ 3.63”).
• Mean was estimated from a sample of 5,510 women
• We are very confident we know the mean within a tiny range
• But if we took a random adult female from the United States and measured her height, does that mean we are 90% confident it would be between 160.9 cm and 161.6 cm? No!
• In fact, the 5th and 95th percentile are 149.8 cm (4’11”) and 172.5 cm (5’8”)
• 90% of females’ heights are between those much more widely separated values
• Estimated marginal means tell us the most likely expected value of a random individual from the population, not about the variation in the population itself
• Especially keep this in mind when averaging the estimated means across random effects and other fixed effects

The emmeans package

• Developed by Russ Lenth, an emeritus statistics professor from the University of Iowa (Thanks Russ!)
• We are only going to cover the basics here
• It has a lot of vignettes and documentation you can check out
• It works with lots of different modeling packages including Bayesian ones
• Also consider the marginaleffects package

Using emmeans to estimate means

• Revisit the fake biomass versus fertilization dataset that we worked with earlier
• We need to reload the data and refit the model
library(tidyverse)
library(lme4)
library(emmeans)
library(multcomp)
fert_biomass <- read_csv('datasets/fertilizer_biomass.csv') %>%
mutate(treatment = factor(treatment, levels = c('control', 'low', 'high')))

fert_mm <- lmer(biomass ~ treatment + (1 | field), data = fert_biomass)
• Also set the global ggplot2 theme to black on white because the default white on gray is so so ugly
theme_set(theme_bw())

The emmeans() function

• Estimates marginal means by fixed effect
• emmeans() can take many arguments, but it needs at least two
• The first argument is a fitted model object
• Second argument is a one-sided formula beginning with ~
• Variable or combination of variables for which we want to estimate the marginal means (fixed effects)
• Here the only fixed effect is treatment
fert_emm <- emmeans(fert_mm, ~ treatment)

Output of emmeans()

• Object with the estimated marginal means averaged across all other fixed effects and all random effects
• We now have estimated marginal means and a confidence interval (default 95%) for each one
• Degrees of freedom are approximated, as must be the case with mixed models
• We can use the plot() function to show the means and 95% confidence intervals
plot(fert_emm)

Contrasts

• The plot shows the 95% confidence intervals of the means overlap
• This does not mean they are not significantly different from one another
• We have to take the difference between each pair of means and test if it is different from zero

The contrast() function

• contrast() does comparisons between means
• First argument is an emmeans object
• Second argument, method, is the type of contrast
• Using method = 'pairwise' compares all pairs of treatments
• Takes the difference between each pair of means and calculates the associated t-statistic and p-value
contrast(fert_emm, method = 'pairwise')

• Choose other methods of p-value adjustment using the adjust argument
• I usually prefer the Sidak adjustment
• In this case it makes very little difference
contr_sidak <- contrast(fert_emm, method = 'pairwise', adjust = 'sidak')
• contrast objects also have a built-in plotting method using the plot() function
plot(contr_sidak)
• We can add a dashed line at zero to highlight where 95% CI does not contain zero
plot(contr_sidak) + geom_vline(xintercept = 0, linetype = 'dashed', linewidth = 1)

Multiple comparison letters

• Used to summarize many pairwise comparisons
• Use cld() function from the multcomp package
• CLD = Compact Letter Display
• First argument is emmeans object
• Specify multiple comparisons adjustment method
cld(fert_emm, adjust = 'sidak')
• Default labels are 1, 2, 3 instead of a, b, c
• Specify letters as the labels to get the more familiar letters:
cld(fert_emm, adjust = 'sidak', Letters = letters)

Contrast methods other than pairwise

• Example: comparing control with each other treatment, but not comparing treatments with each other
• Specify method = 'trt.vs.ctrl' when you call contrast()
• This will apply Dunnett’s adjustment to the p-values
• Higher power to detect differences from the control as we are ignoring differences between the non-control levels
contrast(fert_emm, method = 'trt.vs.ctrl')

Back-transformation of estimated marginal means

• If model has transformed response variable or link function, EMMs will be on the transformed scale
• Change this default by adding the argument type = 'response' to emmeans()
• This auto-detects the transformation
• Presents EMMS transformed back to scale of the data (the response scale)

Back-transformation example

• Revisit the Stirret corn borers dataset from previous lesson
data('stirret.borers', package = 'agridat')

stirret.borers <- stirret.borers %>%
mutate(trt = factor(trt, levels = c('None', 'Early', 'Late', 'Both')))

glmm_borers <- glmer(count2 ~ trt + (1|block), data = stirret.borers, family = poisson)

EMMs on original and response scale

• Original scale
emmeans(glmm_borers, ~ trt)
• Response scale
emm_borers <- emmeans(glmm_borers, ~ trt, type = 'response')

Contrasts of transformed EMMs

contrast(emm_borers, method = 'pairwise', adjust = 'sidak')
• Notice that the second column is now called ratio
• Subtracting on a log scale = dividing on an arithmetic scale

$\log\frac{a}{b} = \log a - \log b$

• Count data model uses a log link function so comparisons are on a ratio scale
• For example, None / Late = 2.72 (model estimate: 2.72 times as many corn borers expected on a plot with no fungal spores applied, versus one with late fungal spore application treatment)

Estimated marginal means with multiple predictor variables

• Revisit fish fillets model from Lesson 3 exercise
• Fixed effect of species (channel catfish vs. hybrid catfish)
• Fixed effect of preparation method (fresh, frozen, and raw)
• Fixed effect of species by prep method interaction
• Random intercept for fish ID (multiple thickness measurements per fish)
fish_fillets <- read_csv('datasets/fish_fillets.csv')
fit_fillets <- lmer(thickness ~ species + prep + species:prep + (1|fishID), data = fish_fillets)

EMMs for individual fixed effect groupings

• Estimate marginal means for species, averaged across all the preparation method
• Similarly, estimate marginal means for preparation method, averaged across both species
emm_species <- emmeans(fit_fillets, ~ species)
emm_prep <- emmeans(fit_fillets, ~ prep)
• Notice the message that the species and prep variables are involved in an interaction

EMMs for combinations of fixed effects

• Specify more than one variable, separating them with a +
emm_species_x_prep <- emmeans(fit_fillets, ~ species + prep)
• cld() will do comparison for all six estimates
cld(emm_species_x_prep, adjust = 'sidak', Letters = letters)

Multiple comparisons by group

• Use | symbol to show which fixed effect should be used to group the comparisons
• For example prep | species gets estimated marginal means for prep grouped by species
• cld() comparison is done within each species separately
emm_prep_within_species <- emmeans(fit_fillets, ~ prep | species)

cld(emm_prep_within_species, adjust = 'sidak', Letters = letters)
• Here, relative ordering of the prep methods is the same for both species

Note on comparing EMMs versus ANOVA

• Some people use a “two-step” process to compare means
• The first step is to examine the ANOVA table
• If F-test has p < 0.05, proceed to post hoc comparison of means; otherwise, the means are not different
• But if you properly account for multiple comparisons you do not need to worry about the F-test
• F-test does not really give a useful inference in most cases
• It only says there is some difference between the groups, not which groups or by how much

• Model comparison is another big part of hypothesis testing
• Different models are different hypotheses about how the world works
• We want to evaluate the relative amount of evidence for each one
• Likelihood ratio tests and information criteria are used to compare mixed models
• We will cover them in a future workshop!

Course Recap

This concludes the mixed models in R workshop. We have learned a lot in the past two days!

• The basics of R, including variables, functions, scripts, and packages.
• How to work with data frames in R, including tidying, reshaping, and summarizing.
• How to fit linear mixed models with random intercepts and random slopes.
• How to fit mixed models with crossed and nested effects and transformed responses.
• How to fit generalized linear mixed models.
• How to compare estimated marginal means from linear mixed models

Thanks for participating!

• Link to post workshop feedback form: https://forms.gle/HkvfkUejzg5T18xcA
• Or feel free to email me your thoughts
• I’m looking forward to seeing folks at the next workshop!