# Lesson 1: R basics

## Course introduction: mixed models in R

• Follow along with lesson slides and lesson text
• Each lesson has a worksheet
• Fill in the ... in the worksheet with code
• Typing in the code yourself is better than copy+paste!
• Optional exercises at the end
• This is a practical skills course, not “Principles of Statistics 101!”

## Day 1 Schedule

| Time | Activity |
| --- | --- |
| 9:00-9:15 AM | Introductions, troubleshooting |
| 9:15-10:15 AM | Lesson 1: R Boot Camp: the very basics |
| 10:15-10:45 AM | break |
| 10:45-11:30 AM | Lesson 2: R Boot Camp: working with data frames |
| 11:30-11:45 AM | break |
| 11:45 AM-12:45 PM | Lesson 3: From linear model to linear mixed model |
| 12:45-2:00 PM | lunch break |
| 2:00-4:00 PM | office hours |

(Day 2 schedule has the same format)

## Lesson 1 learning objectives

At the end of this lesson, students will …

• Know what R is and what it can do.
• Use the R console to interactively issue R commands.
• Know the most common data types in R.
• Know how statistical distributions work in R.
• Know what R packages are and how to install and load them.

## Introduction to R and RStudio

R and RStudio are software tools to help you work with and analyze your data.

### What is R?

• A statistical programming language
• Users contribute packages
• Free and open-source

### What is RStudio?

• A tool to help you write and run code in R
• RStudio is not R; it is an interface for R (you also need to have R installed to run RStudio)
• We will access RStudio through Posit Cloud for this course
• Or you can run RStudio locally if you prefer

### RStudio panes

• Console: Enter individual lines of code, see output
• Scripts: Edit and run scripts (text files containing code)
• Environment: Shows variables that you have created
• Files/Plots/Help: Includes several tabs
  • Files: navigate your filesystem
  • Plots: display images generated by your R code
  • Packages: view and install R packages
  • Help: documentation for functions and packages

### The basic moving parts of R

• variable: a structure that holds data. Examples:
• a vector of numbers c(1, 2, 3)
• a character string "USDA"
• a data frame with 1000 rows and 10 columns

• function: something that takes arguments as input, does something, and returns output.
• log(10): takes a numeric value as input and returns a numeric value as output
• c(1, 5, 6): The function c() takes multiple values as input and returns a vector as output.
• read.csv('myfile.csv'): takes a character string as input and returns a data frame as output.
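As a small sketch of how variables and functions fit together: the variable names below (plot_weights, log_weights, agency) are made up for illustration, and the `<-` operator that creates them is explained later in this lesson.

```r
# Store a numeric vector in a variable (hypothetical name)
plot_weights <- c(1, 5, 6)

# Pass the variable to a function; log() returns a numeric vector of the same length
log_weights <- log(plot_weights)
log_weights

# Store a character string in a variable and pass it to a function
agency <- "USDA"
nchar(agency)   # nchar() counts the characters in the string
```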

## How to R

• Let’s start writing our first R code!
• Enter the example code in the console

### Using R as a calculator

• Use the operators +, -, *, /, and ^ to do arithmetic in R
2 + 3

### The assignment operator

• The assignment operator <- is used to create a new variable and give it a value. The syntax is variable <- <value>.
• Variable names can contain . or _ but can’t contain spaces or start with a number.
• You can also use = as an assignment operator but we will use <- in this workshop. Consistent code is readable code!
x <- 2 + 3
y = 3.5
• Entering the name of a variable prints that variable’s value to the console.
• If you assign a value to a new variable, nothing will print to the console. But the variable is now defined in your environment and can be used later.
x
x + y
x * 4

x <- x + 1
z <- x * 4
z

• Any line preceded by # is a comment and will not be evaluated.
# This is a comment

### Functions with arguments

• Calling a function with an argument in parentheses (), like function(<value>), passes that value to the function and returns some output
log(1000)

sin(pi)
• Functions can take multiple arguments separated by commas ,
• You can use either 'single quotes' or "double quotes"
my_name <- "Quentin"

paste('Hello,', my_name)
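A brief sketch of functions with more than one argument, using the base and sep arguments of log() and paste(). The variable name greeting is made up for illustration.

```r
# Two arguments separated by a comma: the value and the base of the logarithm
log(1000, base = 10)

# paste() joins its arguments with a space by default; sep changes the separator
my_name <- "Quentin"
greeting <- paste("Hello", my_name, sep = ", ")
greeting
```

Naming the arguments (base = 10, sep = ", ") is optional in many cases, but it makes the code easier to read.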

### Getting help

• Use ? to get help about a function
?paste
• Use ?? to search all help documentation for a term
??sequence

## Types of output

• Usually output prints to the console unless assigned to a variable
• Some code produces other output as a “side effect,” such as a plot
plot(mpg ~ hp, data = mtcars)

## Errors, warnings and notes

Code can produce messages instead of or in addition to output:

• Errors
• Warnings
• Notes

### Errors

• Indicates something went wrong
• No output is produced
sin(pi))

### Warnings

• Indicates the result may not be what you expected
• Code still runs and produces output
log(-5)

### Notes

• Just a note. Everything is still fine!
rep(0, 100000)

## Data types in R

• The [1] in the output from earlier is the index of the first element printed on that line; even a single value in R is a vector of length 1
• Vectors are sequences of one or more elements of the same data type
• numeric
• character
• factor
• logical
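One way to check which of these types a vector has is the class() function. A minimal sketch with made-up example values:

```r
# class() reports the data type of a vector
class(c(1.5, 2, 3))               # numeric
class(c("a", "b"))                # character
class(factor(c("low", "high")))   # factor
class(c(TRUE, FALSE))             # logical
```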

### Numeric

• Here are two ways to make a numeric vector with a sequence of integers 1 to 100
• The first way uses a function seq() with three named arguments
• Separate arguments with ,
• The notation with : is shorthand
seq(from = 1, to = 100, by = 1)

1:100
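The named arguments of seq() make it more flexible than the : shorthand. The sketch below uses example values chosen for illustration, plus the length.out argument, which is not used elsewhere in this lesson.

```r
# A step size other than 1
seq(from = 0, to = 1, by = 0.25)

# Ask for a number of evenly spaced values instead of a step size
seq(from = 0, to = 100, length.out = 5)

# Both ways of writing 1 to 100 contain the same values
all(seq(from = 1, to = 100, by = 1) == 1:100)
```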

### Character

• Text values
• Use single quotes ' or double quotes " to create character vectors
• We can index vectors with brackets [] containing one or more integer values
c('a', 'b', 'c', 'd', 'e', 'f', 'g')

letters[1:7]

letters[c(1, 18, 19)]

c('USDA', 'ARS', 'SEA')

#### Issues with numeric and character data types

• Wrong data type often results in an error
log('hello')
• A combination of numeric and character values is coerced (converted) to character
• This is a common problem when reading data from a spreadsheet
c(100, 5.323, 'missing value', 12)
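One possible way to recover the numbers from a vector like this is as.numeric(), sketched below. The variable names are made up for illustration, and suppressWarnings() is used only to hide the warning that R would otherwise print about the text value.

```r
# The mixed vector is stored entirely as character
measurements <- c(100, 5.323, 'missing value', 12)
class(measurements)

# as.numeric() converts back to numbers; text that is not a number becomes NA
measurements_num <- suppressWarnings(as.numeric(measurements))
measurements_num

# na.rm = TRUE tells mean() to skip the NA value
mean(measurements_num, na.rm = TRUE)
```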

### Factor

• Looks like character but can only contain predefined values (levels)
• Levels are sorted in a specific order
• Used for categorical variables in models
• The first level is usually treated as the reference (control) level in models
treatment <- factor(c('low', 'low', 'medium', 'medium', 'high', 'high'))

treatment

#### Sorting factor levels

• Default order is alphabetical
• We can sort the levels in a logical order instead of alphabetical
treatment <- factor(treatment, levels = c('low', 'medium', 'high'))

treatment
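To see the effect of reordering, you can inspect the levels directly with levels(). This sketch repeats the treatment example and also shows, with as.integer(), the integer codes that a factor stores for each value.

```r
treatment <- factor(c('low', 'low', 'medium', 'medium', 'high', 'high'))
levels(treatment)   # alphabetical: "high" "low" "medium"

treatment <- factor(treatment, levels = c('low', 'medium', 'high'))
levels(treatment)   # now "low" "medium" "high"

# Each value is stored as the integer position of its level
as.integer(treatment)
```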

### Logical

• Can take two values, TRUE and FALSE
• The result of a comparison is a logical vector
• Comparison and logical operators in R:
• x == y: is x equal to y?
• x != y: is x not equal to y?
• x > y: is x greater than y?
• x >= y: is x greater than or equal to y?
• x < y: is x less than y?
• x <= y: is x less than or equal to y?
• x > y & x < z: is x greater than y and less than z?
• x > y | x < z: is x greater than y or less than z?

#### Examples of comparisons with logical operators

x <- 1:5

x > 4

x <= 2

x == 3

x != 2

x > 1 & x < 5

x <= 1 | x >= 5

#### The ! operator

• ! is the negation operator
• Converts all TRUE values to FALSE and vice versa.
!(x == 3)

#### The %in% operator

• %in% is an operator comparing two vectors
• Goes through the vector on the left-hand side and returns TRUE for the values that appear anywhere in the vector on the right-hand side, and FALSE otherwise
c(1, 5, 6, 7) %in% x

x %in% c(1, 5, 6, 7)
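Logical vectors are often used to pick out or count the elements that meet a condition. The lesson so far has indexed with integers; the sketch below assumes the same x as above and shows that brackets also accept a logical vector.

```r
x <- 1:5

# A logical vector inside brackets keeps only the elements where it is TRUE
x[x > 2]

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(x > 2)

# mean() gives the proportion of TRUE values
mean(x %in% c(1, 5, 6, 7))
```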

## Functions that take vectors as input

• Some functions take a vector as input and return a vector of the same length.
• exp(): the exponential of each element in the vector
set.seed(123)

random_numbers <- rnorm(n = 1000, mean = 0, sd = 1)

head(exp(random_numbers))

PROTIP: set.seed() ensures the code produces the same random numbers each time it is run, and head() prints only the first few values of a result

• Other functions take a vector as input and return only one or a few values
• length(), mean(), median(), and sd() return a single value.
length(random_numbers)
mean(random_numbers)
median(random_numbers)
sd(random_numbers)
• range() returns a vector of two values, the minimum and maximum of the vector
• quantile() takes two vectors as input.
• First argument is the vector we want the quantiles from
• The second vector, probs, contains the probabilities we want to calculate the quantiles for
• The function returns a vector with the same length as probs containing the percentiles
range(random_numbers)
quantile(random_numbers, probs = c(0.025, 0.5, 0.975))

## Statistical distributions

• R has a lot of built-in statistical distributions
• All of them have four functions beginning with r, d, p, and q and followed by the (abbreviated) name of the distribution.
• r: random draws from the distribution
• d: probability density function (what is the y-value of the function given x?)
• p: cumulative distribution function (what is the cumulative probability given x?)
• q: quantile (what is the x-value given the cumulative probability?); q is the inverse of p.
• For example, the functions for the normal distribution are rnorm(), dnorm(), pnorm(), and qnorm()
• Default to the standard normal distribution with mean = 0 and sd = 1
• You can change those parameters by modifying the mean and sd arguments
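A short sketch of the four normal distribution functions in action. The parameter values (mean 10, sd 2, x = 1.96) are arbitrary choices for illustration.

```r
set.seed(123)

# r: five random draws from a normal distribution with mean 10 and sd 2
rnorm(n = 5, mean = 10, sd = 2)

# d: height of the standard normal density curve at x = 0
dnorm(0)

# p: cumulative probability that a standard normal value is at most 1.96
p <- pnorm(1.96)
p

# q: the x-value with that cumulative probability, which takes us back to 1.96
qnorm(p)
```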

### Other distributions you might work with

• Binomial (rbinom(), dbinom(), pbinom(), qbinom())
• Uniform (runif(), dunif(), punif(), qunif())
• Student’s t (rt(), dt(), pt(), qt())
• The list goes on …

Type ?Distributions in your console to see help documentation about all the built-in distributions.
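The same r/d/p/q pattern carries over to the other distributions; only the parameter arguments change. A sketch with arbitrary example values:

```r
# Binomial: probability of exactly 5 successes in 10 trials with success probability 0.5
dbinom(5, size = 10, prob = 0.5)

# Uniform (between 0 and 1 by default): probability of a value at most 0.3
punif(0.3)

# Student's t: the 97.5th percentile with 10 degrees of freedom
qt(0.975, df = 10)

# Random draws: three values from a uniform distribution between 0 and 100
set.seed(1)
uniform_draws <- runif(n = 3, min = 0, max = 100)
uniform_draws
```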

## Common pitfalls

If you get an error or your code doesn’t work, here are some things to check.

• Punctuation: close all parentheses, brackets, and quotation marks.
(5+3))/2 # Nope

(5+3)/2 # Yep
• Spelling: are the functions and variables spelled correctly?
my_variable <- 100000

myvariable
• Spaces
• Spaces are good for making code more readable
• Compare x<-log(500,base=2) and x <- log(500, base = 2)
• But you can’t put spaces in the middle of the name of a function or variable
some_numbers <- 1:5

( some_numbers + 3 ) ^ 2

(some_numbers+3)^2

(some numbers + 3)^2
• Case: R is CASE-SENSITIVE (unlike SAS)
sum(1:10)
Sum(1:10)

## R packages

• So far we have only used code from “base R.”
• But almost any R script requires one or more packages
• Packages are sets of functions contributed by R users that are available for download on CRAN

### Installing a package

• Install a package for the first time either via the RStudio dialog or with the function install.packages()
• This only needs to be done once!
install.packages('cowsay')

PROTIP: You can specify the location of the library the package will install into. This means you can specify one that doesn’t require administrator level access.
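A rough sketch of how that could look, using the lib argument of install.packages() and the lib.loc argument of library(). The folder my_lib is a hypothetical location chosen for illustration (here a temporary folder), and the install and load lines are commented out because they require a download from CRAN.

```r
# Show the library folders R currently searches
.libPaths()

# Hypothetical personal library folder (an assumed path for illustration)
my_lib <- file.path(tempdir(), "my_r_library")
dir.create(my_lib, showWarnings = FALSE)

# Install into that folder (commented out because it downloads from CRAN)
# install.packages('cowsay', lib = my_lib)

# Load the package from that folder later
# library(cowsay, lib.loc = my_lib)
```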

• Load a package from the code library where packages are installed using the function library()
• This needs to be done every time you start a new R session!
library(cowsay)
say('USDA statisticians are the best!', by = 'cow')

-----
USDA statisticians are the best!
------
\   ^__^
\  (oo)\ ________
(__)\         )\ /\
||------w|
||      ||
• You can also use the package name followed by :: to be explicit
cowsay::say("Don't forget to close your parentheses", by = 'chicken')

-----
Don't forget to close your parentheses
------
\
\
_
_/ }
>' \
|   \
|   /'-.     .-.
\'     ';--' .'
\'.    '-./
'.-..-;
;-..'
_| _|
/ / [nosig]

• To access all the help documentation for a package, use help(package = 'packagename').

## Learning R best practices

### How do I get help?

• Google is your friend (copy and paste your error message)
• StackOverflow is your friend too
• stats.stackexchange.com if you have a question about stats that isn’t specific to R programming

### Console versus script editor

• Typing and running individual lines of code is great for exploring
• It is not as good when you are doing complex data wrangling and analysis
• You can save scripts (text files of code) to run again later
• Run individual lines or selected blocks of code from the script editor by pressing Ctrl+Enter (Win) or Cmd+Enter (Mac)

## Hey! What about … ?

• Functions
• Lists
• Flow control (if, else, for)

Those are really important things but we aren’t going to cover them in this lesson. I strongly encourage you to explore the R resources I’ve provided to learn more. And maybe I’ll discuss them in a future workshop.

## Exercises

Go to the lesson page and try out the exercises!