- Follow along with lesson slides and lesson text
- Each lesson has a worksheet
- Fill in the `...` in the worksheet with code
  - Typing in the code yourself is better than copy+paste!
- Optional exercises at the end
- This is a practical skills course, not “Principles of Statistics 101!”

Time | Activity |
---|---|
9:00-9:15 AM | Introductions, troubleshooting |
9:15-10:15 AM | Lesson 1: R Boot Camp: the very basics |
10:15-10:45 AM | break |
10:45-11:30 AM | Lesson 2: R Boot Camp: working with data frames |
11:30-11:45 AM | break |
11:45 AM-12:45 PM | Lesson 3: From linear model to linear mixed model |
12:45-2:00 PM | lunch break |
2:00-4:00 PM | office hours |

(Day 2 schedule has the same format)

At the end of this lesson, students will …

- Know what R is and what it can do.
- Use the R console to interactively issue R commands.
- Know the most common data types in R.
- Know how statistical distributions work in R.
- Know what R packages are and how to install and load them.

R and RStudio are software tools to help you work with and analyze your data.

**R**:

- A statistical programming language
- Users contribute packages
- Free and open-source

**RStudio**:

- A tool to help you write and run code in R
- RStudio is not R; it is an interface for R (you also need to have R installed to run RStudio)
- We will access RStudio through Posit Cloud for this course
- Or you can run RStudio locally if you prefer

- **Console**: Enter individual lines of code, see output
- **Scripts**: Edit and run scripts (text files containing code)
- **Environment**: Shows variables that you have created
- **Files/Plots/Help**: Includes several tabs
  - *Files*: navigate your filesystem
  - *Plots*: display images generated by your R code
  - *Packages*: view and install R packages
  - *Help*: documentation for functions and packages

- **variable**: a structure that holds data. Examples:
  - a vector of integers: `c(1, 2, 3)`
  - a character string: `"USDA"`
  - a data frame with 1000 rows and 10 columns
- **function**: something that takes arguments as input, does something, and returns output.
  - `log(10)`: takes a numeric value as input and returns a numeric value as output
  - `c(1, 5, 6)`: the function `c()` takes multiple values as input and returns a vector as output
  - `read.csv('myfile.csv')`: takes a character string as input and returns a data frame as output

- Let’s start writing our first R code!
- Enter the example code in the console

- Use operators `+`, `-`, `*`, `/`, `^` to use R as a calculator
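As a small sketch of what this looks like at the console, the lines below use each operator once. The numbers are arbitrary illustrative values, not taken from the worksheet.

```r
# Basic arithmetic typed at the console
2 + 3      # addition
10 - 4     # subtraction
6 * 7      # multiplication
9 / 2      # division
2 ^ 10     # exponentiation

# Parentheses control the order of operations
(2 + 3) * 4
```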

- The assignment operator `<-` is used to create a new variable and give it a value. The syntax is `variable <- <value>`.
- Variable names can contain `.` or `_` but can’t contain spaces or start with a number.
- You can also use `=` as an assignment operator but we will use `<-` in this workshop. Consistent code is readable code!

- Entering the name of a variable prints that variable’s value to the console.
- If you assign a value to a new variable, nothing will print to the console. But the variable is now defined in your environment and can be used later.
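A minimal sketch of assignment and printing follows. The variable names `plot_area`, `n_plots`, and `total_area` are made up for illustration.

```r
# Assigning a value prints nothing, but the variable now exists
plot_area <- 12.5
n_plots <- 4

# Entering the name of a variable prints its value
plot_area

# Variables can be used later to create new variables
total_area <- plot_area * n_plots
total_area
```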

- Any line preceded by `#` is a comment and will not be evaluated.

- A function followed by an argument in parentheses `()`, like `function(<value>)`, will input a value to a function and return some output

- Functions can take multiple arguments separated by commas `,`

- You can use either `'single quotes'` or `"double quotes"`

- Use `?` to get help about a function

- Use `??` to search all help documentation for a term
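The points above about functions, arguments, quotes, and help can be sketched with a few base R calls. The specific values are illustrative only, and the help commands are left as comments because they are meant to be run interactively.

```r
# A function with one argument
sqrt(16)

# Multiple arguments are separated by commas and can be named
log(100, base = 10)
round(3.14159, digits = 2)

# Single and double quotes create the same character string
identical('USDA', "USDA")

# Help commands (remove the leading # to run them in the console)
# ?log          # help page for the function log()
# ??regression  # search all help documentation for a term
```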

- Usually output prints to the console unless assigned to a variable
- Some code produces other output as a “side effect,” such as a plot

Code can produce messages instead of or in addition to output:

- Errors
  - Indicates something went wrong
  - No output is produced
- Warnings
  - Indicates the result may not be what you expected
  - Code still runs and produces output
- Notes
  - Just a note. Everything is still fine!
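The three kinds of messages can be triggered on purpose, as in the sketch below. The inputs are chosen only to provoke each kind of message; `try()` is used here so that the error does not stop the rest of the code from running.

```r
# Warning: the code still runs and returns a value (NaN here)
w <- log(-1)
w

# Error: no output is produced
# try() catches the error so the following lines can still run
e <- try(log("ten"), silent = TRUE)
class(e)

# Note (R calls these messages)
message("Just a note. Everything is still fine!")
```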

- The `[1]` in the output from earlier is the index of the first value printed on that line; a single value in R is a vector of length 1
- Vectors are sequences of one or more elements of the same data type
  - numeric
  - character
  - factor
  - logical

- Here are two ways to make a numeric vector with a sequence of integers 1 to 100
- The first way uses a function `seq()` with three named arguments
- Separate arguments with `,`
- The notation with `:` is shorthand
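The two approaches might look like the sketch below. The variable names `a` and `b` are placeholders.

```r
# First way: seq() with three named arguments
a <- seq(from = 1, to = 100, by = 1)

# Second way: the colon shorthand
b <- 1:100

# Both vectors have 100 elements with the same values
length(a)
all(a == b)
```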

- Character vectors hold text values
- Use single quotes `'` or double quotes `"` to create character vectors
- We can index vectors with brackets `[]` containing one or more integer values
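A short sketch of creating and indexing a character vector is shown below. The vector `crops` and its values are invented for illustration.

```r
# Create a character vector; either kind of quote works
crops <- c("corn", 'soybean', "wheat")
crops

# Index with one integer
crops[2]

# Index with a vector of integers
crops[c(1, 3)]
```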

- Wrong data type often results in an error

- Combination of numeric and character is forced to character
- This is a common problem when reading data from a spreadsheet
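The sketch below shows both behaviors using a made-up vector `mixed`: one text entry turns the whole vector into character, and arithmetic on a character value then fails. `try()` is used only so the error does not stop the example.

```r
# One text value forces the whole vector to character
mixed <- c(1, 2, "three")
mixed
class(mixed)

# Arithmetic on a character value results in an error
result <- try(mixed[1] + 1, silent = TRUE)
class(result)

# Converting back to numeric makes arithmetic possible again
as.numeric(mixed[1]) + 1
```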

- A factor looks like character but can only contain predefined values (levels)
- Levels are sorted in a specific order
- Used for categorical variables in models
- The first level is usually considered the control or intercept in models

- Default order is alphabetical
- We can sort the levels in a logical order instead of alphabetical
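As a sketch of how level order works for factors, the example below builds a factor from invented treatment labels and then sets the levels explicitly so that the control comes first.

```r
# Default: levels are sorted alphabetically
treatment <- factor(c("low", "high", "control", "low"))
levels(treatment)

# Specify the levels to put them in a logical order
treatment_ordered <- factor(treatment, levels = c("control", "low", "high"))
levels(treatment_ordered)
```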

- Logical vectors can take two values, `TRUE` and `FALSE`
- The result of a comparison is a logical vector
- Logical operators in R:
  - `x == y`: is `x` equal to `y`?
  - `x != y`: is `x` not equal to `y`?
  - `x > y`: is `x` greater than `y`?
  - `x >= y`: is `x` greater than or equal to `y`?
  - `x < y`: is `x` less than `y`?
  - `x <= y`: is `x` less than or equal to `y`?
  - `x > y & x < z`: is `x` greater than `y` and less than `z`?
  - `x > y | x < z`: is `x` greater than `y` or less than `z`?
- `!` is the negation operator
  - Converts all `TRUE` values to `FALSE` and vice versa.
- `%in%` is an operator comparing two vectors
  - Goes through the vector on the left-hand side and returns `TRUE` for the values that appear anywhere in the vector on the right-hand side, and `FALSE` otherwise
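A few of these operators are sketched below on a small made-up vector `x`. The result names (`gt4`, `between`, `negated`, `found`) are placeholders.

```r
x <- c(1, 5, 10)

# A comparison returns a logical vector of the same length
gt4 <- x > 4
gt4

# Combine comparisons with & (and)
between <- x > 4 & x < 8
between

# Negation flips TRUE and FALSE
negated <- !gt4
negated

# %in% checks each left-hand value against the right-hand vector
found <- c("a", "z") %in% c("a", "b", "c")
found
```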

- Some functions take a vector as input and return a vector of the same length.
  - `exp()`: the exponential of each element in the vector

PROTIP: `set.seed()` ensures the code produces the same result each time, and `head()` means only print the first few values of a result
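A sketch combining these ideas is shown below. The seed value and vector length are arbitrary, and `rnorm()` is assumed here simply as a convenient way to generate some random numbers.

```r
# Fix the random seed so the random numbers are the same on every run
set.seed(123)
x <- rnorm(100)

# exp() returns a vector of the same length as its input
y <- exp(x)
length(y)

# Print only the first few values instead of all 100
head(y)
```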

- Other functions take a vector as input and return only one or a few values
  - `length()`, `mean()`, `median()`, and `sd()` return a single value.
  - `range()` returns a vector of two values, the minimum and maximum of the vector
  - `quantile()` takes two vectors as input.
    - First argument is the vector we want the quantiles from
    - The second vector, `probs`, contains the probabilities we want to calculate the quantiles for
    - The function returns a vector with the same length as `probs` containing the percentiles
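The summary functions listed above can be tried on a small made-up vector, as in this sketch. The names `x` and `q` are placeholders.

```r
x <- c(2, 4, 4, 5, 10)

# Each of these returns a single value
length(x)
mean(x)
median(x)
sd(x)

# range() returns the minimum and maximum
range(x)

# quantile() returns one value for each probability in probs
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q
```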

- R has a lot of built-in statistical distributions
- All of them have four functions beginning with `r`, `d`, `p`, and `q` and followed by the (abbreviated) name of the distribution.
  - `r`: random draws from the distribution
  - `d`: probability density function (what is the y-value of the function given x?)
  - `p`: cumulative distribution function (what is the cumulative probability given x?)
  - `q`: quantile function (what is the x-value given the cumulative probability?); `q` is the inverse of `p`.

- For example, the functions for the normal distribution are `rnorm()`, `dnorm()`, `pnorm()`, and `qnorm()`
- Default to the standard normal distribution with `mean = 0` and `sd = 1`
- You can change those parameters by modifying the `mean` and `sd` arguments
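The four normal distribution functions can be sketched as follows. The seed and the input values are arbitrary choices for illustration.

```r
set.seed(1)

# r: five random draws from the standard normal distribution
draws <- rnorm(5)
draws

# d: density at x = 0 (about 0.399)
dnorm(0)

# p: cumulative probability at x = 1.96 (about 0.975)
p <- pnorm(1.96)
p

# q: the inverse of p, so this returns to 1.96
qnorm(p)

# Change the parameters with the mean and sd arguments
rnorm(5, mean = 10, sd = 2)
```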

- Binomial (`rbinom()`, `dbinom()`, `pbinom()`, `qbinom()`)
- Uniform (`runif()`, `dunif()`, `punif()`, `qunif()`)
- Student’s *t* (`rt()`, `dt()`, `pt()`, `qt()`)
- The list goes on …

Type `?Distributions` in your console to see help documentation about all the built-in distributions.

If you get an error or your code doesn’t work, here are some things to check.

*Punctuation*: close all parentheses, brackets, and quotation marks.

*Spelling*: are the functions and variables spelled correctly?

*Spaces*:

- Spaces are good for making code more readable
- Compare `x<-log(500,base=2)` and `x <- log(500, base = 2)`
- But you can’t put spaces in the middle of the name of a function or variable

*Case*: R is CASE-SENSITIVE (unlike SAS)
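Two of these checks can be demonstrated directly, as in the sketch below. The variable names `soil_ph` and `Soil_pH` are hypothetical.

```r
# Spaces around operators do not change the result, only readability
x<-log(500,base=2)
y <- log(500, base = 2)
identical(x, y)

# Names are case-sensitive: soil_ph exists, Soil_pH does not
soil_ph <- 6.5
exists("soil_ph")
exists("Soil_pH")
```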

- So far we have only used code from “base R.”
- But almost any R script requires one or more packages
- Packages are sets of functions contributed by R users that are available for download on CRAN

- Install a package for the first time either via the RStudio dialog or with the function `install.packages()`
- *This only needs to be done once!*

PROTIP: You can specify the location of the library the package will install into. This means you can specify one that doesn’t require administrator level access.

- Load a package from the code library where packages are installed using the function `library()`
- *This needs to be done every time you start a new R session!*

- You can also use the package name followed by `::` to be explicit

- To access all the help documentation for a package, use `help(package = 'packagename')`.
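The package workflow might look like the sketch below. The installation lines are left as comments because they need an internet connection; `lme4` is used only as an example of a CRAN package, and the library path is a placeholder. The `tools` package ships with R, so it is assumed here as a stand-in that can be loaded without installing anything.

```r
# Install once (remove the leading # to run)
# install.packages("lme4")
# install.packages("lme4", lib = "path/to/my/library")  # placeholder path

# Load a package; do this in every new R session
library(tools)
file_ext("mydata.csv")

# Or call a function explicitly with packagename::function
tools::file_ext("mydata.csv")

# Help documentation for a whole package (run interactively)
# help(package = "tools")
```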

- Google is your friend (copy and paste your error message)
- StackOverflow is your friend too
- stats.stackexchange.com if you have a question about stats that isn’t specific to R programming

- Typing and running individual lines of code is great for exploring
- It is not as good when you are doing complex data wrangling and analysis
- You can save scripts (text files of code) to run again later
- Run individual lines or selected blocks of code from the script editor by pressing `Ctrl+Enter` (Win) or `Cmd+Enter` (Mac)

- Functions
- Lists
- Flow control (if, else, for)

Those are really important things but we aren’t going to cover them in this lesson. I strongly encourage you to explore the R resources I’ve provided to learn more. And maybe I’ll discuss them in a future workshop.

Go to the lesson page and try out the exercises!