Microbes on Earth





R session 01 - Introduction to R

Daniel Vaulot

2021-10-12




1 / 44

Microbes on Earth

R sessions

1 - Introduction to R
2 - Data wrangling
3 - Data visualisation
4 - Analysis of metabarcoding data

2 / 44

Outline

  • What is R and why use R?
  • Resources
  • Get started
  • Fundamentals of R
    • Data objects
    • Vectors
    • Operators
    • Functions
    • Packages
3 / 44

Wooclap - Past experience

4 / 44

Introduction

  • For those who are experts in R
    • please refrain from answering during this session...
    • help your neighbor (once COVID is gone...)

  • Two special slide formats

Your turn...

Warning

5 / 44

Introduction

Why use R?

  • Script vs. menu-driven software (e.g. Excel)
    • Can be re-run with new data
    • Reproducible workflow
  • Open source
    • Huge number of libraries
    • Tidy "universe" : tidyverse and ggplot2
      • Very easy to manipulate tables (select columns, create new variables)
      • High quality graphics
  • Work environment
    • RStudio
  • Document your data processing
    • R Markdown
    • Create HTML, PDF, presentations
  • Share your data and workflow
    • GitHub
6 / 44

Introduction

What can you do with R?

  • Science
    • Statistics of course...
    • Data processing
    • Graphics
    • Time series analyses
    • Maps
    • Bioinformatics
  • But also
    • Teach
    • Do a presentation
    • Write your CV
    • Build a web site
    • Write a book
    • Much more...

7 / 44

Resources

Cheat sheets

8 / 44

Let's get started

Setup

10 / 44

Let's get started

The RStudio interface

  • Bottom left
    • Console
  • Top left
    • File editor for .R and .Rmd files
    • Data frame visualization
  • Top right
    • Environment (i.e. R objects)
    • History
  • Bottom right
    • Files
    • Plots
    • Packages
    • Help
11 / 44

Let's get started

Create a new project

  • Open RStudio
  • Create new project for the course in a new directory
    • e.g. Microbes course
12 / 44

Let's get started

Your first script

print("Hello world")
[1] "Hello world"

Two ways to proceed

  1. Type directly in the console

  2. Create a new script

Type in the script window

  • Select and execute (CTRL-R)
  • Source the script
13 / 44

The R language

Variables abstract your data

> greeting = "Hello world"
> print(greeting)
[1] "Hello world"
> greeting = "Bonjour"
> print(greeting)
[1] "Bonjour"
14 / 44

The R language

Variables are objects

  • Assignment is done with <-
> x <- 1
> y <- 2
> x + y
[1] 3
> z <- x + y
> z
[1] 3
15 / 44

The R language

= can be used instead of <-, but avoid it (it is not considered good style)

> z = x + y

You can view the values of the objects in the RStudio Environment pane (top right)

16 / 44

The R language

R is case sensitive

> Z
Error in eval(expr, envir, enclos): object 'Z' not found
17 / 44

The R language

Rules for naming objects

  • Use
    • letters
    • numbers
    • the dot
    • the underscore (not the minus sign !)
  • Always start with a letter
    • Myvariable, Myvariable1, Myvariable.1, Myvariable_01 are OK
    • 1Myvariable, My-variable, Myvariable@ are not OK
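
The naming rules can be checked in the console. This is only a sketch: `is_valid_name` is a hypothetical helper (not part of the course material) that relies on base R's `make.names()`, which rewrites invalid names, so a name is valid when it comes back unchanged.

```r
# Hypothetical helper: a name is syntactically valid if make.names() leaves it unchanged
is_valid_name <- function(name) {
  make.names(name) == name
}

candidates <- c("Myvariable", "Myvariable1", "Myvariable.1", "Myvariable_01",
                "1Myvariable", "My-variable", "Myvariable@")

valid <- is_valid_name(candidates)
names(valid) <- candidates
valid   # TRUE for the first four names, FALSE for the last three
```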
18 / 44

R objects

Data types

  • character: "Daniel", "This is a course in R", 'Joe Biden'

  • numeric: 2, 15.5, 10e-3

  • integer: 2L (the L tells R to store this as an integer)

  • date: 2018-02-25

  • logical: TRUE, FALSE

  • complex: 1+4i (complex numbers with real and imaginary parts)

  • Missing value: NA

  • Not a number: NaN (e.g. 0/0; note that 1/0 gives Inf)
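
A quick way to explore these types is `class()`, which reports the type of any object. The values below are the ones listed above; the date is built with `as.Date()`, which is an assumption about how the date would be entered.

```r
# class() reports the data type of an object
class("Daniel")               # "character"
class(15.5)                   # "numeric"
class(2L)                     # "integer"
class(as.Date("2018-02-25"))  # "Date"
class(TRUE)                   # "logical"
class(1+4i)                   # "complex"

# Special values
is.na(NA)      # TRUE: NA marks a missing value
0 / 0          # NaN: the result is not defined
1 / 0          # Inf: a non-zero number divided by zero gives infinity
is.nan(0 / 0)  # TRUE
```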

19 / 44

R objects

Data structures

  • Vector

  • List

  • Matrix

  • Data frames

  • Function

20 / 44

Vectors

The basic R structure is a vector (think of it as a column in Excel): [10, 20, 30]

A vector can contain just a single element: [10]

Assign a value to a vector

x <- 10
x
[1] 10
21 / 44

Vectors

Assign several elements

x <- c(10,20,30)
x
[1] 10 20 30

Assign a range

x <- 10:30
x
[1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
22 / 44

Vectors

Assign characters

PoTU <- c("Jo", "Biden")
PoTU
[1] "Jo" "Biden"

Assign logical

flags <- c(TRUE, FALSE, TRUE)
flags
[1] TRUE FALSE TRUE
23 / 44

Vectors

Access specific elements of a vector

First

x[1]
[1] 10

Range

x[1:5]
[1] 10 11 12 13 14

Remove one element

x[-1]
[1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
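
A few more ways to index, as a sketch that goes slightly beyond the examples above: positions can themselves be given as a vector, and `length()` gives the number of elements.

```r
x <- 10:30

length(x)        # 21 elements
x[c(1, 3, 5)]    # 10 12 14: several positions at once
x[length(x)]     # 30: the last element
x[-(1:18)]       # 28 29 30: drop the first 18 elements
```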
24 / 44

Operators

Arithmetic Operators

Operator    Description
+           addition
-           subtraction
*           multiplication
/           division
^ or **     exponentiation
x %% y      modulus (x mod y), e.g. 5 %% 2 is 1
x %/% y     integer division, e.g. 5 %/% 2 is 2
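
Each operator can be tried directly in the console. The numbers 7 and 2 are arbitrary choices for illustration.

```r
7 + 2     # 9
7 - 2     # 5
7 * 2     # 14
7 / 2     # 3.5
7 ^ 2     # 49
7 ** 2    # 49, same as ^
7 %% 2    # 1, the remainder of the division
7 %/% 2   # 3, the integer part of the division
```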
25 / 44

Operators

Arithmetic Operators

We are performing vector operations !

[1, 2, 3, ...] + [1, 2, 3, ...] = [2, 4, 6, ...]

Think about it as adding 2 columns in Excel.

26 / 44

Operators

Arithmetic Operators

Vector with one element

x <- 1
y <- 2
z <- x + y
z
[1] 3
27 / 44

Operators

Arithmetic Operators

Vector with several elements

# Two instructions on the same line
x <- 1:9; y <- 1:9
z <- x + y
z
[1] 2 4 6 8 10 12 14 16 18
  • Several instructions on the same line are separated by ;
  • The hash sign # indicates a comment: use comments heavily to document your code
  • However, it is even better to use R Markdown (look it up for next class)

Use the other operators
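
One possible way to try the other operators, using the same two vectors as in the addition example. Each operation is applied element by element.

```r
x <- 1:9
y <- 1:9

x - y    # 0 0 0 0 0 0 0 0 0
x * y    # 1 4 9 16 25 36 49 64 81
x / y    # 1 1 1 1 1 1 1 1 1
x ^ 2    # 1 4 9 16 25 36 49 64 81 (the single value 2 is reused for every element)
x %% 2   # 1 0 1 0 1 0 1 0 1
```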

28 / 44

Operators

Logical Operators

Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
!x          NOT x
x | y       x OR y
x & y       x AND y
isTRUE(x)   test if x is TRUE
29 / 44

Operators

Logical Operators

x <- TRUE
y <- FALSE
z1 <- x | y
z2 <- x == y
z1
[1] TRUE
z2
[1] FALSE

Do not confuse

  • == which is a logical operator
  • = which is an assignment
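
Logical operators also work element by element on vectors. A small sketch with an arbitrary numeric vector:

```r
x <- c(5, 12, 8)

x > 6            # FALSE  TRUE  TRUE
x == 12          # FALSE  TRUE FALSE
x > 6 & x < 10   # FALSE FALSE  TRUE
x < 6 | x > 10   #  TRUE  TRUE FALSE
!(x > 6)         #  TRUE FALSE FALSE
isTRUE(x[1] < 6) # TRUE: isTRUE() expects a single value
```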
30 / 44

Functions

Functions perform specific tasks on objects

  • e.g. to concatenate strings we use paste()
first <- "Jo"
last <- "Biden"
paste(first, last)
[1] "Jo Biden"
  • Functions take arguments and return an object called the result

  • To find out which arguments a function takes, use ?

? paste() # Do not forget the parenthesis

What happened?

  • You can also go directly to the Help panel and type the function name
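
The arguments of a function can also be inspected without leaving the console. This sketch uses base R's `args()` and `formals()`; the help calls are commented out because they are meant to be run interactively.

```r
# Run these interactively: both open the help page for paste()
# ?paste
# help("paste")

# args() prints the arguments and their default values in the console
args(paste)

# formals() returns the arguments as a named list
arg_names <- names(formals(paste))
arg_names
```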
31 / 44

Functions

Help

32 / 44

Functions

Help

33 / 44

Functions

Write your own function

If you write the same piece of code three times, write a function instead...

my_sum <- function(a, b) {
  result <- a + b
  return(result)
}
  • my_sum : function name
  • a, b : arguments
  • instructions are enclosed in braces ({})
  • return() : the value returned
34 / 44

Functions

Call your function

my_sum(10, 20)
[1] 30
  • Better: name the arguments
my_sum(a = 10, b = 20)
[1] 30
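
Two consequences of naming arguments, shown as a sketch. The function is redefined here so that the block runs on its own; it relies on the fact that R returns the last evaluated expression when `return()` is omitted.

```r
my_sum <- function(a, b) {
  a + b   # the last evaluated expression is returned
}

my_sum(b = 20, a = 10)     # 30: named arguments can be given in any order
my_sum(a = 1:3, b = 4:6)   # 5 7 9: the function also works element by element on vectors
```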
35 / 44

Functions

Examples of functions

Most of the time you do not have to write functions because someone has already written one for what you want to do...

  • Sum
x <- 1:100
sum(x)
[1] 5050
  • Normal distribution
y <- rnorm(10, mean = 0, sd = 1)
y
[1] 0.18698515 -1.90285690 1.06906481 -0.55548326 0.59540370 -0.04581134 0.38981055 0.92785608 -0.81038363 -0.08401824
36 / 44

Functions

Statistics

mean(y)
[1] -0.02294331
sd(y)
[1] 0.8903981

Sample more points... 10,000 instead of 10

y <- rnorm(10000, mean = 0, sd = 1)
mean(y)
[1] 0.003199685
sd(y)
[1] 1.003417
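
Because `rnorm()` draws random numbers, the values change at every run. A sketch of how to make the comparison reproducible with `set.seed()`; the seed value 42 and the names `small` and `large` are arbitrary choices.

```r
set.seed(42)                             # fix the random number generator
small <- rnorm(10, mean = 0, sd = 1)
large <- rnorm(10000, mean = 0, sd = 1)

mean(small); sd(small)                   # can be noticeably different from 0 and 1
mean(large); sd(large)                   # usually much closer to 0 and 1

# Resetting the seed reproduces exactly the same numbers
set.seed(42)
identical(small, rnorm(10, mean = 0, sd = 1))   # TRUE
```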
37 / 44

Functions

Plot

Histogram

library(graphics)
hist(y)

  • What is this "library()"?
38 / 44

Packages

Packages are sets of functions that share a common goal

They are really the strength of R

And these are only the "official" packages. You can find more on GitHub

39 / 44

Packages

Installing a package

Download the package you need to your computer

Install the package stringr (to manipulate character strings)
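
From the console, installation is typically done with `install.packages()`. In this sketch the install line is commented out so that re-running the script does not download the package again; `is_installed` is just an illustrative variable name.

```r
# Install once per computer (needs an internet connection)
# install.packages("stringr")

# Check whether the package is already installed
is_installed <- requireNamespace("stringr", quietly = TRUE)
is_installed
```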

40 / 44

Packages

Using a package

To use functions from a package

  • use the syntax package::function
stringr::str_c(first, last, sep = " ")
[1] "Jo Biden"
  • load the package with the library() function
library(stringr)
str_c(first, last, sep = " ")
[1] "Jo Biden"

Sometimes functions from different packages have the same name; the package::function syntax makes explicit which one you mean

41 / 44

Packages

List installed packages
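
Besides the Packages panel, the list is available from the console. A minimal sketch using base R's `installed.packages()`, which returns one row per package; `pkgs` is an illustrative variable name.

```r
pkgs <- rownames(installed.packages())

length(pkgs)        # how many packages are installed
head(pkgs)          # the first few names
"stats" %in% pkgs   # TRUE: stats ships with every R installation
```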

42 / 44

Recap

  • R is case sensitive: Z != z
  • Objects: data types vs data structures
  • Vectors: think in vector operations
  • Operators: arithmetic vs. logical
  • Functions: try to practice
43 / 44

Next: 02 - Data wrangling

  • Data frames
  • Concept of tidy data
  • Reading data
  • Manipulating data
    • Selecting columns
    • Selecting rows
44 / 44
