layout: true
background-image: url(img/logo-course-microbe.jpg), url(img/logo_SBR.png), url(img//NTU-Logo-full-colour.png)
background-position: right 10px top 20px, right 50px bottom 50px,left 50px bottom 50px, top 350px left 500px
background-size: 35%, 25%, 20%

# Microbes on Earth

---

<br>
<br>
<br>
<br>

## R session 01 - Introduction to R

.font120[**Daniel Vaulot**] 2021-10-12

<br>
<br>
<br>

---

## R sessions

.font150[

1 - Introduction to R

2 - Data wrangling

3 - Data visualisation

4 - Analysis of metabarcoding data
]

---
layout: false
class: middle, inverse

# Outline

.font150[
* What is R and why use R?
* Resources
* Get started
* Fundamentals of R
    - Data objects
    - Vectors
    - Operators
    - [Functions](#functions)
    - Packages
]

---
background-image: url(img/wooclap_micror.png)
background-position: middle center
background-size: 70%

# Wooclap - Past experience

.font150[
https://www.wooclap.com/MICROR
]

---
layout: true

# Introduction

---
background-image: url(img/R-logo.png)
background-position: middle center
background-size: 25%

- .font150[For those who are experts in R]

--

* please refrain from answering during this session...
* help your neighbor (once COVID is gone...)

--
background-image: url(img/R-logo.png)
background-position: middle center
background-size: 25%

.font150[* Two special slide formats]

.student[Your turn...]

--

.warning[Warning]

---
exclude: true
background-image: url(img/computer-languages.png)
background-position: right 20px bottom 20px
background-size: 70%

## Computer languages

---
exclude: true
background-image: url(img/R-logo.png)

## History of R

* **Mid 1970s** - S Language for Statistical Computing conceived by John Chambers, Rick Becker, Trevor Hastie, Allan Wilks and others at Bell Labs
* **Early 1990s** - R first implemented by Robert Gentleman and Ross Ihaka, both faculty members at the University of Auckland.
* **1995** - Open Source Project
* **1997** - Managed by the R Core Group
* **2000** - First stable release of R (version 1.0.0)
* **2011** - First release of RStudio
* [Historical notes - Paper from 1998](https://www.stat.auckland.ac.nz/~ihaka/downloads/Interface98.pdf)

---

## Why use R?

- **Script vs. menu-driven software (e.g. Excel)**
    + Can be rerun with new data
    + Reproducible workflow

--

- **Open source**
    + Huge number of libraries
    + Tidy "universe": tidyverse and ggplot2
        - Very easy to manipulate tables (select columns, create new variables)
        - High quality graphics

--

- **Work environment**
    - RStudio

--

- **Document your data processing**
    - R Markdown
    - Create HTML, PDF, presentations

--

- **Share your data and workflow**
    - GitHub

---

## What can you do with R?

--

.pull-left[
- **Science**
    * Statistics of course...
    * Data processing
    * Graphics
    * Time series analyses
    * Maps
    * Bioinformatics
]

--

.pull-left[
- **But also**
    * Teach
    * Give a presentation
    * Write your CV
    * Build a web site
    * Write a book
    * Much more...
]

--

.center[
<img src="img/web-site-dv.png" width="30%" style="display: block; margin: auto;" />
]

---
layout: true

# Resources

---
exclude: true
background-image: url(img/R_nutshell.png), url(img/R_graphics_cookbook.png)
background-position: right 20px top 50px, right 300px top 250px
background-size: 20%, 18%

## Books and Manuals

* [Applied Statistics with R](https://daviddalpiaz.github.io/appliedstats/index.html): A fairly simple introduction with an emphasis on statistics
* [R intro](https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf): Very good introduction to R, short and clear
* [R in a nutshell](http://rbasicsworkshop.weebly.com/uploads/1/8/6/0/18603232/adler_2009_r-inanutshell.pdf): Many, many recipes to answer your questions
* R graphics cookbook: very good for graphics

---
exclude: true
background-image: url(img/web_quickR.png)
background-position: right 20px top 80px
background-size: 60%

## Online courses and web sites

* [Coursera](https://www.coursera.org/learn/r-programming)
* [Pluralsight](https://www.pluralsight.com/courses/r-programming-fundamentals) - Not free
* [Quick-R, very simple](http://www.statmethods.net/)

---
background-image: url(img/R_Studio-cheatsheets-01.png), url(img/R_Studio-cheatsheets-02.png)
background-position: right 600px top 100px, right 20px top 20px
background-size: 30%, 50%

## Cheat sheets

* [R basics](http://github.com/rstudio/cheatsheets/raw/master/base-r.pdf)
* [ggplot2](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf)
* [dplyr](https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)

---
background-image: url(img/stackoverflow.png)
background-position: right 20px top 20px
background-size: 40%

## Forums

* https://stackoverflow.com/
* http://r-statistics.co/
* https://www.r-bloggers.com/

---
layout: true

# Let's get started

---
background-image: url(img/R_Studio_interface.png)
background-position: right 20px top 20px
background-size: 65%

## Setup

* Install 
[R](https://cran.r-project.org/index.html)
* Install [RStudio](https://www.rstudio.com/products/rstudio/download/#download)

---
background-image: url(img/R_Studio_interface_numbered.png)
background-position: right 20px top 20px
background-size: 65%

## The RStudio interface

.pull-left[
- **Bottom left**
    - Console
- **Top left**
    - File editor for .R and .Rmd files
    - Data frame visualization
- **Top right**
    - Environment (i.e. R objects)
    - History
- **Bottom right**
    - Files
    - Plots
    - Packages
    - Help
]

---
background-image: url(img/R-new-project.png)
background-position: right 20px top 20px
background-size: 40%

## Create a new project

* Open RStudio
* Create a new project for the course in a new directory
    - e.g. `Microbes course`

---

## Your first script

```r
print("Hello world")
```

```
[1] "Hello world"
```

### Two ways to proceed

1. Type directly in the command window

--

2. Create a new script

.student[
Type in the script window
* Select and execute (CTRL-R)
* Source the script
]

---
layout: true

# The R language

---

## **Variables** are abstractions of your data

```r
> greeting = "Hello world"
> print(greeting)
```

```
[1] "Hello world"
```

--

```r
> greeting = "Bonjour"
> print(greeting)
```

```
[1] "Bonjour"
```

---

## Variables are **objects**

* Assignment is done with **<-**

```r
> x <- 1
> y <- 2
> x + y
```

```
[1] 3
```

--

```r
> z <- x + y
> z
```

```
[1] 3
```

---

**=** can be used instead of **<-** but avoid it (not good style)

```r
> z = x + y
```

--

You can view the values of the objects in the RStudio environment window (top-right)

<img src="img/R_studio-environment.png" width="55%" style="display: block; margin: auto;" />

---

## R is **case sensitive**

```r
> Z
```

--

```r
> Z
```

```
Error in eval(expr, envir, enclos): object 'Z' not found
```

---

## Rules for naming objects

* Use
    * letters
    * numbers
    * the dot
    * the underscore (not the minus sign!)
* Always start with a letter
* `Myvariable`, `Myvariable1`, `Myvariable.1`, `Myvariable_01` are OK
* `1Myvariable`, `My-variable`, `Myvariable@` are **not** OK

---
exclude: true

## Use consistent naming

Five conventions

* alllowercase: e.g. adjustcolor
* period.separated: e.g. plot.new
* **underscore_separated**: e.g. numeric_version
* lowerCamelCase: e.g. addTaskCallback
* UpperCamelCase: e.g. SignatureMethod

Prefer the third one, which is much easier to read

* Use **nouns** for objects: **last_name**
* Use **verbs** for functions: **build_name**
* Think about the best order
    - e.g. maybe prefer **name_last** because then you can have name_first, name_full...
    - and you can see that all these objects are related to a name...

---
layout: true

# R objects

---

## Data types

* **character**: "Daniel", "This is a course in R", 'Joe Biden'
* **numeric**: 2, 15.5, 10e-3
* **integer**: 2L (the L tells R to store this as an integer)
* **date**: 2018-02-25
* **logical**: TRUE, FALSE
* **complex**: 1+4i (complex numbers with real and imaginary parts)

--

* **No data**: NA
* **Not a number**: NaN (e.g. 
0/0)

---

## Data structures

* **Vector**
* **List**
* **Matrix**
* **Data frame**
* **Function**

---
layout: true

# Vectors

---

The basic R structure is a vector (think of it as a column in Excel):

`$$\begin{bmatrix}10 \\ 20 \\ 30 \end{bmatrix}$$`

--

A vector can contain only a single element

`$$\begin{bmatrix}10 \end{bmatrix}$$`

--

## Assign a value to a vector

```r
x <- 10
x
```

```
[1] 10
```

---

## Assign several elements

```r
x <- c(10,20,30)
x
```

```
[1] 10 20 30
```

--

## Assign range

```r
x <- 10:30
x
```

```
[1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
```

---

## Assign characters

```r
PoTU <- c("Jo", "Biden")
PoTU
```

```
[1] "Jo"    "Biden"
```

--

## Assign logical

```r
flags <- c(TRUE, FALSE, TRUE)
flags
```

```
[1]  TRUE FALSE  TRUE
```

---

## Access specific elements of a vector

### First

```r
x[1]
```

```
[1] 10
```

--

### Range

```r
x[1:5]
```

```
[1] 10 11 12 13 14
```

--

### Remove one element

```r
x[-1]
```

```
[1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
```

---
exclude: true

## Determine object properties

Apply functions (we will come back to functions later)

* **typeof()** - what is the object’s data type (low-level)?
* **length()** - how long is it? What about two dimensional objects?

```r
typeof(x)
length(x)
```

--
exclude: true

```
[1] "integer"
```

```
[1] 21
```

--
exclude: true

.student[
What is the type and length of **PoTU**?

* https://app.wooclap.com/R01
]

---
layout: true

# Operators

---

## Arithmetic Operators

| Operator | Description |
|---|---|
| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| ^ or ** | exponentiation |
| x %% y | modulus (x mod y) 5%%2 is 1 |
| x %/% y | integer division 5%/%2 is 2 |

---

## Arithmetic Operators

We are performing vector operations!

`$$\begin{bmatrix} 1\\2\\3\\..\end{bmatrix}+\begin{bmatrix}1\\2\\3\\..\end{bmatrix}=\begin{bmatrix}2\\4\\6\\..\end{bmatrix}$$`

.warning[Think about it as adding 2 columns in Excel.]
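---

## Arithmetic Operators

A minimal sketch of the same idea with some of the other operators from the table: each one is applied element by element. The vectors `a` and `b` and the result names are made up for this illustration.

```r
# Illustrative vectors (a, b and the result names are assumptions for this example)
a <- c(10, 20, 30)
b <- c(3, 4, 5)

product   <- a * b     # multiplies a[1] by b[1], a[2] by b[2], ...
remainder <- a %% b    # remainder of each element-wise division
quotient  <- a %/% b   # integer part of each element-wise division

product
remainder
quotient
```

Each result has the same length as `a` and `b`, like a new Excel column computed from two existing ones.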
---

## Arithmetic Operators

Vector with one element

```r
x <- 1
y <- 2
z <- x + y
z
```

```
[1] 3
```

---

## Arithmetic Operators

Vector with several elements

```r
# Two instructions on the same line
x <- 1:9; y <- 1:9
z <- x + y
z
```

```
[1]  2  4  6  8 10 12 14 16 18
```

--

.warning[
* Several instructions on the same line are separated by **;**
* The hash sign **#** indicates a comment -> Use it heavily to document your code
* However, it is even better to use R Markdown (look it up for next class)
]

--

.student[
Use the other operators
]

---
exclude: true

## Arithmetic Operators

What happens when the vectors have different numbers of elements?

```r
x <- 1:9
y <- 1
z <- x + y
z
```

--
exclude: true

```
[1]  2  3  4  5  6  7  8  9 10
```

--
exclude: true

Equivalent to

```r
y <- c(1,1,1,1,1,1,1,1,1)
```

The recycling rule...

---
exclude: true

## Can we add logicals?

```r
x <- TRUE
y <- FALSE
z <- x + y
z
```

--
exclude: true

```
[1] 1
```

---
exclude: true

## Can we add logicals?

No error but... The resulting variable is converted to a **numeric** (integer) value

.student[
How would you show that?
]

--
exclude: true

```r
typeof(x)
```

```
[1] "logical"
```

```r
typeof(z)
```

```
[1] "integer"
```

---

## Logical Operators

| Operator | Description |
|---|---|
| < | less than |
| <= | less than or equal to |
| > | greater than |
| >= | greater than or equal to |
| == | exactly equal to |
| != | not equal to |
| !x | Not x |
| x \| y | x OR y |
| x & y | x AND y |
| isTRUE(x) | test if x is TRUE |

---

## Logical Operators

```r
x <- TRUE
y <- FALSE
z1 <- x | y
z2 <- x == y
```

--

```
[1] TRUE
```

```
[1] FALSE
```

.warning[
Do not confuse
* == which is a logical operator
* = which is assignment
]

---
exclude: true

## Can we add characters?

```r
first <- "Jo"
last <- "Biden"
full <- first + last
```

--
exclude: true

Generates an error

```
Error in first + last: non-numeric argument to binary operator
```

--
exclude: true

.student[
What can we do?
]

---
layout: true

# Functions

---
name: functions

Functions perform specific tasks on objects

* e.g. to concatenate strings we use **paste()**

--

```r
paste(first,last)
```

```
[1] "Jo Biden"
```

--

* Functions take **arguments** and return an object called the **result**
* To see the arguments, use **?**

```r
? paste() # Do not forget the parentheses
```

--

.student[
What happened?
]

--

* You can also go directly to the Help panel and type the function name

---
background-image: url(img/R-help-paste-01.png)
background-position: right 20px top 20px
background-size: 50%

## Help

---
background-image: url(img/R-help-paste-02.png)
background-position: right 20px top 20px
background-size: 50%

## Help

---
exclude: true

## Getting what you want

Let's apply paste:

```r
paste(first,last)
```

```
[1] "Jo Biden"
```

.student[
* We would like to get "Jo_Biden"
* Can you read the help and suggest a change in the way we call the function?
* https://app.wooclap.com/R01
]

--
exclude: true

```r
paste(first,last, sep="_")
```

```
[1] "Jo_Biden"
```

---

## Write your own function

.warning[If you write the same piece of code 3 times, then write a function...]

```r
my_sum <- function(a, b) {
  c <- a + b
  return(c)
}
```

* __my_sum__: function name
* __a, b__: arguments
* instructions are enclosed by braces ({})
* return(): the value(s) returned

--
exclude: true

#### More compact way

```r
my_sum <- function(a, b) {a + b}
```

---

## Call your function

```r
my_sum(10, 20)
```

```
[1] 30
```

--

* better

```r
my_sum(a = 10, b = 20)
```

```
[1] 30
```

---
exclude: true

## Write a function to compute a product

* https://app.wooclap.com/R01

---

## Examples of functions

Most of the time you do not have to write functions because someone has already written one for what you want to do...
* Sum

```r
x <- 1:100
sum(x)
```

```
[1] 5050
```

--

* Normal distribution

```r
y <- rnorm(10, mean = 0, sd = 1)
y
```

```
[1]  0.18698515 -1.90285690  1.06906481 -0.55548326  0.59540370 -0.04581134  0.38981055  0.92785608 -0.81038363 -0.08401824
```

---

## Statistics

```r
mean(y)
```

```
[1] -0.02294331
```

```r
sd(y)
```

```
[1] 0.8903981
```

--

Sample more points... 10,000 instead of 10

```r
y <- rnorm(10000, mean = 0, sd = 1)
mean(y)
```

```
[1] 0.003199685
```

```r
sd(y)
```

```
[1] 1.003417
```

---

## Plot

.pull-left[
Histogram

```r
library(graphics)
hist(y)
```
]

.pull-right[
<img src="R-session-01-intro_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" />
]

.student[
* What is this "library()"?
]

---
layout: true

# Packages

---

Packages are sets of functions that share a common goal

They are really the strength of R

<img src="img/R-packages-number.png" width="55%" style="display: block; margin: auto;" />

And these are only the "official" packages. You can find more on GitHub

---

## Installing a package

Download the package you need to your computer

.center[
<img src="img/R_studio_package_01.png" width="45%" /><img src="img/R_studio_package_02.png" width="35%" />
]

.student[
Install package **stringr** (to manipulate strings of characters)
]

---

## Using a package

To use functions from the package

- use the syntax `package::function`

```r
stringr::str_c(first,last, sep= " ")
```

```
[1] "Jo Biden"
```

--

- load the package with the library function

```r
library(stringr)
str_c(first,last, sep= " ")
```

```
[1] "Jo Biden"
```

--

.warning[Sometimes functions from different packages have the same name; `package::function` removes the ambiguity]

---
background-image: url(img/R_studio_package_03.png)
background-position: right 20px top 20px
background-size: 50%

## List installed packages

---
layout: false
class: inverse

# Recap

.font150[
- R is case sensitive: Z != z
- Objects: data types vs data structures
- Vectors: think in vector operations
- Operators: arithmetic vs. 
logical
- Functions: try to practice
]

---
exclude: false
layout: false
background-image: url(img/R_for_datascience.png)
background-position: right 20px top 20px
background-size: 25%

# Next: 02 - Data wrangling

.font150[
* Data frames
* Concept of tidy data
* Reading data
* Manipulating data
    * Selecting columns
    * Selecting rows
]