layout: true
background-image: url(img/logo-course-microbe.jpg), url(img/logo_SBR.png), url(img//NTU-Logo-full-colour.png)
background-position: right 10px top 20px, right 50px bottom 50px,left 50px bottom 50px, top 350px left 500px
background-size: 35%, 25%, 20%

# Microbes on Earth

---

<br>
<br>
<br>
<br>

## R session 03 - Data viz

.font120[**Daniel Vaulot**]

2021-10-12

---
layout: false
class: middle, inverse

# Outline

.font150[
* Graph types
* Grammar of graphics
* Playing with ggplot2
* Multiple graphs
* ggplot2 syntax
]

---
layout: false

# Installation and Resources

.pull-left[

## Packages

* ggplot2
* patchwork

## Download

* R-session-03.zip

## Reading assignment

* [Chapter 28 of R for data science](https://r4ds.had.co.nz/graphics-for-communication.html)

## Resources

* [Fundamentals of data visualization](https://serialmentor.com/dataviz/)
* [Data visualization: practical introduction](http://socviz.co/lookatdata.html#what-makes-bad-figures-bad)
]

.pull-right[
<img src="img/R_for_datascience.png" width="60%" style="display: block; margin: auto;" />
]

---

# Data visualization

<img src="img/tidy_worflow.png" width="55%" style="display: block; margin: auto;" />

--

## Graph purposes

--

.pull-left[
* **Analysis graphs**
    * designed to reveal patterns and trends
    * aid the process of data description
    * interpretation
]

--

.pull-right[
* **Presentation graphs**
    * designed to attract attention
    * make a point
    * illustrate a conclusion
]

.font70[Source: Michael Friendly - http://datavis.ca/courses/RGraphics/]

---
layout: true

# Graph types

---

.left-column[
## Jitter

* Two numerical variables
]

--

.right-column[
<img src="img/graph_jitter.png" width="90%" style="display: block; margin: auto;" />
]

---

.left-column[
## Bubble

* Two numerical variables
* **Add another numerical variable**
]

.right-column[
<img src="img/graph_bubble.png" width="90%" style="display: block; margin: auto;" />
]

---
exclude: true

.left-column[
## Animate

* Two numerical variables
* One numerical variable
* One categorical variable
* **Animate another variable**
]

.right-column[
<img src="img/graph_animate.gif" width="60%" style="display: block; margin: auto;" />
]

---
exclude: true

## Time series

.left-column[
* Line graph
]

.right-column[
<img src="img/graph_time_series.png" width="90%" style="display: block; margin: auto;" />
]

---

## Bargraphs

.left-column[
* One categorical variable
* One numerical variable
]

.right-column[
<img src="img/graph_bars2.png" width="90%" style="display: block; margin: auto;" />
]

---

## Bargraphs

.left-column[
* Rotate
]

.right-column[
<img src="img/graph_bars1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Bargraphs

.left-column[
* Two categorical variables
* One numerical variable
]

.right-column[
<img src="img/graph_bars3.png" width="90%" style="display: block; margin: auto;" />
]

---

## Boxplots

.left-column[
* One categorical variable
* One numerical variable with many values
]

.right-column[
<img src="img/graph_box.png" width="70%" style="display: block; margin: auto;" />
]

---

## Treemaps

.left-column[
* One categorical variable
* One numerical variable
* Much better than pie charts
]

.right-column[
<img src="img/graph_treemap.png" width="60%" style="display: block; margin: auto;" />
]

---
exclude: true

## 3D

.left-column[
* Three numerical variables
* Avoid unless it is a simple shape
]

.right-column[
<img src="img/graph_3d.png" width="70%" style="display: block; margin: auto;" />
]

---
exclude: true

## Contours

.left-column[
* Three numerical variables
* Better than 3D
]

.right-column[
<img src="img/graph_contour.png" width="60%" style="display: block; margin: auto;" />
]

---
background-image: url(img/graph_gallery.png)
background-position: right 20px top 20px
background-size: 60%

## Many...

.left-column[
* Choose the graph type according to what you want to analyze or the story you want to tell
* https://www.r-graph-gallery.com/all-graphs/
]

---
exclude: true
layout: false
background-image: url(img/wooclap_01.png)
background-position: middle center
background-size: 100%

# Wooclap - Quiz on Data wrangling

.font150[
https://www.wooclap.com/R01
]

---
layout: false

# ggplot2

<img src="img/ggplot2.jpg" width="60%" style="display: block; margin: auto;" />

@allison_horst

---
layout: true

# Initialize

---

## Load necessary libraries

```r
library("readxl")  # Import the data from Excel file
library("dplyr")   # filter and reformat data frames
library("ggplot2") # graphics
```

---

## Read the data

```r
samples <- readxl::read_excel("data/CARBOM data.xlsx", sheet = "Samples_boat") %>%
  tidyr::fill(station)
```

<table class="table table-striped table-hover table-condensed" style="font-size: 9px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> sample number </th> <th style="text-align:right;"> transect </th> <th style="text-align:left;"> station </th> <th style="text-align:left;"> date </th> <th style="text-align:left;"> time </th> <th style="text-align:right;"> depth </th> <th style="text-align:left;"> level </th> <th style="text-align:right;"> latitude </th> <th style="text-align:right;"> longitude </th> <th style="text-align:right;"> picoeuks </th> <th style="text-align:right;"> nanoeuks </th> <th style="text-align:right;"> phosphates </th> <th style="text-align:right;"> nitrates </th> <th style="text-align:right;"> temperature </th> <th style="text-align:right;"> salinity </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 10 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 81 </td> <td style="text-align:left;"> 2013-11-13 </td> <td style="text-align:left;"> 1899-12-31 01:00:00 </td> <td style="text-align:right;"> 140 </td> <td style="text-align:left;"> Deep </td> <td style="text-align:right;"> -27.42 
</td> <td style="text-align:right;"> -44.72 </td> <td style="text-align:right;"> 3278 </td> <td style="text-align:right;"> 1232 </td> <td style="text-align:right;"> 0.20 </td> <td style="text-align:right;"> 0.26 </td> <td style="text-align:right;"> 17.3 </td> <td style="text-align:right;"> 35.9 </td> </tr> <tr> <td style="text-align:left;"> 11 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 85 </td> <td style="text-align:left;"> 2013-11-13 </td> <td style="text-align:left;"> 1899-12-31 13:30:00 </td> <td style="text-align:right;"> 110 </td> <td style="text-align:left;"> Deep </td> <td style="text-align:right;"> -26.80 </td> <td style="text-align:right;"> -45.30 </td> <td style="text-align:right;"> 16312 </td> <td style="text-align:right;"> 1615 </td> <td style="text-align:right;"> 0.29 </td> <td style="text-align:right;"> 0.22 </td> <td style="text-align:right;"> 21.3 </td> <td style="text-align:right;"> 36.5 </td> </tr> <tr> <td style="text-align:left;"> 120 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 96 </td> <td style="text-align:left;"> 2013-11-18 </td> <td style="text-align:left;"> 1899-12-31 23:50:00 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Surf </td> <td style="text-align:right;"> -27.39 </td> <td style="text-align:right;"> -47.82 </td> <td style="text-align:right;"> 1150 </td> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 0.43 </td> <td style="text-align:right;"> 0.19 </td> <td style="text-align:right;"> 23.1 </td> <td style="text-align:right;"> 33.5 </td> </tr> <tr> <td style="text-align:left;"> 121 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 96 </td> <td style="text-align:left;"> 2013-11-18 </td> <td style="text-align:left;"> 1899-12-31 23:50:00 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:left;"> Deep </td> <td style="text-align:right;"> -27.39 </td> <td style="text-align:right;"> 
-47.82 </td> <td style="text-align:right;"> 1737 </td> <td style="text-align:right;"> 218 </td> <td style="text-align:right;"> 0.43 </td> <td style="text-align:right;"> 0.23 </td> <td style="text-align:right;"> 22.6 </td> <td style="text-align:right;"> 33.7 </td> </tr> <tr> <td style="text-align:left;"> 122 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 96 </td> <td style="text-align:left;"> 2013-11-18 </td> <td style="text-align:left;"> 1899-12-31 23:50:00 </td> <td style="text-align:right;"> 50 </td> <td style="text-align:left;"> Deep </td> <td style="text-align:right;"> -27.39 </td> <td style="text-align:right;"> -47.82 </td> <td style="text-align:right;"> 853 </td> <td style="text-align:right;"> 234 </td> <td style="text-align:right;"> 0.56 </td> <td style="text-align:right;"> 0.21 </td> <td style="text-align:right;"> 20.3 </td> <td style="text-align:right;"> 35.9 </td> </tr> <tr> <td style="text-align:left;"> 125 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 98 </td> <td style="text-align:left;"> 2013-11-18 </td> <td style="text-align:left;"> 1899-12-31 05:00:00 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Surf </td> <td style="text-align:right;"> -27.59 </td> <td style="text-align:right;"> -47.39 </td> <td style="text-align:right;"> 3086 </td> <td style="text-align:right;"> 1300 </td> <td style="text-align:right;"> 0.29 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 23.1 </td> <td style="text-align:right;"> 35.7 </td> </tr> <tr> <td style="text-align:left;"> 126 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 98 </td> <td style="text-align:left;"> 2013-11-18 </td> <td style="text-align:left;"> 1899-12-31 05:00:00 </td> <td style="text-align:right;"> 50 </td> <td style="text-align:left;"> Deep </td> <td style="text-align:right;"> -27.59 </td> <td style="text-align:right;"> -47.39 </td> <td style="text-align:right;"> 
1217 </td> <td style="text-align:right;"> 782 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.20 </td> <td style="text-align:right;"> 23.7 </td> <td style="text-align:right;"> 37.2 </td> </tr> <tr> <td style="text-align:left;"> 127 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 98 </td> <td style="text-align:left;"> 2013-11-18 </td> <td style="text-align:left;"> 1899-12-31 05:00:00 </td> <td style="text-align:right;"> 85 </td> <td style="text-align:left;"> Deep </td> <td style="text-align:right;"> -27.59 </td> <td style="text-align:right;"> -47.39 </td> <td style="text-align:right;"> 3420 </td> <td style="text-align:right;"> 226 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.47 </td> <td style="text-align:right;"> 22.9 </td> <td style="text-align:right;"> 37.0 </td> </tr> <tr> <td style="text-align:left;"> 13 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 86 </td> <td style="text-align:left;"> 2013-11-13 </td> <td style="text-align:left;"> 1899-12-31 17:00:00 </td> <td style="text-align:right;"> 105 </td> <td style="text-align:left;"> Deep </td> <td style="text-align:right;"> -26.33 </td> <td style="text-align:right;"> -45.41 </td> <td style="text-align:right;"> 6366 </td> <td style="text-align:right;"> 1007 </td> <td style="text-align:right;"> 0.34 </td> <td style="text-align:right;"> 0.15 </td> <td style="text-align:right;"> 20.9 </td> <td style="text-align:right;"> 36.3 </td> </tr> <tr> <td style="text-align:left;"> 140 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 101 </td> <td style="text-align:left;"> 2013-11-18 </td> <td style="text-align:left;"> 1899-12-31 12:00:00 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Surf </td> <td style="text-align:right;"> -27.79 </td> <td style="text-align:right;"> -46.96 </td> <td style="text-align:right;"> 500 </td> <td style="text-align:right;"> 
366 </td> <td style="text-align:right;"> 0.29 </td> <td style="text-align:right;"> 0.14 </td> <td style="text-align:right;"> 23.5 </td> <td style="text-align:right;"> 36.5 </td> </tr> </tbody> </table>

---
layout: true

# ggplot2

---

## A simple plot

.left-code[
* Choose the data set
* Choose the geometric representation
* Choose the __aesthetics__: x, y, color, shape, etc.

```r
ggplot(data=samples) +
  geom_point(mapping = aes(x=phosphates, y=nitrates))
```

* All functions are from the __ggplot2__ package unless specified
]

--

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-19-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## The grammar of graphics

<img src="img/ggplot2_grammar1.png" width="50%" style="display: block; margin: auto;" />

Every graph can be described as a combination of independent building blocks:

* **data**: a data frame: quantitative, categorical; local or database query
* **aes**thetic mapping of variables into visual properties: size, color, x, y
* **geom**etric objects (“geom”): points, lines, areas, arrows, …
* **coord**inate system (“coord”): Cartesian, log, polar, map

---

.left-code[
Syntax

```r
ggplot(data=samples) +
  geom_point(mapping = aes(x=phosphates, y=nitrates))
```
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-21-1.png" width="70%" style="display: block; margin: auto;" />
]

---

.left-code[
Alternatively

```r
ggplot(data=samples, mapping = aes(x=phosphates, y=nitrates)) +
  geom_point()
```

* If different geometries originate from different datasets or use different mappings, the dataset or the mapping must be specified **inside** the geom function.
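
A minimal sketch of that rule. It assumes a small stand-in data frame, `samples_demo`, built from four rows of the table shown earlier, and a hypothetical second data frame of per-level means, `level_means`; neither name is part of the course material.

```r
library(ggplot2)

# Stand-in for `samples`: four rows copied from the table shown earlier
samples_demo <- data.frame(
  level      = c("Deep", "Deep", "Surf", "Surf"),
  phosphates = c(0.20, 0.29, 0.43, 0.29),
  nitrates   = c(0.26, 0.22, 0.19, 0.25)
)

# Hypothetical second dataset: mean value per depth level
level_means <- aggregate(cbind(phosphates, nitrates) ~ level,
                         data = samples_demo, FUN = mean)

# ggplot() is left empty: each geom receives its own data and mapping
p <- ggplot() +
  geom_point(data = samples_demo,
             mapping = aes(x = phosphates, y = nitrates)) +
  geom_point(data = level_means,
             mapping = aes(x = phosphates, y = nitrates, color = level),
             size = 5)
p
```

The small points come from `samples_demo`, the large colored ones from `level_means`.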
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-22-1.png" width="70%" style="display: block; margin: auto;" />
]

---

.left-code[
Alternatively

```r
ggplot(samples, aes(x=phosphates, y=nitrates)) +
  geom_point()
```
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-23-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Make dot size bigger

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates))
```
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-24-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Make dot size bigger

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates)) +
  geom_point(size=5)
```

* Add: __size=5__ outside of the aesthetics function
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-25-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Color according to depth level (discrete)

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates, color=level)) +
  geom_point(size=5)
```

* An aesthetic mapped to a variable must be an argument of the aes function
* geom_point(**color=level**, size=5) will generate an error, because outside aes() `level` is not looked up in the data
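
The difference between setting and mapping a color can be sketched as follows. The data frame `samples_demo` is an assumed stand-in (not part of the course material) built from four rows of the table shown earlier.

```r
library(ggplot2)

# Stand-in for `samples`: four rows copied from the table shown earlier
samples_demo <- data.frame(
  level      = c("Deep", "Deep", "Surf", "Surf"),
  phosphates = c(0.20, 0.29, 0.43, 0.29),
  nitrates   = c(0.26, 0.22, 0.19, 0.25)
)

# Setting: a constant goes outside aes() and applies to every point
p_set <- ggplot(samples_demo, aes(x = phosphates, y = nitrates)) +
  geom_point(color = "blue", size = 5)

# Mapping: a column name goes inside aes(), giving one color per level
p_map <- ggplot(samples_demo, aes(x = phosphates, y = nitrates)) +
  geom_point(aes(color = level), size = 5)

p_set
p_map
```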
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-26-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Color according to depth (continuous)

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates, color=depth)) +
  geom_point(size=5)
```

* Add: __color=depth__
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-27-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Symbol according to transect (continuous)

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates, color=depth, shape=transect)) +
  geom_point(size=5)
```

* Add: __shape=transect__
]

.right-plot[
```
Error: A continuous variable can not be mapped to shape
```

<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-28-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Symbol according to transect (discrete)

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates, color=depth, shape=as.character(transect))) +
  geom_point(size=5)
```

* Add: __shape=as.character(transect)__
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-29-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Panels depending on one variable

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates)) +
  geom_point() +
  facet_wrap(~ level)
```
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-30-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Adding a regression line

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates, color=level)) +
  geom_point(size=5) +
  geom_smooth(mapping = aes(x=phosphates, y=nitrates), method="lm")
```

* Add: __geom_smooth()__
* You can choose the type of smoothing: "lm" stands for linear model
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-31-1.png" width="70%" style="display: block; margin: 
auto;" />
]

---

## Adding a regression line

.left-code[
```r
ggplot(samples, aes(x=phosphates, y=nitrates)) +
  geom_point(aes(color=level), size=5) +
  geom_smooth(mapping = aes(x=phosphates, y=nitrates), method="lm")
```

* A mapping given in the ggplot function applies to all the geoms; a mapping given inside a geom applies to that geom only
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-32-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Finalizing the graph

.left-code[
```r
ggplot(samples) +
  geom_point(mapping = aes(x=phosphates, y=nitrates, color=level), size=5) +
  geom_smooth(mapping = aes(x=phosphates, y=nitrates), method="lm") +
  xlab("Phosphates") +
  ylab("Nitrates") +
  ggtitle("CARBOM cruise")
```

* Add: __xlab()__, __ylab()__ and __ggtitle()__ to label the axes and title the graph
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-33-1.png" width="70%" style="display: block; margin: auto;" />
]

---
layout: true

# Putting several graphs together

---
exclude: true

## First graph

.left-code[
```r
g1 <- ggplot(samples) +
  geom_point(mapping = aes(x=phosphates, y=nitrates, color=level), size=5) +
  geom_smooth(mapping = aes(x=phosphates, y=nitrates), method="lm") +
  xlab("Phosphates") +
  ylab("Nitrates")
g1
```
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-34-1.png" width="70%" style="display: block; margin: auto;" />
]

---
exclude: true

## Second graph

.left-code[
```r
# x is nanoeuks and y is picoeuks, so the axis labels follow that order
g2 <- ggplot(samples) +
  geom_point(mapping = aes(x=nanoeuks, y=picoeuks, color=level), size=5) +
  geom_smooth(mapping = aes(x=nanoeuks, y=picoeuks), method="lm") +
  xlab("Nano-eukaryotes") +
  ylab("Pico-eukaryotes")
g2
```
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-35-1.png" width="70%" style="display: block; margin: auto;" />
]

---
exclude: true

## Package patchwork

* https://patchwork.data-imaginist.com/index.html

.left-code[
```r
library(patchwork)
(g1 + g2)/g1
```

See also 
packages:

* `gridExtra`
* `cowplot`
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-36-1.png" width="70%" style="display: block; margin: auto;" />
]

---
exclude: true

## Package patchwork

.left-code[
* Adding annotation
* Collecting legends

```r
g1 / g2 +
  plot_annotation(tag_levels = 'A') +
  plot_layout(guides = 'collect')
```
]

.right-plot[
<img src="R-session-03-data_visualization_files/figure-html/unnamed-chunk-37-1.png" width="70%" style="display: block; margin: auto;" />
]

---
layout: true

# ggplot2 syntax

---

## Anatomy of a plot

<img src="img/ggplot2_anatomy.png" width="70%" style="display: block; margin: auto;" />

---

## Geometries

<img src="img/ggplot2_geom.png" width="60%" style="display: block; margin: auto;" />

---

## Continuous x and y

<img src="img/ggplot2_continuous.png" width="40%" style="display: block; margin: auto;" />

---

## Plotting error

<img src="img/ggplot2_error.png" width="60%" style="display: block; margin: auto;" />

---

## Discrete x - Continuous y

<img src="img/ggplot2_discrete.png" width="60%" style="display: block; margin: auto;" />

---

## Continuous x

<img src="img/ggplot2_one_var.png" width="50%" style="display: block; margin: auto;" />

---
exclude: true

## 3D

<img src="img/ggplot2_3d.png" width="100%" style="display: block; margin: auto;" />

---

## Modifying axis and scales

<img src="img/ggplot2_scales.png" width="80%" style="display: block; margin: auto;" />

---
background-image: url(img/color_palettes.png)
background-position: right 20px bottom 150px
background-size: 60%

## Palettes

.pull-left[
* Package tmaptools: https://github.com/mtennekes/tmaptools
    * Function: `palette_explorer()`
* Package paletteer: https://github.com/EmilHvitfeldt/paletteer
    * More than 1000 palettes
]

---
background-image: url(img/color_palette_viridis.png)
background-position: right 20px top 100px
background-size: 60%

## Palettes

* Use color-blind-friendly palettes
    * viridis (e.g. 
scale_colour_viridis_c)

---
exclude: true

## Themes

<img src="img/ggplot2_themes.png" width="50%" style="display: block; margin: auto;" />

---
exclude: true
layout: false

# Extensions

<iframe src="https://exts.ggplot2.tidyverse.org/gallery/" width="100%" height="600px" data-external="1"></iframe>

---
layout: false
class: inverse

# Recap

.font150[
- Conceptualize your graph before coding
]

--

.font150[
- Decide which elements are fixed and which vary
]

--

.font150[
- It takes time to get what you want...
]

--

.font150[
- Exploratory vs. final
]

---
exclude: true
layout: true

# Next time: Create maps

.pull-left[

## What you will learn

* Create simple maps
* Create interactive maps
* Create thematic maps

## Install

* rworldmap
* leaflet
* sf
* raster
* spData
* tmap
* ggplot2
]

.pull-right[

## Reading list

* [Geocomputation with R](https://geocompr.robinlovelace.net/)

<img src="img/R-geocomputation.png" width="40%" style="display: block; margin: auto;" />
]