+ - 0:00:00
Notes for current slide
Notes for next slide

Fundamental of Data Science for EESS





R session 06 - Data mapping

Daniel Vaulot

2021-02-24

1 / 46

Outline

  • Simple features
  • Static maps
  • Plotting your data
  • Interactive map
2 / 46

Installation and Resources

Packages

New

  • sf
  • rnaturalearth
  • rnaturalearthdata
  • leaflet

Previously used

  • ggplot2
  • stringr
  • dplyr
3 / 46

What we want to do...

4 / 46

Simple Features (sf)

5 / 46

Simple Features (sf)

  • Simple features is an open standard developed and endorsed by the Open Geospatial Consortium (OGC).
  • Simple features is a hierarchical data model that represents a wide range of geometry types.

6 / 46

Simple Features (sf)

  • A point is simply a coordinate in 2D (can be also 3D or 4D space)

    • POINT (5 2)
  • A linestring is a sequence of points with a straight line connecting the points

    • LINESTRING (1 5, 4 4, 4 1, 2 2, 3 2)
  • A polygon is a sequence of points that form a closed, non-intersecting ring.

    • POLYGON (1 5, 2 2, 4 1, 4 4, 1 5)

7 / 46

Simple Features (sf)

  • MULTIPOINT (5 2, 1 3, 3 4, 3 2)
  • MULTILINESTRING ((1 5, 4 4, 4 1, 2 2, 3 2), (1 2, 2 4))
  • MULTIPOLYGON (((1 5, 2 2, 4 1, 4 4, 1 5), (0 2, 1 2, 1 3, 0 3, 0 2)))

8 / 46

Simple Features (sf)

  • A geometry collection can contain any combination of geometries including (multi)points and linestrings
    • GEOMETRYCOLLECTION (MULTIPOINT (5 2, 1 3, 3 4, 3 2), LINESTRING (1 5, 4 4, 4 1, 2 2, 3 2))

9 / 46

Simple Features (sf)

  • Simple feature objects in R are stored in a data frame
  • Geographic data occupying a special column, usually named geom or geometry.

Three classes of sf objects

  • sfg : geometry
  • sfc : column - set of geometry plus coordinates system
  • sf : information about each geometry
10 / 46

Simple Features (sf)

sf geometry (sfg) class

library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
11 / 46

Simple Features (sf)

sf geometry (sfg) class

library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
  • Creating a point from a vector
sf::st_point(c(5, 2))
POINT (5 2)
11 / 46

Simple Features (sf)

sf geometry (sfg) class

library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
  • Creating a point from a vector
sf::st_point(c(5, 2))
POINT (5 2)
  • For line, we need to create a matrix with rbind
multipoint_matrix = rbind(c(5, 2), c(1, 3), c(3, 4), c(3, 2))
st_multipoint(multipoint_matrix)
MULTIPOINT ((5 2), (1 3), (3 4), (3 2))
11 / 46

Simple Features (sf)

sf geometry (sfg) class

  • For multiline, we need to create a list of matrices
multilinestring_list = list(rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2), c(3, 2)),
rbind(c(1, 2), c(2, 4)))
st_multilinestring((multilinestring_list))
MULTILINESTRING ((1 5, 4 4, 4 1, 2 2, 3 2), (1 2, 2 4))
12 / 46

Simple Features (sf)

sf geometry (sfg) class

  • For multiline, we need to create a list of matrices
multilinestring_list = list(rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2), c(3, 2)),
rbind(c(1, 2), c(2, 4)))
st_multilinestring((multilinestring_list))
MULTILINESTRING ((1 5, 4 4, 4 1, 2 2, 3 2), (1 2, 2 4))
  • Idem for geometry collection
geometrycollection_list = list(sf::st_multipoint(multipoint_matrix),
sf::st_multilinestring(multilinestring_list))
st_geometrycollection(geometrycollection_list)
GEOMETRYCOLLECTION (MULTIPOINT ((5 2), (1 3), (3 4), (3 2)), MULTILINESTRING ((1 5, 4 4, 4 1, 2 2, 3 2), (1 2, 2 4)))
12 / 46

Simple Features (sf)

sf column (sfc) class

List of sfg obkects that also contains also information about the coordinate system

point1 = st_point(c(5, 2))
point2 = st_point(c(1, 3))
points_sfc = st_sfc(point1, point2)
points_sfc
Geometry set for 2 features
geometry type: POINT
dimension: XY
bbox: xmin: 1 ymin: 2 xmax: 5 ymax: 3
CRS: NA
POINT (5 2)
POINT (1 3)
13 / 46

Simple Features (sf)

sf column (sfc) class

Now add Coordinate Reference System (crs)

st_sfc(point1, point2, crs = 4326)
Geometry set for 2 features
geometry type: POINT
dimension: XY
bbox: xmin: 1 ymin: 2 xmax: 5 ymax: 3
geographic CRS: WGS 84
POINT (5 2)
POINT (1 3)
14 / 46

Simple Features (sf)

sf class

Add some information about the geometry.

lnd_point = st_point(c(0.1, 51.5)) # sfg object
lnd_geom = st_sfc(lnd_point, crs = 4326) # sfc object
lnd_attrib = data.frame( # data.frame object
name = "London",
temperature = 25,
date = as.Date("2017-06-21")
)
st_sf(lnd_attrib, geometry = lnd_geom)
Simple feature collection with 1 feature and 3 fields
geometry type: POINT
dimension: XY
bbox: xmin: 0.1 ymin: 51.5 xmax: 0.1 ymax: 51.5
geographic CRS: WGS 84
name temperature date geometry
1 London 25 2017-06-21 POINT (0.1 51.5)
15 / 46

Static maps

Load necessary libraries

library("readxl") # Import the data from Excel file
library("dplyr") # filter and reformat data frames
library("ggplot2") # graphics
library("sf") # simple features
library("rnaturalearth")
library("rnaturalearthdata")
16 / 46

Static maps

Read the world map data

  • scale can be 'small', 'medium', 'large'
  • returnclass can be 'sp' or 'sf' - sf (simple features) is an evolution of sp
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
17 / 46

Static maps

The world data are returned as a sf (simple feature) dataframe which contains

  • normal fields
  • one field called geometry which contains points, lines or polygon coordinate information
colnames(world)
[1] "scalerank" "featurecla" "labelrank" "sovereignt" "sov_a3"
[6] "adm0_dif" "level" "type" "admin" "adm0_a3"
[11] "geou_dif" "geounit" "gu_a3" "su_dif" "subunit"
[16] "su_a3" "brk_diff" "name" "name_long" "brk_a3"
[21] "brk_name" "brk_group" "abbrev" "postal" "formal_en"
[26] "formal_fr" "note_adm0" "note_brk" "name_sort" "name_alt"
[31] "mapcolor7" "mapcolor8" "mapcolor9" "mapcolor13" "pop_est"
[36] "gdp_md_est" "pop_year" "lastcensus" "gdp_year" "economy"
[41] "income_grp" "wikipedia" "fips_10" "iso_a2" "iso_a3"
[46] "iso_n3" "un_a3" "wb_a2" "wb_a3" "woe_id"
[51] "adm0_a3_is" "adm0_a3_us" "adm0_a3_un" "adm0_a3_wb" "continent"
[56] "region_un" "subregion" "region_wb" "name_len" "long_len"
[61] "abbrev_len" "tiny" "homepart" "geometry"
18 / 46

Static maps

  • The field name contains the name of the countries, continent the continent, etc...
  • The field geometry contains the country outline
    name continent geometry
    Aruba North America MULTIPOLYGON (((-69.89912 1...
19 / 46

Static maps

  • The field name contains the name of the countries, continent the continent, etc...
  • The field geometry contains the country outline
    name continent geometry
    Aruba North America MULTIPOLYGON (((-69.89912 1...
world$geometry[1]
Geometry set for 1 feature
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -70.06611 ymin: 12.423 xmax: -69.8957 ymax: 12.61411
CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
19 / 46

Static maps

A simple world map

  • Use geom_sf
# Set the black and white theme
theme_set(theme_bw())
ggplot(data = world) +
geom_sf()

20 / 46

Static maps

A simple world map

  • Use coord_sf(expand = FALSE)to have the axis labels
ggplot(data = world) +
geom_sf() +
coord_sf(expand = FALSE)

21 / 46

Static maps

Color continents

  • Add color="white", fill="blue"
ggplot(data = world) +
geom_sf(color="white", fill="blue")+
coord_sf(expand = FALSE)

22 / 46

Static maps

Plot population

  • Use value of column pop_est
ggplot(data = world) +
geom_sf(aes(fill= pop_est))+
coord_sf(expand = FALSE)

23 / 46

Static maps

Plot population

  • Change color scale
  • Transform the data
ggplot(data = world) +
geom_sf(aes(fill= pop_est)) +
coord_sf(expand = FALSE) +
scale_fill_viridis_c(trans = "log10")

24 / 46

Static maps

Modify projections

ggplot(data = world) +
geom_sf() +
coord_sf(expand = FALSE,
crs = "+proj=laea +lat_0=90 +lon_0=0 ")

25 / 46

Static maps

Zoom

  • Centered on Singapore
ggplot(data = world) +
geom_sf() +
coord_sf(xlim=c(90,110),
ylim=c(-10,10))

Resolution is not optimal

26 / 46

Static maps

Add country names

  • Must first define coordinates centroid of each country (X and Y)
world_centroids<- st_coordinates(st_centroid(world$geometry))
27 / 46

Static maps

Add country names

  • Must first define coordinates centroid of each country (X and Y)
world_centroids<- st_coordinates(st_centroid(world$geometry))
  • Bind the 2 tables
world_centroids <- cbind(world, world_centroids)
name X Y geometry
Aruba -69.98268 12.52088 MULTIPOLYGON (((-69.89912 1...
27 / 46

Static maps

Add country names

  • Use geom_text()
ggplot(data = world_centroids) +
geom_sf() +
coord_sf(xlim=c(70,130),
ylim=c(-20,20)) +
geom_text(aes(x = X, y=Y, label=name),
size=5) +
xlab("Longitude") +
ylab("Latitude")

28 / 46

Plot data

Read the data

bolido <- readxl::read_excel("data/bolido.xlsx")
cruise station_id date depth_level depth latitude longitude oceanic_region temperature salinity reads_total bolido_reads bolido_pct species_dominant
MALINA 430 2009-08-18 surface 3 71.1300 -136.420 Beaufort Sea -0.804 25.935 10495 4 0.0381134 Parmales_env_3B_sp.
MALINA 460 2009-08-19 surface 4 70.4000 -136.030 Beaufort Sea 0.076 25.384 9591 4 0.0417058 Parmales_env_3B_sp.
MALINA 540 2009-08-17 surface 3 70.4500 -137.530 Beaufort Sea -0.610 26.246 9900 14 0.1414141 Parmales_env_3B_sp.
MALINA 620 2009-08-11 surface 3 70.6800 -139.627 Beaufort Sea 1.231 22.522 7557 11 0.1455604 Parmales_env_3B_sp.
MALINA 670 2009-08-10 surface 3 69.7980 -138.441 Beaufort Sea 3.740 23.560 8348 22 0.2635362 Parmales_env_3B_sp.
MALINA 760 2009-08-12 surface 3 70.3300 -140.470 Beaufort Sea 0.511 22.455 8319 6 0.0721241 Parmales_env_3A_sp.
CASES CA15 2003-10-01 surface 29 71.5367 -126.955 Arctic Ocean -0.780 30.620 12354 211 1.7079488 Parmales_env_1_X_sp.
CASES 200 2003-11-01 surface 35 69.9290 -126.489 Arctic Ocean -1.460 31.500 12797 544 4.2509963 Parmales_env_1_X_sp.
ArcticNet CA18 2005-09-01 surface 32 70.6663 -122.993 Arctic Ocean -1.040 32.040 9531 43 0.4511594 Triparma_laevis_clade
CFL 405 2007-11-01 surface 10 70.6217 -123.001 Arctic Ocean -1.650 30.130 13471 20 0.1484671 Parmales_env_1_X_sp.
29 / 46

Plot data

A bit of biology... Bolidophyceae and Parmales

Triparma

  • Covered with silica

Bolidomonas

  • 1.5 µm

Guillou, L., Chrétiennot-Dinet, M.-J., Medlin, L.K., Claustre, H., Loiseaux-de Goër, S. & Vaulot, D. 1999. J. Phycol. 35:368–81.
Ichinomiya, M., Yoshikawa, S., Kamiya, M., Ohki, K., Takaichi, S. & Kuwata, A. 2011. J. Phycol. 47:144–51.
Ichinomiya, M., dos Santos, A.L., Gourvil, P., Yoshikawa, S., Kamiya, M., Ohki, K., Audic, S. et al. 2016. ISME J. 10:2419–34.

30 / 46

Plot data

Plot the stations where Bolidophyceae found

  • Filter by to remove stations where Bolido are not present
ggplot(data = world) +
geom_sf()+
coord_sf(expand = FALSE) +
geom_point(data=filter(bolido,
bolido_pct >0),
aes(x=longitude,
y=latitude),
size=3)

31 / 46

Plot data

Percentage of Bolidophyceae

  • Filter by to remove stations where Bolido are not present
ggplot(data = world) +
geom_sf()+
coord_sf(expand = FALSE) +
geom_point(data=filter(bolido,
bolido_pct >0),
aes(x=longitude,
y=latitude,
size=bolido_pct),
color="blue")

32 / 46

Plot data

Dominant species of Triparma

  • Color with species
  • Put species legend at bottom on 2 rows
ggplot(data = world) +
geom_sf()+
coord_sf(expand = FALSE) +
geom_point(data=filter(bolido,
bolido_pct >0,
stringr::str_detect(species_dominant, "Triparma")),
aes(x=longitude,
y=latitude,
color=species_dominant),
size = 4) +
scale_colour_viridis_d() +
theme(legend.position="bottom") +
guides(color=guide_legend(nrow=2,
byrow=TRUE))

33 / 46

Interactive maps

Leaflet

library("leaflet")

34 / 46

Interactive maps

Create an empty map

leaflet(width = 750, height = 500)
35 / 46

Interactive maps

Add background (tiles)

leaflet(width = 750, height = 500)%>%
addTiles()
36 / 46

Interactive maps

Set limits and zoom

leaflet(width = 750, height = 500)%>%
addTiles() %>%
setView(lng=0, lat=0, zoom=2)
37 / 46

Interactive maps

Adding data points

leaflet(width = 750, height = 500)%>%
addTiles() %>%
setView(lng=0, lat=0, zoom=2) %>%
addCircleMarkers(data = bolido,
lat = ~ latitude,
lng = ~ longitude,
radius = 5,
label = ~ station_id,
labelOptions = labelOptions(textsize = "10px",
noHide = T),
clusterOptions = markerClusterOptions())
2
4
2
3
9
27
61
3
15
3
11
3
2
2
19
12
147
124
311
30
133
7
105
38 / 46

Interactive maps

Adding background

leaflet(width = 750, height = 500)%>%
addTiles() %>%
setView(lng=0, lat=0, zoom=2) %>%
addCircleMarkers(data = bolido,
lat = ~ latitude,
lng = ~ longitude,
radius = 5,
label = ~ station_id,
labelOptions = labelOptions(textsize = "10px", noHide = T),
clusterOptions = markerClusterOptions()) %>%
addProviderTiles(providers$Esri.OceanBasemap)
2
4
2
3
9
27
61
3
15
3
11
3
2
2
19
12
147
124
311
30
133
7
105
Leaflet | © OpenStreetMap contributors, CC-BY-SA, Tiles © Esri — Sources: GEBCO, NOAA, CHS, OSU, UNH, CSUMB, National Geographic, DeLorme, NAVTEQ, and Esri
39 / 46

Your turn

bathy_0 <- read_sf("data/ne_10m_bathymetry_L_0.shp")
bathy_200 <- read_sf("data/ne_10m_bathymetry_K_200.shp")

42 / 46

Your turn

exclude = true

# Very tricky must use the Windows file convention (backslash and not slash) !!!
bathy_0 <- read_sf("data/ne_10m_bathymetry_L_0.shp")
bathy_200 <- read_sf("data/ne_10m_bathymetry_K_200.shp")
theme_set(theme_bw())
ggplot() +
geom_sf(data = world, fill="grey90", color="grey90") +
geom_sf(data = bathy_0, fill="deepskyblue", color="deepskyblue") +
geom_sf(data=bathy_200, fill="white", color="white")+
coord_sf(xlim=c(-20,40),
ylim=c(30,70)) +
geom_point(data=filter(bolido,
bolido_pct >0),
aes(x=longitude,
y=latitude,
size=bolido_pct),
color="red")+
geom_text(data = world_centroids, aes(x = X, y=Y, label=name),
size=3) +
xlab("Longitude") + ylab("Latitude") + labs(size = "Bolido %")
43 / 46

Recap

  • Simple features
  • Static maps
  • Plotting your data
  • Interactive map
44 / 46

Topics to explore on your own

  • Strings and date manipulation
  • Text mining
  • Web mining
  • Markdown: books, web sites, presentations
  • Creating Packages
  • Interface to Databases (MySQL)
45 / 46

C'est fini...

46 / 46

Outline

  • Simple features
  • Static maps
  • Plotting your data
  • Interactive map
2 / 46
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow