+ - 0:00:00
Notes for current slide
Notes for next slide

Microbes on Earth





R session 05 - Phyloseq

Daniel Vaulot

2021-10-12

1 / 30

Outline

  • Introduction to the data
  • Read the data
  • Format and save as Phyloseq objects
  • Filter data
  • Bar graphs
  • Alpha diversity
  • Beta diversity
2 / 30

Phyloseq R library

  • Phyloseq web site

  • See in particular tutorials for

  • Get ASVs, read abundance and metadata all together

  • Filter and regroup the data

  • Bar plots, Alpha and Beta diversity computations

3 / 30

Data

This tutorial uses a reduced metabarcoding dataset obtained by C. Ribeiro and A. Lopes dos Santos. This dataset originates from the CARBOM cruise in 2013 off Brazil and corresponds to the 18S V4 region amplified on flow cytometry sorted samples (see pptx file for details) and sequenced on an Illumina run 2*250 bp analyzed with mothur.

4 / 30

Data

Reference

  • Gérikas Ribeiro, C., Dos Santos, A.L., Marie, D., Brandini, F.P. & Vaulot, D. 2018. Small eukaryotic phytoplankton communities in tropical waters off Brazil are dominated by symbioses between Haptophyta and nitrogen-fixing cyanobacteria. ISME Journal. 12:1360–74.

5 / 30

Libraries

library("phyloseq")
library("ggplot2") # graphics
library("readxl") # necessary to import the data from Excel file
library("dplyr") # filter and reformat data frames
library("tibble") # Needed for converting column to row names
6 / 30

Import and format data

Three tables are needed

  • OTU
  • Taxonomy
  • Samples

They are read from a single Excel file where each sheet contains one of the tables

7 / 30

Import and format data

OTU

  • rows = OTUs or ASVs
  • cols = samples
  • cells = number of reads
Table OTU - OTU abundance

Table OTU - OTU abundance

8 / 30

Import and format data

Taxonomy

  • rows = OTUs or ASVs
  • cols = taxonomy ranks
  • cells = taxon name
Table Taxo - OTU taxonomy

Table Taxo - OTU taxonomy

9 / 30

Import and format data

Samples

  • rows = samples
  • cols = metadata type
  • cells = metadata values
Table Samples

Table Samples

10 / 30

Import and format data

Read Excel files

otu_mat<- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "OTU matrix")
tax_mat<- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "Taxonomy table")
samples_df <- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "Samples")
11 / 30

Import and format data

Phyloseq objects need to have row.names

  • define the row names from the otu column
otu_mat <- otu_mat %>%
tibble::column_to_rownames("otu")
  • Idem for the two other matrixes
tax_mat <- tax_mat %>%
tibble::column_to_rownames("otu")
samples_df <- samples_df %>%
tibble::column_to_rownames("sample")
12 / 30

Import and format data

Transform to matrixes

Sample table can be left as data frame

otu_mat <- as.matrix(otu_mat)
tax_mat <- as.matrix(tax_mat)

Transform to phyloseq objects

OTU = otu_table(otu_mat, taxa_are_rows = TRUE)
TAX = tax_table(tax_mat)
samples = sample_data(samples_df)
carbom <- phyloseq(OTU, TAX, samples)
carbom
phyloseq-class experiment-level object
otu_table() OTU Table: [ 287 taxa and 55 samples ]
sample_data() Sample Data: [ 55 samples by 27 sample variables ]
tax_table() Taxonomy Table: [ 287 taxa by 7 taxonomic ranks ]
13 / 30

Import and format data

Check data

sample_names(carbom)
[1] "X10n" "X10p" "X11n" "X11p" "X120n" "X120p" "X121n" "X121p"
[9] "X122n" "X122p" "X125n" "X125p" "X126n" "X126p" "X127n" "X13n"
[17] "X13p" "X140n" "X140p" "X141n" "X141p" "X142n" "X142p" "X155n"
[25] "X155p" "X156n" "X156p" "X157n" "X157p" "X15n" "X15p" "X165n"
[33] "X165p" "X166n" "X166p" "X167n" "X167p" "X1n" "X1p" "X2n"
[41] "X2p" "X3n" "X3p" "X5n" "X5p" "X7n" "X7p" "X9n"
[49] "X9p" "tri01n" "tri01p" "tri02n" "tri02p" "tri03n" "tri03p"
rank_names(carbom)
[1] "Domain" "Supergroup" "Division" "Class" "Order"
[6] "Family" "Genus"
sample_variables(carbom)
[1] "fraction" "Select_18S_nifH" "total_18S"
[4] "total_16S" "total_nifH" "sample_number"
[7] "transect" "station" "depth"
[10] "latitude" "longitude" "picoeuks"
[13] "nanoeuks" "bottom_depth" "level"
[16] "transect_distance" "date" "time"
[19] "phosphates" "silicates" "ammonia"
[22] "nitrates" "nitrites" "temperature"
[25] "fluorescence" "salinity" "sample_label"
14 / 30

Filter and normalize data

Keep only samples to be analyzed

carbom <- subset_samples(carbom, Select_18S_nifH =="Yes")
carbom
phyloseq-class experiment-level object
otu_table() OTU Table: [ 287 taxa and 54 samples ]
sample_data() Sample Data: [ 54 samples by 27 sample variables ]
tax_table() Taxonomy Table: [ 287 taxa by 7 taxonomic ranks ]
15 / 30

Filter and normalize data

Keep only photosynthetic taxa

carbom <- subset_taxa(carbom, Division %in% c("Chlorophyta", "Dinophyta", "Cryptophyta",
"Haptophyta", "Ochrophyta", "Cercozoa"))
carbom <- subset_taxa(carbom, !(Class %in% c("Syndiniales", "Sarcomonadea")))
carbom
phyloseq-class experiment-level object
otu_table() OTU Table: [ 205 taxa and 54 samples ]
sample_data() Sample Data: [ 54 samples by 27 sample variables ]
tax_table() Taxonomy Table: [ 205 taxa by 7 taxonomic ranks ]
16 / 30

Filter and normalize data

Normalize number of reads in each sample using median number of reads.

total = median(sample_sums(carbom))
standf = function(x, t=total) round(t * (x / sum(x)))
carbom = transform_sample_counts(carbom, standf)

The number of reads used for normalization is 44903.

17 / 30

Bar graphs

Basic bar graph based on Division

plot_bar(carbom, fill = "Division")

18 / 30

Bar graphs

Remove OTUs boundaries.

This is done by adding ggplot2 modifier.

plot_bar(carbom, fill = "Division") +
geom_bar(aes(color=Division,
fill=Division),
stat="identity",
position="stack")

19 / 30

Bar graphs

Group together Pico vs Nano samples.

carbom_fraction <- merge_samples(carbom, "fraction")
plot_bar(carbom_fraction, fill = "Division") +
geom_bar(aes(color=Division,
fill=Division),
stat="identity",
position="stack")

20 / 30

Bar graphs

  • Keep only Chlorophyta
  • Color according to genus.
  • Pico vs Nano/Surface vs Deep.
# Filter Chlorophyta
carbom_chloro_ps <- subset_taxa(carbom,
Division %in% c("Chlorophyta"))
# Transform from phylseq to dataframe
carbom_chloro_df <- psmelt(carbom_chloro_ps) %>%
# Group by fraction and level
group_by(fraction, level) %>%
# Compute relative % for each group
mutate(Abundance_pct = Abundance/sum(Abundance) * 100)
# Use ggplot directly
ggplot(carbom_chloro_df) +
geom_bar(aes(x= Division, y = Abundance_pct, fill=Genus),
stat="identity",
position="stack") +
facet_grid(rows = vars(level), cols=vars(fraction))

21 / 30

Alpha diversity

  • Chao1: richness estimator
  • Shannon: diversity estimator.

Other include: "Observed", "ACE", "Simpson", "InvSimpson", "Fisher"

plot_richness(carbom,
measures=c("Chao1", "Shannon"))

22 / 30

Alpha diversity

Regroup together samples by level and fraction.

plot_richness(carbom, measures=c("Chao1", "Shannon"),
x="level",
color="fraction")

23 / 30

Beta diversity

Ordination: NMDS Distance: Bray

carbom.ord <- ordinate(carbom,
method ="NMDS",
distance ="bray")
Square root transformation
Wisconsin double standardization
Run 0 stress 0.2317058
Run 1 stress 0.2434282
Run 2 stress 0.2518364
Run 3 stress 0.2336963
Run 4 stress 0.2610581
Run 5 stress 0.2294631
... New best solution
... Procrustes: rmse 0.09891983 max resid 0.3902268
Run 6 stress 0.2490844
Run 7 stress 0.2528563
Run 8 stress 0.2427709
Run 9 stress 0.2418987
Run 10 stress 0.2597176
Run 11 stress 0.2370944
Run 12 stress 0.2294626
... New best solution
... Procrustes: rmse 0.0005427461 max resid 0.00283235
... Similar to previous best
Run 13 stress 0.2485507
Run 14 stress 0.2479295
Run 15 stress 0.2358392
Run 16 stress 0.2385162
Run 17 stress 0.2601798
Run 18 stress 0.2578494
Run 19 stress 0.2512572
Run 20 stress 0.2503546
** Solution reached
24 / 30

Beta diversity

Plot OTUs

  • Ordination: NMDS
  • Distance: Bray
plot_ordination(carbom,
carbom.ord,
type="taxa",
color="Division",
title="OTUs")

25 / 30

Beta diversity

Plot Samples

Enlarge the points to make it more easy to read.

plot_ordination(carbom,
carbom.ord,
type="samples",
color="fraction",
shape="level",
title="Samples") +
geom_point(size=3)

26 / 30

Beta diversity

Plot both OTUs and Samples

Two different panels

plot_ordination(carbom,
carbom.ord,
type="split",
color="Division",
shape="level",
title="biplot",
label = "station") +
geom_point(size=3)

27 / 30

Beta diversity

Methods

CCA - Performs correspondence analysis, or optionally, constrained correspondence analysis (a.k.a. canonical correspondence analysis), via cca

RDA - Performs redundancy analysis, or optionally principal components analysis, via rda

NMDS - Performs Non-metric MultiDimenstional Scaling of a sample-wise ecological distance matrix onto a user-specified number of axes, k. By default, k=2, but this can be modified as a supplementary argument.

MDS/PCoA - Performs principal coordinate analysis (also called principle coordinate decomposition, multidimensional scaling (MDS), or classical scaling) of a distance matrix (Gower 1966), including two correction methods for negative eigenvalues. See pcoa for further details.

28 / 30

Beta diversity

Distances

  • manhattan
  • euclidean
  • canberra
  • bray
  • kulczynski
  • jaccard
  • gower
  • altGower
  • morisita
  • horn
  • mountford
  • raup
  • binomial
  • chao
  • cao
  • unifrac
29 / 30

What did we learn ?




  • Create file
30 / 30

What did we learn ?




  • Create file
  • Bar graphs
30 / 30

What did we learn ?




  • Create file
  • Bar graphs
  • Alpha diversity
30 / 30

What did we learn ?




  • Create file
  • Bar graphs
  • Alpha diversity
  • Beta diversity
30 / 30

Outline

  • Introduction to the data
  • Read the data
  • Format and save as Phyloseq objects
  • Filter data
  • Bar graphs
  • Alpha diversity
  • Beta diversity
2 / 30
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow