Microbes on Earth

R session 05 - Phyloseq

Daniel Vaulot

2021-10-12

1 / 30

OutlineIntroduction to the data
Read the data
Format and save as Phyloseq objects
Filter data
Bar graphs
Alpha diversity
Beta diversity

2 / 30

Phyloseq R library

Phyloseq web site
See in particular tutorials for
- importing data
- bar plots
Get ASVs, read abundance and metadata all together
Filter and regroup the data
Bar plots, Alpha and Beta diversity computations

3 / 30

Data

This tutorial uses a reduced metabarcoding dataset obtained by C. Ribeiro and A. Lopes dos Santos. This dataset originates from the CARBOM cruise in 2013 off Brazil and corresponds to the 18S V4 region amplified on flow cytometry sorted samples (see pptx file for details) and sequenced on an Illumina run 2*250 bp analyzed with mothur.

4 / 30

Data

Reference

Gérikas Ribeiro, C., Dos Santos, A.L., Marie, D., Brandini, F.P. & Vaulot, D. 2018. Small eukaryotic phytoplankton communities in tropical waters off Brazil are dominated by symbioses between Haptophyta and nitrogen-fixing cyanobacteria. ISME Journal. 12:1360–74.

5 / 30

Libraries

library("phyloseq")
library("ggplot2")      # graphics
library("readxl")       # necessary to import the data from Excel file
library("dplyr")        # filter and reformat data frames
library("tibble")       # Needed for converting column to row names

6 / 30

Import and format data

Three tables are needed

OTU
Taxonomy
Samples

They are read from a single Excel file where each sheet contains one of the tables

7 / 30

Import and format data

OTU

rows = OTUs or ASVs
cols = samples
cells = number of reads

Table OTU - OTU abundance

8 / 30

Import and format data

Taxonomy

rows = OTUs or ASVs
cols = taxonomy ranks
cells = taxon name

Table Taxo - OTU taxonomy

9 / 30

Import and format data

Samples

rows = samples
cols = metadata type
cells = metadata values

Table Samples

10 / 30

Import and format data

Read Excel files

  otu_mat<- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "OTU matrix")
  tax_mat<- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "Taxonomy table")
  samples_df <- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "Samples")

11 / 30

Import and format data

Phyloseq objects need to have row.names

define the row names from the otu column

  otu_mat <- otu_mat %>%
    tibble::column_to_rownames("otu")

Idem for the two other matrixes

  tax_mat <- tax_mat %>% 
    tibble::column_to_rownames("otu")
  samples_df <- samples_df %>% 
    tibble::column_to_rownames("sample")

12 / 30

Import and format data

Transform to matrixes

Sample table can be left as data frame

  otu_mat <- as.matrix(otu_mat)
  tax_mat <- as.matrix(tax_mat)

Transform to phyloseq objects

  OTU = otu_table(otu_mat, taxa_are_rows = TRUE)
  TAX = tax_table(tax_mat)
  samples = sample_data(samples_df)
  carbom <- phyloseq(OTU, TAX, samples)
  carbom

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 287 taxa and 55 samples ]
sample_data() Sample Data:       [ 55 samples by 27 sample variables ]
tax_table()   Taxonomy Table:    [ 287 taxa by 7 taxonomic ranks ]

13 / 30

Import and format data

Check data

  sample_names(carbom)

 [1] "X10n"   "X10p"   "X11n"   "X11p"   "X120n"  "X120p"  "X121n"  "X121p" 
 [9] "X122n"  "X122p"  "X125n"  "X125p"  "X126n"  "X126p"  "X127n"  "X13n"  
[17] "X13p"   "X140n"  "X140p"  "X141n"  "X141p"  "X142n"  "X142p"  "X155n" 
[25] "X155p"  "X156n"  "X156p"  "X157n"  "X157p"  "X15n"   "X15p"   "X165n" 
[33] "X165p"  "X166n"  "X166p"  "X167n"  "X167p"  "X1n"    "X1p"    "X2n"   
[41] "X2p"    "X3n"    "X3p"    "X5n"    "X5p"    "X7n"    "X7p"    "X9n"   
[49] "X9p"    "tri01n" "tri01p" "tri02n" "tri02p" "tri03n" "tri03p"

  rank_names(carbom)

[1] "Domain"     "Supergroup" "Division"   "Class"      "Order"     
[6] "Family"     "Genus"

  sample_variables(carbom)

 [1] "fraction"          "Select_18S_nifH"   "total_18S"        
 [4] "total_16S"         "total_nifH"        "sample_number"    
 [7] "transect"          "station"           "depth"            
[10] "latitude"          "longitude"         "picoeuks"         
[13] "nanoeuks"          "bottom_depth"      "level"            
[16] "transect_distance" "date"              "time"             
[19] "phosphates"        "silicates"         "ammonia"          
[22] "nitrates"          "nitrites"          "temperature"      
[25] "fluorescence"      "salinity"          "sample_label"

14 / 30

Filter and normalize data

Keep only samples to be analyzed

  carbom <- subset_samples(carbom, Select_18S_nifH =="Yes")
  carbom

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 287 taxa and 54 samples ]
sample_data() Sample Data:       [ 54 samples by 27 sample variables ]
tax_table()   Taxonomy Table:    [ 287 taxa by 7 taxonomic ranks ]

15 / 30

Filter and normalize data

Keep only photosynthetic taxa

  carbom <- subset_taxa(carbom, Division %in% c("Chlorophyta", "Dinophyta", "Cryptophyta", 
                                                "Haptophyta", "Ochrophyta", "Cercozoa"))
  carbom <- subset_taxa(carbom, !(Class %in% c("Syndiniales", "Sarcomonadea")))
  carbom

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 205 taxa and 54 samples ]
sample_data() Sample Data:       [ 54 samples by 27 sample variables ]
tax_table()   Taxonomy Table:    [ 205 taxa by 7 taxonomic ranks ]

16 / 30

Filter and normalize data

Normalize number of reads in each sample using median number of reads.

  total = median(sample_sums(carbom))
  standf = function(x, t=total) round(t * (x / sum(x)))
  carbom = transform_sample_counts(carbom, standf)

The number of reads used for normalization is 44903.

17 / 30

Bar graphs

Basic bar graph based on Division

plot_bar(carbom, fill = "Division")

18 / 30

Bar graphs

Remove OTUs boundaries.

This is done by adding ggplot2 modifier.

plot_bar(carbom, fill = "Division") + 
 geom_bar(aes(color=Division, 
              fill=Division), 
          stat="identity", 
          position="stack")

19 / 30

Bar graphs

Group together Pico vs Nano samples.

carbom_fraction <- merge_samples(carbom, "fraction")
plot_bar(carbom_fraction, fill = "Division") + 
  geom_bar(aes(color=Division, 
               fill=Division), 
           stat="identity", 
           position="stack")

20 / 30

Bar graphs

Keep only Chlorophyta
Color according to genus.
Pico vs Nano/Surface vs Deep.

# Filter Chlorophyta
  carbom_chloro_ps <- subset_taxa(carbom, 
                               Division %in% c("Chlorophyta"))
# Transform from phylseq to dataframe
  carbom_chloro_df <- psmelt(carbom_chloro_ps) %>% 
# Group by fraction and level
   group_by(fraction, level) %>%
# Compute relative % for each group
   mutate(Abundance_pct = Abundance/sum(Abundance) * 100) 
# Use ggplot directly 
  ggplot(carbom_chloro_df)  +
    geom_bar(aes(x= Division, y = Abundance_pct, fill=Genus), 
           stat="identity", 
           position="stack") +
    facet_grid(rows = vars(level), cols=vars(fraction))

21 / 30

Alpha diversity

Chao1: richness estimator
Shannon: diversity estimator.

Other include: "Observed", "ACE", "Simpson", "InvSimpson", "Fisher"

plot_richness(carbom, 
              measures=c("Chao1", "Shannon"))

22 / 30

Alpha diversity

Regroup together samples by level and fraction.

  plot_richness(carbom, measures=c("Chao1", "Shannon"), 
                x="level", 
                color="fraction")

23 / 30

Beta diversity

Ordination: NMDS Distance: Bray

carbom.ord <- ordinate(carbom, 
                       method ="NMDS", 
                       distance ="bray")

Square root transformation
Wisconsin double standardization
Run 0 stress 0.2317058 
Run 1 stress 0.2434282 
Run 2 stress 0.2518364 
Run 3 stress 0.2336963 
Run 4 stress 0.2610581 
Run 5 stress 0.2294631 
... New best solution
... Procrustes: rmse 0.09891983  max resid 0.3902268 
Run 6 stress 0.2490844 
Run 7 stress 0.2528563 
Run 8 stress 0.2427709 
Run 9 stress 0.2418987 
Run 10 stress 0.2597176 
Run 11 stress 0.2370944 
Run 12 stress 0.2294626 
... New best solution
... Procrustes: rmse 0.0005427461  max resid 0.00283235 
... Similar to previous best
Run 13 stress 0.2485507 
Run 14 stress 0.2479295 
Run 15 stress 0.2358392 
Run 16 stress 0.2385162 
Run 17 stress 0.2601798 
Run 18 stress 0.2578494 
Run 19 stress 0.2512572 
Run 20 stress 0.2503546 
** Solution reached

24 / 30

Beta diversity

Plot OTUs

Ordination: NMDS
Distance: Bray

plot_ordination(carbom, 
                carbom.ord, 
                type="taxa", 
                color="Division", 
                title="OTUs")

25 / 30

Beta diversity

Plot Samples

Enlarge the points to make it more easy to read.

plot_ordination(carbom, 
                carbom.ord, 
                type="samples", 
                color="fraction", 
                shape="level", 
                title="Samples") + 
  geom_point(size=3)

26 / 30

Beta diversity

Plot both OTUs and Samples

Two different panels

plot_ordination(carbom, 
                carbom.ord, 
                type="split", 
                color="Division", 
                shape="level", 
                title="biplot", 
                label = "station") +  
  geom_point(size=3)

27 / 30

Beta diversity

Methods

CCA - Performs correspondence analysis, or optionally, constrained correspondence analysis (a.k.a. canonical correspondence analysis), via cca

RDA - Performs redundancy analysis, or optionally principal components analysis, via rda

NMDS - Performs Non-metric MultiDimenstional Scaling of a sample-wise ecological distance matrix onto a user-specified number of axes, k. By default, k=2, but this can be modified as a supplementary argument.

MDS/PCoA - Performs principal coordinate analysis (also called principle coordinate decomposition, multidimensional scaling (MDS), or classical scaling) of a distance matrix (Gower 1966), including two correction methods for negative eigenvalues. See pcoa for further details.

28 / 30

Beta diversityDistancesmanhattan
euclidean
canberra
bray
kulczynski
jaccard
gower
altGower
morisita  
horn
mountford
raup
binomial
chao
cao
unifrac
29 / 30

What did we learn ?

Create file

30 / 30

What did we learn ?

Create file

Bar graphs

30 / 30

What did we learn ?

Create file

Bar graphs

Alpha diversity

30 / 30

What did we learn ?

Create file

Bar graphs

Alpha diversity

Beta diversity

30 / 30

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Restart the presentation timer

?, h

Toggle this help