Daniel Vaulot
2021-10-12
See in particular tutorials for
Get ASVs, read abundance and metadata all together
Filter and regroup the data
Bar plots, Alpha and Beta diversity computations
This tutorial uses a reduced metabarcoding dataset obtained by C. Ribeiro and A. Lopes dos Santos. This dataset originates from the CARBOM cruise in 2013 off Brazil and corresponds to the 18S V4 region amplified on flow cytometry sorted samples (see pptx file for details) and sequenced on an Illumina run 2*250 bp analyzed with mothur.
Reference
library("phyloseq")library("ggplot2") # graphicslibrary("readxl") # necessary to import the data from Excel filelibrary("dplyr") # filter and reformat data frameslibrary("tibble") # Needed for converting column to row names
Three tables are needed
They are read from a single Excel file where each sheet contains one of the tables
Table OTU - OTU abundance
Table Taxo - OTU taxonomy
Table Samples
otu_mat<- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "OTU matrix") tax_mat<- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "Taxonomy table") samples_df <- read_excel("../data/CARBOM metabarcodes.xlsx", sheet = "Samples")
otu_mat <- otu_mat %>% tibble::column_to_rownames("otu")
tax_mat <- tax_mat %>% tibble::column_to_rownames("otu") samples_df <- samples_df %>% tibble::column_to_rownames("sample")
Sample table can be left as data frame
otu_mat <- as.matrix(otu_mat) tax_mat <- as.matrix(tax_mat)
OTU = otu_table(otu_mat, taxa_are_rows = TRUE) TAX = tax_table(tax_mat) samples = sample_data(samples_df) carbom <- phyloseq(OTU, TAX, samples) carbom
phyloseq-class experiment-level objectotu_table() OTU Table: [ 287 taxa and 55 samples ]sample_data() Sample Data: [ 55 samples by 27 sample variables ]tax_table() Taxonomy Table: [ 287 taxa by 7 taxonomic ranks ]
sample_names(carbom)
[1] "X10n" "X10p" "X11n" "X11p" "X120n" "X120p" "X121n" "X121p" [9] "X122n" "X122p" "X125n" "X125p" "X126n" "X126p" "X127n" "X13n" [17] "X13p" "X140n" "X140p" "X141n" "X141p" "X142n" "X142p" "X155n" [25] "X155p" "X156n" "X156p" "X157n" "X157p" "X15n" "X15p" "X165n" [33] "X165p" "X166n" "X166p" "X167n" "X167p" "X1n" "X1p" "X2n" [41] "X2p" "X3n" "X3p" "X5n" "X5p" "X7n" "X7p" "X9n" [49] "X9p" "tri01n" "tri01p" "tri02n" "tri02p" "tri03n" "tri03p"
rank_names(carbom)
[1] "Domain" "Supergroup" "Division" "Class" "Order" [6] "Family" "Genus"
sample_variables(carbom)
[1] "fraction" "Select_18S_nifH" "total_18S" [4] "total_16S" "total_nifH" "sample_number" [7] "transect" "station" "depth" [10] "latitude" "longitude" "picoeuks" [13] "nanoeuks" "bottom_depth" "level" [16] "transect_distance" "date" "time" [19] "phosphates" "silicates" "ammonia" [22] "nitrates" "nitrites" "temperature" [25] "fluorescence" "salinity" "sample_label"
carbom <- subset_samples(carbom, Select_18S_nifH =="Yes") carbom
phyloseq-class experiment-level objectotu_table() OTU Table: [ 287 taxa and 54 samples ]sample_data() Sample Data: [ 54 samples by 27 sample variables ]tax_table() Taxonomy Table: [ 287 taxa by 7 taxonomic ranks ]
carbom <- subset_taxa(carbom, Division %in% c("Chlorophyta", "Dinophyta", "Cryptophyta", "Haptophyta", "Ochrophyta", "Cercozoa")) carbom <- subset_taxa(carbom, !(Class %in% c("Syndiniales", "Sarcomonadea"))) carbom
phyloseq-class experiment-level objectotu_table() OTU Table: [ 205 taxa and 54 samples ]sample_data() Sample Data: [ 54 samples by 27 sample variables ]tax_table() Taxonomy Table: [ 205 taxa by 7 taxonomic ranks ]
Normalize number of reads in each sample using median number of reads.
total = median(sample_sums(carbom)) standf = function(x, t=total) round(t * (x / sum(x))) carbom = transform_sample_counts(carbom, standf)
The number of reads used for normalization is 44903.
Basic bar graph based on Division
plot_bar(carbom, fill = "Division")
Remove OTUs boundaries.
This is done by adding ggplot2 modifier.
plot_bar(carbom, fill = "Division") + geom_bar(aes(color=Division, fill=Division), stat="identity", position="stack")
Group together Pico vs Nano samples.
carbom_fraction <- merge_samples(carbom, "fraction")plot_bar(carbom_fraction, fill = "Division") + geom_bar(aes(color=Division, fill=Division), stat="identity", position="stack")
# Filter Chlorophyta carbom_chloro_ps <- subset_taxa(carbom, Division %in% c("Chlorophyta"))# Transform from phylseq to dataframe carbom_chloro_df <- psmelt(carbom_chloro_ps) %>% # Group by fraction and level group_by(fraction, level) %>%# Compute relative % for each group mutate(Abundance_pct = Abundance/sum(Abundance) * 100) # Use ggplot directly ggplot(carbom_chloro_df) + geom_bar(aes(x= Division, y = Abundance_pct, fill=Genus), stat="identity", position="stack") + facet_grid(rows = vars(level), cols=vars(fraction))
Other include: "Observed", "ACE", "Simpson", "InvSimpson", "Fisher"
plot_richness(carbom, measures=c("Chao1", "Shannon"))
Regroup together samples by level and fraction.
plot_richness(carbom, measures=c("Chao1", "Shannon"), x="level", color="fraction")
Ordination: NMDS Distance: Bray
carbom.ord <- ordinate(carbom, method ="NMDS", distance ="bray")
Square root transformationWisconsin double standardizationRun 0 stress 0.2317058 Run 1 stress 0.2434282 Run 2 stress 0.2518364 Run 3 stress 0.2336963 Run 4 stress 0.2610581 Run 5 stress 0.2294631 ... New best solution... Procrustes: rmse 0.09891983 max resid 0.3902268 Run 6 stress 0.2490844 Run 7 stress 0.2528563 Run 8 stress 0.2427709 Run 9 stress 0.2418987 Run 10 stress 0.2597176 Run 11 stress 0.2370944 Run 12 stress 0.2294626 ... New best solution... Procrustes: rmse 0.0005427461 max resid 0.00283235 ... Similar to previous bestRun 13 stress 0.2485507 Run 14 stress 0.2479295 Run 15 stress 0.2358392 Run 16 stress 0.2385162 Run 17 stress 0.2601798 Run 18 stress 0.2578494 Run 19 stress 0.2512572 Run 20 stress 0.2503546 ** Solution reached
plot_ordination(carbom, carbom.ord, type="taxa", color="Division", title="OTUs")
Enlarge the points to make it more easy to read.
plot_ordination(carbom, carbom.ord, type="samples", color="fraction", shape="level", title="Samples") + geom_point(size=3)
Two different panels
plot_ordination(carbom, carbom.ord, type="split", color="Division", shape="level", title="biplot", label = "station") + geom_point(size=3)
CCA - Performs correspondence analysis, or optionally, constrained correspondence analysis (a.k.a. canonical correspondence analysis), via cca
RDA - Performs redundancy analysis, or optionally principal components analysis, via rda
NMDS - Performs Non-metric MultiDimenstional Scaling of a sample-wise ecological distance matrix onto a user-specified number of axes, k. By default, k=2, but this can be modified as a supplementary argument.
MDS/PCoA - Performs principal coordinate analysis (also called principle coordinate decomposition, multidimensional scaling (MDS), or classical scaling) of a distance matrix (Gower 1966), including two correction methods for negative eigenvalues. See pcoa for further details.
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |