r_dada2_tutorial

R course

Daniel Vaulot

2023-01-18

05 - Metabarcode processing with dada2

Downloads

Install the following software:

R
RStudio

Download and install the following packages by running these lines in RStudio:

install.packages("readr")     # To read and write files
install.packages("readxl")    # To read excel files

install.packages("dplyr")     # To manipulate dataframes
install.packages("tibble")    # To work with data frames
install.packages("tidyr")     # To work with data frames

install.packages("stringr")   # To manipulate strings

install.packages("ggplot2")   # To do plots


if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Biobase")
BiocManager::install("Biostrings")
BiocManager::install("dada2")
BiocManager::install("phyloseq")
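
Once the installation commands above have finished, it can be useful to confirm that every package can actually be loaded. The following is a minimal sketch, not part of the original course material; the variable names (`cran_pkgs`, `bioc_pkgs`, `installed`, `missing_pkgs`) are chosen here for illustration only.

```r
# Packages installed by the commands above
cran_pkgs <- c("readr", "readxl", "dplyr", "tibble", "tidyr", "stringr", "ggplot2")
bioc_pkgs <- c("Biobase", "Biostrings", "dada2", "phyloseq")
pkgs <- c(cran_pkgs, bioc_pkgs)

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) == 0) {
  message("All packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package listed as missing can be reinstalled with the matching `install.packages()` or `BiocManager::install()` call.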

file	reads.in	reads.out
120p_S39_R1.subsample.fastq	1000	256
121p_S57_R1.subsample.fastq	1000	457
122p_S4_R1.subsample.fastq	1000	407
125p_S22_R1.subsample.fastq	1000	553
126p_S40_R1.subsample.fastq	1000	508
140p_S5_R1.subsample.fastq	1000	456
141p_S23_R1.subsample.fastq	1000	473
142p_S41_R1.subsample.fastq	1000	583
155p_S59_R1.subsample.fastq	1000	528
156p_S6_R1.subsample.fastq	1000	530
157p_S24_R1.subsample.fastq	1000	513
165p_S42_R1.subsample.fastq	1000	521
166p_S60_R1.subsample.fastq	1000	519
167p_S7_R1.subsample.fastq	1000	572
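
The `reads.in` and `reads.out` columns above are the kind of per-file counts that dada2's `filterAndTrim()` returns. As a rough sketch, assuming the counts are simply retyped by hand from the table (the object names `filt` and `pct_kept` are illustrative), the percentage of reads that pass the filter can be computed like this:

```r
# Counts copied from the table above (sample prefix of each file name)
filt <- data.frame(
  sample    = c("120p", "121p", "122p", "125p", "126p", "140p", "141p",
                "142p", "155p", "156p", "157p", "165p", "166p", "167p"),
  reads.in  = rep(1000, 14),
  reads.out = c(256, 457, 407, 553, 508, 456, 473,
                583, 528, 530, 513, 521, 519, 572)
)

# Percentage of reads kept after filtering, per file
filt$pct_kept <- 100 * filt$reads.out / filt$reads.in

# Samples with the lowest and highest retention
filt[which.min(filt$pct_kept), c("sample", "pct_kept")]  # 120p, 25.6
filt[which.max(filt$pct_kept), c("sample", "pct_kept")]  # 142p, 58.3

# Average retention over all files
mean(filt$pct_kept)
```

In a real run the same calculation would be applied directly to the matrix returned by the filtering step rather than to hand-typed numbers.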

sequence	abundance	forward	reverse	nmatch	prefer	accept
AGCTCCAATAGCGTATATTA...	146	1	1	71	1	TRUE
CACACGTCTAATGTTGCATT...	64	2	2	131	1	TRUE
AGCTCCAATAGCGTATACTA...	13	3	3	72	1	TRUE

sample	input	filtered	denoised	merged	tabled	nonchim
120p	1000	256	241	223	223	223
121p	1000	457	446	397	397	397
122p	1000	407	396	357	357	357
125p	1000	553	551	464	464	464
126p	1000	508	493	340	340	340
140p	1000	456	441	427	427	427
141p	1000	473	460	381	381	381
142p	1000	583	567	495	495	495
155p	1000	528	524	445	445	445
156p	1000	530	525	425	425	425
157p	1000	513	507	438	438	438
165p	1000	521	509	442	442	442
166p	1000	519	510	478	478	478
167p	1000	572	563	556	556	556
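
A tracking table like the one above shows where reads are lost along the pipeline. The sketch below retypes the counts by hand and computes, for each step, the total number of reads and the fraction retained relative to the previous step. The object names (`track`, `totals`, `step_retention`, `merge_loss`) are illustrative assumptions, not names taken from the course.

```r
# Counts copied from the tracking table above
merged <- c(223, 397, 357, 464, 340, 427, 381, 495, 445, 425, 438, 442, 478, 556)
track <- data.frame(
  sample   = c("120p", "121p", "122p", "125p", "126p", "140p", "141p",
               "142p", "155p", "156p", "157p", "165p", "166p", "167p"),
  input    = rep(1000, 14),
  filtered = c(256, 457, 407, 553, 508, 456, 473, 583, 528, 530, 513, 521, 519, 572),
  denoised = c(241, 446, 396, 551, 493, 441, 460, 567, 524, 525, 507, 509, 510, 563),
  merged   = merged,
  tabled   = merged,   # identical to merged in the table above
  nonchim  = merged    # no reads were removed as chimeras in this subsample
)

# Total reads remaining after each step, summed over all samples
totals <- colSums(track[, -1])

# Fraction of reads retained at each step relative to the previous step
step_retention <- totals[-1] / totals[-length(totals)]
round(step_retention, 3)

# Sample that loses the most reads at the merging step
merge_loss <- track$denoised - track$merged
track$sample[which.max(merge_loss)]  # "126p"
```

For these numbers, filtering is the step that removes the most reads (6876 of 14000 remain), while chimera removal removes none.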

Introduction
Downloads
Data used
Set-up
Load the necessary libraries
Set up directories
Set up variables
Examine the fastq files
Construct a list of the fastq files
Compute number of paired reads
Plot quality for reads
Filter and Trim the reads
Two approaches
Method 1 - Removing the primers by sequence
Method 2 - Remove primers by truncation and filter
Dada2 processing
Learn error rates
Dereplicate the reads
Apply the sequence-variant inference algorithm to the dereplicated data
Merge sequences
Make sequence table
Remove chimeras
Track number of reads at each step
Transforming and saving the ASV sequences
Assigning taxonomy
Export data
Export
Filter for 18S
Write FASTA file for BLAST analysis with taxonomy
Phyloseq