Nextflow Course Workflow

This document outlines the steps to create a Nextflow pipeline made of one main workflow and three modules: FASTP, BWA_MEM, and SAMTOOLS_MERGE.

Part 1 Workflow

Create a main workflow that includes one module called FASTP.
Define a channel to store FASTQ information from the file specified in params.input_file. This channel should include the part column from the TSV, which acts as a row counter / FASTQ set ID. The final channel, when using .view(), should look like this:

    [sample_1, 1, /project/nextflow_zero2hero/data/NA12878/fastq/sample1/chunks/reads_R1.part_001.fastq, /project/nextflow_zero2hero/data/NA12878/fastq/sample1/chunks/reads_R2.part_001.fastq]
    [sample_1, 2, /project/nextflow_zero2hero/data/NA12878/fastq/sample1/chunks/reads_R1.part_002.fastq, /project/nextflow_zero2hero/data/NA12878/fastq/sample1/chunks/reads_R2.part_002.fastq]
    [sample_1, 3, /project/nextflow_zero2hero/data/NA12878/fastq/sample1/chunks/reads_R1.part_003.fastq, /project/nextflow_zero2hero/data/NA12878/fastq/sample1/chunks/reads_R2.part_003.fastq]
    [sample_2, 1, /project/nextflow_zero2hero/data/NA12878/fastq/sample2/chunks/reads_R1.part_003.fastq, /project/nextflow_zero2hero/data/NA12878/fastq/sample2/chunks/reads_R2.part_003.fastq]

This channel will serve as the sole input for the FASTP module.
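
One way to build this channel is sketched below. It assumes the TSV has a header row with columns named sample_id, part, fastq_R1, and fastq_R2. Only the part column is named in the text above, so the other three names are placeholders to adapt to the real file.

```groovy
workflow {
    // Read the TSV named by params.input_file and emit one item per row.
    // NOTE: the column names sample_id, fastq_R1 and fastq_R2 are assumptions;
    // only `part` is confirmed by the exercise text.
    fastq_ch = Channel
        .fromPath(params.input_file, checkIfExists: true)
        .splitCsv(header: true, sep: '\t')
        .map { row ->
            tuple(row.sample_id, row.part, file(row.fastq_R1), file(row.fastq_R2))
        }

    // Print every tuple to check the structure before wiring in FASTP.
    fastq_ch.view()
}
```

With header: true each row arrives as a map keyed by column name, and every field is a string. The FASTQ columns are wrapped in file() so that a process can later stage them as path inputs.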

Create a module called FASTP that:
Takes as input a tuple with the structure shown above.
Outputs two tuples: the first carries the quality-controlled R1 and R2 FASTQ files, and the second carries the fastp JSON and HTML reports (not used for now). The file names are set by the script below, and the exact tuple structures are listed after it.
Use the following script to process the input and generate the output:

    fastp \
        -i ${fastq_R1} -o ${sample_id}_${fastq_set_id}_R1_qced.fastq.gz \
        -I ${fastq_R2} -O ${sample_id}_${fastq_set_id}_R2_qced.fastq.gz \
        --json ${sample_id}_${fastq_set_id}_fastp.json \
        --html ${sample_id}_${fastq_set_id}_fastp.html \
        --thread 4

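The module outputs the following tuples:
- tuple val(sample_id), val(fastq_set_id), path("${sample_id}_${fastq_set_id}_R1_qced.fastq.gz"), path("${sample_id}_${fastq_set_id}_R2_qced.fastq.gz")
- tuple val(sample_id), val(fastq_set_id), path("${sample_id}_${fastq_set_id}_fastp.json"), path("${sample_id}_${fastq_set_id}_fastp.html")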
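
The module described above could be written roughly as follows. This is a sketch: the tag directive is an optional addition, and where the file lives (for example a modules/ folder) is left open. Inputs, outputs and the command follow the text.

```groovy
process FASTP {
    // Optional label shown in the execution log (an addition, not required by the exercise).
    tag "${sample_id}:${fastq_set_id}"

    input:
    tuple val(sample_id), val(fastq_set_id), path(fastq_R1), path(fastq_R2)

    output:
    // First tuple: the quality-controlled read pair.
    tuple val(sample_id), val(fastq_set_id), path("${sample_id}_${fastq_set_id}_R1_qced.fastq.gz"), path("${sample_id}_${fastq_set_id}_R2_qced.fastq.gz")
    // Second tuple: the fastp reports (not used downstream for now).
    tuple val(sample_id), val(fastq_set_id), path("${sample_id}_${fastq_set_id}_fastp.json"), path("${sample_id}_${fastq_set_id}_fastp.html")

    script:
    // Backslashes are doubled so that a single "\" reaches the shell as a line continuation.
    """
    fastp \\
        -i ${fastq_R1} -o ${sample_id}_${fastq_set_id}_R1_qced.fastq.gz \\
        -I ${fastq_R2} -O ${sample_id}_${fastq_set_id}_R2_qced.fastq.gz \\
        --json ${sample_id}_${fastq_set_id}_fastp.json \\
        --html ${sample_id}_${fastq_set_id}_fastp.html \\
        --thread 4
    """
}
```

Declaring the outputs with path() rather than as plain strings lets Nextflow check that the files exist and pass them to the next process.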

The results should be organized into two folders with the following structure:

Expand the Main Workflow:
Set the variable reference_genome from params.reference_genome.
Create a channel called bwa_index_ch that contains a tuple with the five BWA index files for reference_genome: genome.fa.amb, genome.fa.ann, genome.fa.bwt, genome.fa.pac, and genome.fa.sa. All of these index files are in the same directory as params.reference_genome.
Update the FASTP Module:
Add an emit to output the first tuple of the FASTP module as qced_reads.
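
The two main-workflow additions can be sketched as below. The sketch assumes that each index file is named after the reference with its extension appended (genome.fa becomes genome.fa.amb, and so on), which matches the file names listed above. Using a value channel is a design choice made here, not something the exercise prescribes.

```groovy
workflow {
    // Reference FASTA as a file object; fails early if the path is wrong.
    reference_genome = file(params.reference_genome, checkIfExists: true)

    // One item holding the five index files. A value channel can be read by
    // every BWA_MEM task, whereas a queue channel with a single item would
    // allow only one task to run.
    bwa_index_ch = Channel.value(
        ['amb', 'ann', 'bwt', 'pac', 'sa'].collect { ext ->
            file("${params.reference_genome}.${ext}", checkIfExists: true)
        }
    )

    bwa_index_ch.view()
}

// In the FASTP module, the emit is added by appending `emit: qced_reads`
// to the first output declaration, for example:
//     tuple val(sample_id), val(fastq_set_id), path(...), path(...), emit: qced_reads
// The main workflow can then refer to it as FASTP.out.qced_reads.
```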

Create a module called BWA_MEM that:
Takes as input the reference_genome file, the qced_reads channel, and the bwa_index_ch channel.
Outputs two tuples:
- val(sample_id), path("${sample_id}-${fastq_set_id}.bwa.bam")
- val(sample_id), path("${sample_id}-${fastq_set_id}.bwa.log")
Use the following script to process the input:

    bwa mem -t 4 \
        -R "@RG\tID:${sample_id}\tSM:${sample_id}\tPL:Illumina" \
        ${reference_genome} \
        ${fastq_R1} ${fastq_R2} \
        2> ${sample_id}-${fastq_set_id}.bwa.log \
        | samtools view --threads 4 -Sb - > ${sample_id}-${fastq_set_id}.bwa.bam

Publish the BWA output files to the following folders:
- results/alignments/sample_1/bwa/
- results/alignments/sample_2/bwa/
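
A possible shape for this module is sketched below. The input name bwa_index, the tag directive, and the copy publishing mode are assumptions. The index files are declared as an input only so that they are staged next to the reference, where bwa looks for them.

```groovy
process BWA_MEM {
    tag "${sample_id}:${fastq_set_id}"
    // Resolves to results/alignments/sample_1/bwa/, results/alignments/sample_2/bwa/, ...
    publishDir "results/alignments/${sample_id}/bwa", mode: 'copy'

    input:
    path reference_genome
    tuple val(sample_id), val(fastq_set_id), path(fastq_R1), path(fastq_R2)
    path bwa_index   // staged alongside the reference; not named in the command

    output:
    tuple val(sample_id), path("${sample_id}-${fastq_set_id}.bwa.bam")
    tuple val(sample_id), path("${sample_id}-${fastq_set_id}.bwa.log")

    script:
    // "\\t" keeps a literal \t in the read-group string for bwa to interpret;
    // "\\" at line ends becomes a shell line continuation.
    """
    bwa mem -t 4 \\
        -R "@RG\\tID:${sample_id}\\tSM:${sample_id}\\tPL:Illumina" \\
        ${reference_genome} \\
        ${fastq_R1} ${fastq_R2} \\
        2> ${sample_id}-${fastq_set_id}.bwa.log \\
        | samtools view --threads 4 -Sb - > ${sample_id}-${fastq_set_id}.bwa.bam
    """
}
```

In the main workflow the call would then be BWA_MEM(reference_genome, FASTP.out.qced_reads, bwa_index_ch), with the arguments in the same order as the input declarations.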

Create a module called SAMTOOLS_MERGE that:
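- Takes as input the first output tuple of the BWA_MEM module, grouped by sample_id so that bam_files (the variable used in the script) holds every BAM file of one sample:
  val(sample_id), path("${sample_id}-${fastq_set_id}.bwa.bam")
- Outputs a tuple:
  val(sample_id), path("${sample_id}.merged_raw.bam")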
Use the following script to merge BAM files:

    samtools merge -n -@ ${task.cpus} -o ${sample_id}.merged_raw.bam ${bam_files}

Publish the merged BAM files to the following folders:
- results/alignments/sample_1/merged_bam/
- results/alignments/sample_2/merged_bam/
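
The merge module might look like the sketch below. It assumes the BAM files of each sample are grouped before the call; the groupTuple step shown in the trailing comment is one way to do that and is not spelled out in the exercise. The tag directive and copy mode are also assumptions.

```groovy
process SAMTOOLS_MERGE {
    tag "${sample_id}"
    // Resolves to results/alignments/sample_1/merged_bam/, results/alignments/sample_2/merged_bam/, ...
    publishDir "results/alignments/${sample_id}/merged_bam", mode: 'copy'

    input:
    // bam_files is a list of all chunk BAMs belonging to one sample.
    tuple val(sample_id), path(bam_files)

    output:
    tuple val(sample_id), path("${sample_id}.merged_raw.bam")

    script:
    // A list of paths is interpolated as a space-separated string.
    """
    samtools merge -n -@ ${task.cpus} -o ${sample_id}.merged_raw.bam ${bam_files}
    """
}

// In the main workflow, after BWA_MEM has been called, group the BAMs by
// sample_id (the first tuple element) and pass the result to the module:
//     SAMTOOLS_MERGE(BWA_MEM.out[0].groupTuple())
```

task.cpus is 1 unless the process sets a cpus directive, so the merge runs with a single extra thread by default.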

The results should be organized into the following structure:

    results/alignments/sample_1/merged_bam/sample_1.merged_raw.bam
    results/alignments/sample_2/merged_bam/sample_2.merged_raw.bam