Nextflow Channel Operations and Process Workflow Exercises
This tutorial contains three progressive exercises designed to teach fundamental Nextflow concepts through hands-on practice.
Prerequisites
In this folder you will find:
- Input files needed for your exercises:
  - metadata.csv
  - metadata.tsv
- Files with the needed parameters:
  - nextflow.config
  - params.yaml
- Nextflow files for each exercise:
  - In the hint file you will find an "initialised" version of your task, in case you don't know where to start
  - In the solution file you will find one possible complete solution that runs successfully
To run your code, use the following command:
nextflow run /path/to/your/main.nf -c /path/to/your/nextflow.config
Hint: if you place the nextflow.config file in the directory where you run Nextflow, you can omit the -c parameter: a file named nextflow.config in the execution directory is automatically loaded and parsed.
Alternative: you can skip the config file and provide the parameters directly on the command line, like this:
nextflow run /path/to/your/main.nf --input_csv /path/to/input.csv --input_tsv /path/to/input.tsv --outdir output/
Other alternative: you can use a params file to pass parameters to your workflow:
nextflow run /path/to/your/main.nf -params-file /path/to/params.yaml
These methods are equivalent; choose the one you are most comfortable with!
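For the params-file route, the YAML keys must match the parameter names used on the command line above. A minimal sketch (the paths are placeholders):

```yaml
# Hypothetical params.yaml; keys mirror the --input_csv/--input_tsv/--outdir flags above
input_csv: /path/to/input.csv
input_tsv: /path/to/input.tsv
outdir: output/
```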
Exercise 1: Understanding and Creating Input Channels
Learning Objectives
- Understand what Nextflow channels are and their role in data flow
- Learn different channel factory methods
- Practice creating channels from various data sources
- Understand the difference between value channels and queue channels
Theoretical Background
Channels are the fundamental data structure in Nextflow that connect processes and enable data flow through your pipeline. Think of channels as pipes that carry data between different parts of your workflow. There are two main types:
- Queue channels: Can be consumed only once and are automatically closed when empty
- Value channels: Can be read multiple times and never close
Learn more about channels in the official documentation.
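The difference is easiest to see side by side. A minimal sketch (the values are arbitrary):

```nextflow
workflow {
    // Queue channel: items are consumed once, and the channel closes when empty
    ch_queue = channel.of(1, 2, 3)

    // Value channel: holds a single value that can be read any number of times
    ch_value = channel.value('hello')

    ch_queue.view { "queue item: ${it}" }
    ch_value.view { "value: ${it}" }
}
```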
Channel Factory Methods You'll Use
- `channel.of()`: creates a channel from explicit values
- `channel.fromPath()`: creates a channel from file paths
- `.splitCsv()`: operator that parses CSV/TSV files into records
For a complete list of channel factories, see the Channel factories reference.
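The three factories can be sketched together as follows. This is only an outline: `params.input_csv` and `params.input_tsv` are assumed to match the command-line parameters shown earlier, and the actual column names depend on your metadata files.

```nextflow
workflow {
    // Explicit values: a hardcoded tuple of language and greeting
    ch_basic = channel.of(['en', 'Hello'])

    // CSV with a header line: each row becomes a map keyed by column name
    ch_csv = channel.fromPath(params.input_csv)
        .splitCsv(header: true)

    // TSV: same idea, with an explicit tab separator
    ch_tsv = channel.fromPath(params.input_tsv)
        .splitCsv(header: true, sep: '\t')

    ch_csv.view()   // print each parsed row to verify the channel contents
}
```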
Your Task
You will create three different input channels:
- A basic channel containing a hardcoded tuple with language and greeting
- A channel that reads from a CSV file and parses it with headers
- A channel that reads from a TSV file and parses it with tab separation
Expected Learning Outcomes
After completing this exercise, you should understand:
- How to create channels using different factory methods
- The structure of tuple data in channels
- How file parsing works with `.splitCsv()`
- How to verify channel contents using `.view()`
Exercise 2: Channel Operators and Data Transformation
Learning Objectives
- Master essential channel operators for data manipulation
- Understand how to combine multiple channels
- Learn data transformation techniques using `.map()`
- Practice creating complex data processing pipelines
Theoretical Background
Channel operators are methods that transform, filter, combine, or manipulate the data flowing through channels. They are the building blocks for creating sophisticated data processing workflows. Key concepts:
- Combining channels: use operators like `.mix()` to merge multiple channels
- Data transformation: use `.map()` to transform each item in a channel
- Chaining operations: operators can be chained together for complex transformations
For comprehensive information about all available operators, see the Operators reference.
Channel Operators You'll Use
- `.mix()`: combines items from multiple channels into a single channel
- `.map()`: transforms each item in a channel using a closure
- Closure syntax: `{ item -> transformation }` or `{ param1, param2 -> transformation }`
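A small sketch of the two operators working together (the tuples are arbitrary examples):

```nextflow
workflow {
    ch_a = channel.of(['en', 'Hello'], ['it', 'Ciao'])
    ch_b = channel.of(['de', 'Hallo'])

    ch_a
        .mix(ch_b)                                          // merge items from both channels
        .map { lang, greeting -> "${greeting} (${lang})" }  // transform each tuple into a string
        .view()
}
```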
Your Task
You will create a data processing pipeline that:
- Combines data from the CSV and TSV channels using `.mix()`
- Transforms file records into a standardized tuple format using `.map()`
- Merges the processed file data with the basic channel
- Applies final formatting to create personalized greeting messages
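The overall shape of such a pipeline might look like this. The column names (`language`, `greeting`) and the final message format are assumptions; adapt them to the headers in your metadata files.

```nextflow
workflow {
    ch_basic = channel.of(['English', 'Hello'])

    ch_csv = channel.fromPath(params.input_csv).splitCsv(header: true)
    ch_tsv = channel.fromPath(params.input_tsv).splitCsv(header: true, sep: '\t')

    ch_csv
        .mix(ch_tsv)                                  // one unified stream of rows
        .map { row -> [row.language, row.greeting] }  // row map -> standardized tuple
        .mix(ch_basic)                                // add the hardcoded tuple
        .map { lang, greeting -> "${greeting}! (${lang})" }  // final formatted message
        .view()
}
```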
Expected Learning Outcomes
After completing this exercise, you should understand:
- How to combine multiple data sources into unified workflows
- The power of `.map()` for data transformation
- How to work with row objects from CSV parsing
- String interpolation in Groovy/Nextflow
- Method chaining for building complex data pipelines
Exercise 3: Process Definition and Workflow Integration
Learning Objectives
- Understand Nextflow process structure and components
- Learn how to define process inputs and outputs
- Master the connection between channels and processes
- Practice creating complete end-to-end workflows
Theoretical Background
Processes are the computational units in Nextflow that execute tasks. Each process defines:
- Input declaration: Specifies what data the process expects
- Output declaration: Defines what the process produces
- Script block: Contains the actual commands to execute
- Directives: optional settings like `publishDir` for result management
For detailed information about processes, see the Processes documentation and Process reference.
Process Components You'll Work With
- Input qualifiers: `val()` for values, `path()` for files, `tuple()` for combinations
- Output qualifiers: `path()` for files, `tuple()` for structured data
- `emit`: named outputs that can be referenced in workflows
- `publishDir`: directive to copy results to specified directories
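Putting the components together, a process along the lines of the exercise's `sayHello` might be sketched as follows. The exact inputs and file names are assumptions, not the official solution:

```nextflow
// Sketch of a minimal process; input/output shapes are assumptions
process sayHello {
    input:
    tuple val(lang), val(greeting)    // one (language, greeting) pair per task

    output:
    path "${lang}.txt", emit: greeting_file   // named output, usable as sayHello.out.greeting_file

    script:
    """
    echo '${greeting}' > '${lang}.txt'
    """
}
```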
Your Task
- Define a `sayHello` process that writes messages to files
- Define a `saveHello` process that processes and publishes final results
- Connect your channel operations from Exercise 2 to these processes
- Create a complete workflow that transforms input data through both processes
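The wiring between the two processes could look like the sketch below. It assumes a `sayHello` process with a named output `greeting_file` (as described in the components above); the `saveHello` body and the `params.outdir` value are illustrative placeholders:

```nextflow
// Hypothetical second process: publishes its result to the output directory
process saveHello {
    publishDir params.outdir, mode: 'copy'   // copy results to the published folder

    input:
    path greeting_file

    output:
    path "final_${greeting_file}"

    script:
    """
    cp '${greeting_file}' 'final_${greeting_file}'
    """
}

workflow {
    ch_input = channel.of(['en', 'Hello'], ['it', 'Ciao'])
    sayHello(ch_input)                        // first process consumes the tuples
    saveHello(sayHello.out.greeting_file)     // second process consumes the named output
}
```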
Expected Learning Outcomes
After completing this exercise, you should understand:
- The anatomy of a Nextflow process
- How to properly declare inputs and outputs
- The relationship between channel structure and process inputs
- How processes communicate through their outputs
- Result publishing and file management
- Complete workflow orchestration from data input to final output
Workflow Execution Flow
Your final workflow will follow this pattern:
Input Channels → Channel Operations → sayHello Process → saveHello Process → Published Results
This represents a complete data processing pipeline where raw input is transformed, processed, and published as final results.