Advanced Nextflow Channel Operations Exercises

This comprehensive tutorial covers advanced Nextflow channel operations through 8 progressive exercises. Each exercise focuses on specific operators and techniques for sophisticated data processing workflows.

Prerequisites

In this folder you will find:

  • Input files needed for your exercises
  • metadata.csv
  • nextflow.config
  • Nextflow files for each exercise
  • In the hint file you will find an "initialised" version of your task, in case you don't know where to start
  • In the solution you will find one possible complete solution that runs successfully

To run your code, use the following command:

nextflow run /path/to/your/main.nf -c /path/to/your/nextflow.config

Hint: if you place the file nextflow.config in the directory where you run nextflow, you can omit the -c parameter. A file named nextflow.config in the execution directory is loaded and parsed automatically.
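
As an illustration, a nextflow.config for these exercises might look like the sketch below. The parameter names are taken from the command-line example in this section; the values are placeholders, and the nextflow.config shipped in the exercise folder may contain different or additional settings.

```groovy
// nextflow.config (hypothetical sketch, not the file shipped with the exercises)
// Every entry inside the params scope becomes available as params.<name> in main.nf.
params {
    input_csv = '/path/to/input.csv'   // placeholder path
    input_tsv = '/path/to/input.tsv'   // placeholder path
    outdir    = 'output/'              // placeholder output directory
}
```

Values given on the command line with a double dash (for example --outdir) take precedence over the values set in this file.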

Alternative: you can avoid parsing the config file by providing the parameters directly via the command line in this way:

nextflow run /path/to/your/main.nf --input_csv /path/to/input.csv --input_tsv /path/to/input.tsv --outdir output/

Other alternative: you can use a params file to pass parameters to your workflow:

nextflow run /path/to/your/main.nf -params-file /path/to/params.yaml

These methods are equivalent; choose the one you are most comfortable with!

Recap table of operators

| Category | Operator | Description | Examples from Code |
|---|---|---|---|
| Channel Creation | fromPath | Creates a channel from file paths | channel.fromPath(file(params.input_csv, checkIfExists: true)) |
| | splitCsv | Splits CSV files into records | .splitCsv(header) |
| Transformation | map | Transforms channel elements | .map{row -> tuple(...)}, .map{meta, greeting -> tuple(...)} |
| | transpose | Transposes grouped elements | all_languages_grouped.transpose() |
| | multiMap | Creates multiple output channels from one input | .multiMap{meta, greeting -> only_family: ..., only_sub_category: ...} |
| Filtering & Selection | branch | Splits channel into multiple branches based on conditions | .branch{meta, greeting -> neo_latin: ..., germanic: ..., other: ...} |
| | filter | Filters elements based on conditions | .filter{meta, _file -> meta.family == "germanic"} |
| | unique | Removes duplicate elements | .unique() |
| Grouping & Aggregation | groupTuple | Groups elements by key | .groupTuple() |
| | flatten | Flattens nested structures | .flatten() |
| | collect | Collects all elements into a single emission | .collect() |
| | collectFile | Collects elements into a file | .collectFile{meta, file_names -> ...} |
| Channel Combination | mix | Mixes multiple channels (unordered) | neo_latin_greetings_grouped.mix(greetings_by_family.germanic) |
| | concat | Concatenates channels (ordered) | neo_latin_greetings_grouped.concat(greetings_by_family.germanic) |
| | join | Joins channels by key | .join(greetings_by_family.germanic), .join(..., remainder: true) |
| | combine | Combines channels (cartesian product) | .combine(greetings_by_family.germanic, by: 0) |

Exercise 1: Channel Branching

Learning Objectives

  • Understand conditional channel routing with the branch operator
  • Learn how to split channels based on criteria
  • Practice creating multiple output channels from a single source

Theoretical Background

The branch operator forwards each item from a source channel to one of multiple output channels based on selection criteria. Each branch is defined by a unique label followed by a boolean expression. When an item is received, it's routed to the first output channel whose expression evaluates to true.

Key concepts:

  • Conditional routing: Items are directed to different channels based on conditions
  • Multiple outputs: One input channel becomes multiple named output channels
  • Fallback conditions: Use true as the last condition to catch unmatched items
  • Custom return values: Transform items as they're routed to branches
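
The concepts above can be sketched in a small standalone workflow. This is a minimal illustration, not the exercise solution: the inline [language, greeting] pairs and the language lists are assumptions made up for the example, while the branch labels follow the ones used in this tutorial.

```groovy
workflow {
    // Hypothetical inline data: [language, greeting] pairs
    greetings = channel.of(
        ['italian', 'Ciao'],
        ['german',  'Hallo'],
        ['finnish', 'Hei']
    )

    // Each item goes to the first branch whose condition is true.
    // The optional return statement replaces the item with a custom value,
    // here a [meta, greeting] pair carrying the family name.
    greetings_by_family = greetings.branch { language, greeting ->
        neo_latin: language in ['italian', 'french', 'spanish']
            return [[family: 'neo_latin', language: language], greeting]
        germanic: language in ['german', 'english', 'dutch']
            return [[family: 'germanic', language: language], greeting]
        other: true   // fallback: catches everything not matched above
            return [[family: 'other', language: language], greeting]
    }

    // Every label becomes a separate named output channel
    greetings_by_family.neo_latin.view { item -> "neo_latin: ${item}" }
    greetings_by_family.germanic.view { item -> "germanic: ${item}" }
    greetings_by_family.other.view { item -> "other: ${item}" }
}
```

Without the other branch, the Finnish greeting would match no condition and would be silently dropped.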

Your Task

You will implement channel branching to separate languages by family:

  • Create a neo_latin branch for Romance languages
  • Create a germanic branch for Germanic languages
  • Add an other fallback branch for unmatched languages
  • Transform the output to include family metadata

Expected Learning Outcomes

After completing this exercise, you should understand:

  • How to define branch conditions using boolean expressions
  • The importance of branch order (first match wins)
  • How to transform data while branching
  • When and why to use fallback conditions

Exercise 2: Channel Combination and Concatenation

Learning Objectives

  • Master the mix and concat operators for combining channels
  • Understand the differences between mixing and concatenating
  • Learn groupTuple for collecting related items
  • Practice transpose for flattening nested structures

Theoretical Background

Channel combination operators allow you to merge data from multiple sources:

  • mix: Combines items from multiple channels in any order (asynchronous)
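  • concat: Combines channels sequentially (items from the second channel are emitted only after all items from the first have been emitted)
  • groupTuple: Collects tuples with matching keys into groups
  • transpose: Flattens nested lists in tuples, emitting one tuple per list element (the inverse of groupTuple)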
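
The four operators can be compared side by side in a short sketch. The channel names and the inline [family, greeting] tuples are hypothetical and only serve the illustration.

```groovy
workflow {
    // Hypothetical inline data: [family, greeting] tuples
    neo_latin = channel.of(['neo_latin', 'Ciao'], ['neo_latin', 'Hola'])
    germanic  = channel.of(['germanic', 'Hallo'], ['germanic', 'Hello'])

    // mix: items from both channels, in no guaranteed order
    neo_latin.mix(germanic).view { item -> "mix: ${item}" }

    // concat: all neo_latin items first, then all germanic items
    neo_latin.concat(germanic).view { item -> "concat: ${item}" }

    // groupTuple: one emission per key (first tuple element by default),
    // with the remaining elements gathered into a list
    grouped = neo_latin.mix(germanic).groupTuple()
    grouped.view { item -> "grouped: ${item}" }

    // transpose: undoes the grouping, one tuple per list element
    grouped.transpose().view { item -> "transposed: ${item}" }
}
```

Running the sketch several times should make the difference visible: the concat lines keep the same relative order, while the mix lines may not.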

Your Task

You will work with grouped data and transformations:

  • Group greeting files by language family using groupTuple
  • Use transpose to flatten grouped collections
  • Compare mix vs concat behavior with the same data
  • Transform grouped data for further processing

Expected Learning Outcomes

After completing this exercise, you should understand:

  • When to use mix vs concat for combining channels
  • How groupTuple collects items by matching keys
  • The flattening behavior of transpose
  • Performance implications of different combination strategies

Exercise 3: Filtering and Collection Operations

Learning Objectives

  • Master filtering techniques with the filter operator
  • Learn collection strategies: flatten, collect, and collectFile
  • Understand when to use each collection method
  • Practice file-based data aggregation

Theoretical Background

Data filtering and collection are essential for data processing pipelines:

  • filter: Selects items that match specific criteria
  • flatten: Converts nested collections into individual items
  • collect: Gathers all items into a single list
  • collectFile: Aggregates items into files
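
A minimal sketch of these four operators follows. The inline [meta, greeting] tuples and the output file naming are assumptions for the example, not the data or naming used in the exercise files.

```groovy
workflow {
    // Hypothetical inline data: [meta, greeting] tuples
    greetings = channel.of(
        [[family: 'germanic',  language: 'german'],  'Hallo'],
        [[family: 'neo_latin', language: 'italian'], 'Ciao'],
        [[family: 'germanic',  language: 'english'], 'Hello']
    )

    // filter: keep only the items for which the closure returns true
    germanic = greetings.filter { meta, _greeting -> meta.family == 'germanic' }
    germanic.view { item -> "filtered: ${item}" }

    // collect: gather every item into one list, emitted once at the end
    all_germanic = germanic.map { _meta, greeting -> greeting }.collect()
    all_germanic.view { item -> "collected: ${item}" }

    // flatten: split the list back into individual emissions
    all_germanic.flatten().view { item -> "flattened: ${item}" }

    // collectFile: the closure returns [file name, content];
    // entries sharing a file name are appended to the same file
    greetings
        .collectFile { meta, greeting -> ["${meta.family}.txt", greeting + '\n'] }
        .view { collected -> "file: ${collected.name}" }
}
```

Note that collect also flattens nested lists by default, which is why the sketch maps each tuple to a plain greeting string before collecting.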

Your Task

You will implement various collection strategies:

  • Filter channels to select specific language families
  • Use flatten to separate grouped items into individual emissions
  • Apply collect to gather items into lists
  • Use collectFile to aggregate data into files with custom formatting

Expected Learning Outcomes

After completing this exercise, you should understand:

  • Different filtering criteria (boolean predicates, regular expressions, type qualifiers)
  • When to flatten vs collect data structures
  • File-based aggregation strategies

Exercise 4: Advanced Joining Operations

Learning Objectives

  • Master channel joining with combine, join, and advanced matching
  • Understand inner vs outer join semantics
  • Learn key-based data merging strategies
  • Practice handling duplicate and missing keys

Theoretical Background

Joining operations merge data from multiple channels based on matching keys:

  • combine: Creates cross-product combinations; with the by option, only items sharing the same key are combined
  • join: Performs inner joins (SQL-like) on matching keys
  • Key matching: Uses tuple positions to match related data (the first element by default)
  • Remainder option: With remainder: true, unmatched items are also emitted, with null in place of the missing values
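
The behaviours listed above can be tried out with two small keyed channels. The channel names and contents are hypothetical; the keys are chosen so that one item in each channel has no partner in the other.

```groovy
workflow {
    // Hypothetical inline data, keyed by language (first tuple element)
    greetings = channel.of(['italian', 'Ciao'], ['german', 'Hallo'], ['finnish', 'Hei'])
    families  = channel.of(['italian', 'neo_latin'], ['german', 'germanic'], ['dutch', 'germanic'])

    // Inner join: only keys present in both channels are emitted
    greetings.join(families).view { item -> "inner: ${item}" }

    // Outer join: unmatched keys (finnish, dutch) are kept and padded with null
    greetings.join(families, remainder: true).view { item -> "outer: ${item}" }

    // combine with by: 0 pairs items that share the same key
    greetings.combine(families, by: 0).view { item -> "combine by key: ${item}" }

    // combine without by: every greeting paired with every family entry
    greetings.combine(families).view { item -> "cross product: ${item}" }

    // unique: remove duplicate values, here the repeated family name
    families.map { _language, family -> family }.unique().view { item -> "unique: ${item}" }
}
```

With unique keys, as here, join and combine(by: 0) return the same matches. They differ once a key occurs more than once in a channel: combine emits every pairing for that key, whereas join is designed for one-to-one matching.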

Your Task

You will implement different joining strategies:

  • Create a reference metadata channel for joining
  • Perform inner joins to merge matching data
  • Use outer joins to preserve unmatched items
  • Apply combine for cross-product operations
  • Standardize and deduplicate joined results using unique

Expected Learning Outcomes

After completing this exercise, you should understand:

  • The difference between inner and outer joins
  • When to use combine vs join for data merging
  • Key-based matching strategies
  • Handling of duplicate and missing keys in joins

Exercise 5: Advanced Mapping and Regular Expressions

Learning Objectives

  • Master advanced map transformations
  • Learn regular expression pattern matching in Nextflow
  • Practice metadata extraction from filenames
  • Understand the multiMap operator for creating multiple output channels

Theoretical Background

Advanced mapping operations enable sophisticated data transformations:

  • Pattern matching: Use regular expressions to extract information
  • multiMap: Creates multiple output channels from a single input
  • Metadata extraction: Parse structured information from strings
  • Complex transformations: Combine multiple operations in mapping functions
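
The sketch below combines these ideas. It assumes a made-up <family>-<language>.txt naming scheme and uses plain strings instead of real files, so the pattern and the only_language label are illustrative rather than taken from the exercise.

```groovy
workflow {
    // Hypothetical file names following a <family>-<language>.txt scheme
    names = channel.of('neo_latin-italian.txt', 'germanic-german.txt', 'notes.txt')

    // Extract metadata with a regular expression; =~ creates a matcher whose
    // first match holds the capture groups at indexes 1 and 2
    parsed = names.map { name ->
        def matcher = (name =~ /^(\w+)-(\w+)\.txt/)
        def meta = matcher
            ? [family: matcher[0][1], language: matcher[0][2]]
            : [family: 'unknown', language: 'unknown']   // name did not match the scheme
        [meta, name]
    }

    // multiMap: every input item is sent to all labelled outputs,
    // each with its own transformation
    by_key = parsed.multiMap { meta, name ->
        only_family: [meta.family, name]
        only_language: [meta.language, name]
    }

    by_key.only_family.view { item -> "family: ${item}" }
    by_key.only_language.view { item -> "language: ${item}" }
}
```

Unlike branch, which routes each item to exactly one output, multiMap emits something on every output for every input item.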

Your Task

You will implement advanced mapping techniques:

  • Use regular expressions to extract metadata from file paths
  • Create multiple output channels using multiMap
  • Transform data structures for different downstream processes
  • Practice complex data parsing and restructuring

Expected Learning Outcomes

After completing this exercise, you should understand:

  • Regular expression syntax in Groovy/Nextflow
  • How to extract structured data from strings
  • Creating multiple channels from single inputs
  • Advanced data transformation patterns

Key Learning Points