Advanced Nextflow Channel Operations Exercises
This comprehensive tutorial covers advanced Nextflow channel operations through 8 progressive exercises. Each exercise focuses on specific operators and techniques for sophisticated data processing workflows.
Prerequisites
In this folder you will find:
- Input files needed for your exercises
- metadata.csv
- nextflow.config
- Nextflow files for each exercise
- In the hint file you will find an "initialised" version of your task, in case you don't know where to start
- In the solution you will find a possible complete solution that will run successfully
To run your code, use the following command:
nextflow run /path/to/your/main.nf -c /path/to/your/nextflow.config
Hint: if you place nextflow.config in the directory where you run nextflow, you can omit the -c parameter, because a file named nextflow.config in the execution directory is loaded and parsed automatically.
Alternative: you can avoid parsing the config file by providing the parameters directly via the command line in this way:
nextflow run /path/to/your/main.nf --input_csv /path/to/input.csv --input_tsv /path/to/input.tsv --outdir output/
Other alternative: you can use a params file to pass parameters to your workflow:
nextflow run /path/to/your/main.nf -params-file /path/to/params.yaml
These methods are equivalent; choose the one you are most comfortable with!
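As a rough sketch of the config-file route, the same parameters passed on the command line above could be declared in a `params` block. The parameter names come from that command; the values are placeholders, and the nextflow.config shipped with the exercises may contain other settings.

```nextflow
// nextflow.config (sketch; placeholder paths, not the file shipped with the exercises)
params {
    input_csv = '/path/to/input.csv'
    input_tsv = '/path/to/input.tsv'
    outdir    = 'output/'
}
```

A value given on the command line (for example `--outdir other/`) overrides the value declared in the config file.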
Recap table of operators
| Category | Operator | Description | Examples from Code |
|---|---|---|---|
| Channel Creation | `fromPath` | Creates a channel from file paths | `channel.fromPath(file(params.input_csv, checkIfExists: true))` |
| | `splitCsv` | Splits CSV files into records | `.splitCsv(header)` |
| Transformation | `map` | Transforms channel elements | `.map{row -> tuple(...)}`, `.map{meta, greeting -> tuple(...)}` |
| | `transpose` | Transposes grouped elements | `all_languages_grouped.transpose()` |
| | `multiMap` | Creates multiple output channels from one input | `.multiMap{meta, greeting -> only_family: ..., only_sub_category: ...}` |
| Filtering & Selection | `branch` | Splits channel into multiple branches based on conditions | `.branch{meta, greeting -> neo_latin: ..., germanic: ..., other: ...}` |
| | `filter` | Filters elements based on conditions | `.filter{meta, _file -> meta.family == "germanic"}` |
| | `unique` | Removes duplicate elements | `.unique()` |
| Grouping & Aggregation | `groupTuple` | Groups elements by key | `.groupTuple()` |
| | `flatten` | Flattens nested structures | `.flatten()` |
| | `collect` | Collects all elements into a single emission | `.collect()` |
| | `collectFile` | Collects elements into a file | `.collectFile{meta, file_names -> ...}` |
| Channel Combination | `mix` | Mixes multiple channels (unordered) | `neo_latin_greetings_grouped.mix(greetings_by_family.germanic)` |
| | `concat` | Concatenates channels (ordered) | `neo_latin_greetings_grouped.concat(greetings_by_family.germanic)` |
| | `join` | Joins channels by key | `.join(greetings_by_family.germanic)`, `.join(..., remainder: true)` |
| | `combine` | Combines channels (cartesian product) | `.combine(greetings_by_family.germanic, by: 0)` |
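
To make the channel-creation rows concrete, here is a minimal sketch that parses CSV text and reshapes each record into a `[meta, value]` tuple. The inline CSV string and its column names (`id`, `language`, `family`) are made up for illustration and stand in for the real input file, whose columns may differ.

```nextflow
workflow {
    // Hypothetical CSV content; the exercises read a real file with channel.fromPath instead
    def csv_text = 'id,language,family\n1,italian,neo_latin\n2,german,germanic'

    channel.of(csv_text)
        // header: true uses the first line as column names, so each record is a map
        .splitCsv(header: true)
        // Reshape each record into a [meta, value] tuple
        .map { row -> tuple([id: row.id, family: row.family], row.language) }
        .view { v -> "record: ${v}" }
}
```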
Exercise 1: Channel Branching
Learning Objectives
- Understand conditional channel routing with the `branch` operator
- Learn how to split channels based on criteria
- Practice creating multiple output channels from a single source
Theoretical Background
The branch operator forwards each item from a source channel to one of multiple output channels based on selection criteria. Each branch is defined by a unique label followed by a boolean expression. When an item is received, it's routed to the first output channel whose expression evaluates to true.
Key concepts:
- Conditional routing: Items are directed to different channels based on conditions
- Multiple outputs: One input channel becomes multiple named output channels
- Fallback conditions: Use `true` as the last condition to catch unmatched items
- Custom return values: Transform items as they're routed to branches
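
The concepts above can be sketched as follows. This is an illustration and not the exercise solution: the sample tuples, the `language` key and the language lists are assumptions; only the branch names mirror the task below.

```nextflow
workflow {
    // Hypothetical [meta, greeting] tuples; the exercise builds these from its input files
    greetings = channel.of(
        [[language: 'italian'], 'Ciao'],
        [[language: 'german'],  'Hallo'],
        [[language: 'finnish'], 'Hei']
    )

    greetings_by_family = greetings.branch { meta, greeting ->
        // Conditions are evaluated top to bottom; the first one that is true wins
        neo_latin: meta.language in ['italian', 'spanish', 'french']
            return tuple(meta + [family: 'neo_latin'], greeting)
        germanic: meta.language in ['german', 'english']
            return tuple(meta + [family: 'germanic'], greeting)
        // Fallback: `true` catches every item not matched above
        other: true
            return tuple(meta + [family: 'other'], greeting)
    }

    // Each label becomes a named output channel
    greetings_by_family.neo_latin.view { v -> "neo_latin: ${v}" }
    greetings_by_family.germanic.view { v -> "germanic: ${v}" }
    greetings_by_family.other.view { v -> "other: ${v}" }
}
```

The `return` after each condition is optional; without it, the branch emits the original item unchanged.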
Your Task
You will implement channel branching to separate languages by family:
- Create a `neo_latin` branch for Romance languages
- Create a `germanic` branch for Germanic languages
- Add an `other` fallback branch for unmatched languages
- Transform the output to include family metadata
Expected Learning Outcomes
After completing this exercise, you should understand:
- How to define branch conditions using boolean expressions
- The importance of branch order (first match wins)
- How to transform data while branching
- When and why to use fallback conditions
Exercise 2: Channel Combination and Concatenation
Learning Objectives
- Master the `mix` and `concat` operators for combining channels
- Understand the differences between mixing and concatenating
- Learn `groupTuple` for collecting related items
- Practice `transpose` for flattening nested structures
Theoretical Background
Channel combination operators allow you to merge data from multiple sources:
- mix: Combines items from multiple channels in any order (asynchronous)
- concat: Combines channels sequentially (all items from the first channel are emitted before any item from the second)
- groupTuple: Collects tuples with matching keys into groups
- transpose: Flattens nested lists in tuples
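
A small sketch of the four operators side by side, using made-up `[family, greeting]` tuples (the channel names and values are illustrative only):

```nextflow
workflow {
    // Hypothetical [family, greeting] tuples
    neo_latin = channel.of(['neo_latin', 'Ciao'], ['neo_latin', 'Hola'])
    germanic  = channel.of(['germanic', 'Hallo'], ['germanic', 'Hello'])

    // mix: items from both channels, with no guarantee on their relative order
    neo_latin.mix(germanic).view { v -> "mix: ${v}" }

    // concat: every neo_latin item is emitted before any germanic item
    neo_latin.concat(germanic).view { v -> "concat: ${v}" }

    // groupTuple: one [key, [values...]] tuple per distinct key (first element by default)
    grouped = neo_latin.mix(germanic).groupTuple()
    grouped.view { v -> "grouped: ${v}" }

    // transpose: the reverse step, one [key, value] tuple per grouped value
    grouped.transpose().view { v -> "transposed: ${v}" }
}
```

Note that `groupTuple` cannot emit a group until it knows the group is complete, which by default means waiting for the source channel to finish.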
Your Task
You will work with grouped data and transformations:
- Group greeting files by language family using `groupTuple`
- Use `transpose` to flatten grouped collections
- Compare `mix` vs `concat` behavior with the same data
- Transform grouped data for further processing
Expected Learning Outcomes
After completing this exercise, you should understand:
- When to use `mix` vs `concat` for combining channels
- How `groupTuple` collects items by matching keys
- The flattening behavior of `transpose`
- Performance implications of different combination strategies
Exercise 3: Filtering and Collection Operations
Learning Objectives
- Master filtering techniques with the `filter` operator
- Learn collection strategies: `flatten`, `collect`, and `collectFile`
- Understand when to use each collection method
- Practice file-based data aggregation
Theoretical Background
Data filtering and collection are essential for data processing pipelines:
- filter: Selects items that match specific criteria
- flatten: Converts nested collections into individual items
- collect: Gathers all items into a single list
- collectFile: Aggregates items into files
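
The sketch below chains these operators on a few made-up `[meta, greeting]` tuples. The sample data and the per-family file naming are assumptions for illustration; the exercise uses its own inputs and formatting.

```nextflow
workflow {
    // Hypothetical [meta, greeting] tuples
    greetings = channel.of(
        [[family: 'germanic'],  'Hallo'],
        [[family: 'germanic'],  'Hello'],
        [[family: 'neo_latin'], 'Ciao']
    )

    // filter: keep only the tuples for which the predicate is true
    germanic = greetings.filter { meta, _greeting -> meta.family == 'germanic' }

    // collect: gather every remaining greeting into one list, emitted once
    germanic_list = germanic.map { _meta, greeting -> greeting }.collect()
    germanic_list.view { v -> "collect: ${v}" }

    // flatten: turn that list back into one emission per greeting
    germanic_list.flatten().view { v -> "flatten: ${v}" }

    // collectFile: the closure returns [file name, content], so greetings
    // are appended to one file per family
    greetings
        .collectFile { meta, greeting -> ["${meta.family}.txt", greeting + '\n'] }
        .view { f -> "collectFile: ${f.name}" }
}
```

By default `collectFile` writes to a temporary location; the `storeDir` option can be used to keep the files in a directory of your choice.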
Your Task
You will implement various collection strategies:
- Filter channels to select specific language families
- Use `flatten` to separate grouped items into individual emissions
- Apply `collect` to gather items into lists
- Use `collectFile` to aggregate data into files with custom formatting
Expected Learning Outcomes
After completing this exercise, you should understand:
- Different filtering criteria (boolean predicates, regular expressions, type qualifiers)
- When to flatten vs collect data structures
- File-based aggregation strategies
Exercise 4: Advanced Joining Operations
Learning Objectives
- Master channel joining with `combine`, `join`, and advanced matching
- Understand inner vs outer join semantics
- Learn key-based data merging strategies
- Practice handling duplicate and missing keys
Theoretical Background
Joining operations merge data from multiple channels based on matching keys:
- combine: Creates cross-product combinations, optionally filtered by key
- join: Performs inner joins (SQL-like) with matching keys
- Key matching: Uses tuple positions to match related data
- Remainder option: Controls what happens with unmatched items
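
A minimal sketch contrasting the two operators. All channel names and values here are invented for illustration; the key is the first element of each tuple.

```nextflow
workflow {
    // Hypothetical keyed tuples; 'finnish' and 'dutch' each appear on one side only
    greetings = channel.of(['italian', 'Ciao'], ['german', 'Hallo'], ['finnish', 'Hei'])
    families  = channel.of(['italian', 'neo_latin'], ['german', 'germanic'], ['dutch', 'germanic'])

    // Inner join: only keys present in both channels, as [key, greeting, family]
    greetings.join(families).view { v -> "inner: ${v}" }

    // remainder: true also emits unmatched keys, with null in place of the missing value
    greetings.join(families, remainder: true).view { v -> "outer: ${v}" }

    // combine with by: 0 pairs every left item with every right item sharing the key,
    // so repeated keys yield a cross product within each key (four tuples here)
    by_family = channel.of(['germanic', 'Hallo'], ['germanic', 'Hello'])
    notes     = channel.of(['germanic', 'note_a'], ['germanic', 'note_b'])
    by_family.combine(notes, by: 0).view { v -> "combine: ${v}" }

    // unique: drop repeated items, for example after concatenating overlapping results
    greetings.concat(greetings).unique().view { v -> "unique: ${v}" }
}
```

As a rule of thumb, `join` suits channels where each key appears once per side, while `combine` suits cases where keys repeat and every pairing is wanted.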
Your Task
You will implement different joining strategies:
- Create a reference metadata channel for joining
- Perform inner joins to merge matching data
- Use outer joins to preserve unmatched items
- Apply `combine` for cross-product operations
- Standardize and deduplicate joined results using `unique`
Expected Learning Outcomes
After completing this exercise, you should understand:
- The difference between inner and outer joins
- When to use `combine` vs `join` for data merging
- Key-based matching strategies
- Handling of duplicate and missing keys in joins
Exercise 5: Advanced Mapping and Regular Expressions
Learning Objectives
- Master advanced `map` transformations
- Learn regular expression pattern matching in Nextflow
- Practice metadata extraction from filenames
- Understand the `multiMap` operator for creating multiple output channels
Theoretical Background
Advanced mapping operations enable sophisticated data transformations:
- Pattern matching: Use regular expressions to extract information
- multiMap: Creates multiple output channels from a single input
- Metadata extraction: Parse structured information from strings
- Complex transformations: Combine multiple operations in mapping functions
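
These ideas can be sketched as below. The `<language>_<family>.txt` naming scheme, the sample names and the output channel names are assumptions made for this example; the exercise files may follow a different pattern.

```nextflow
workflow {
    // Hypothetical file names following an assumed <language>_<family>.txt scheme
    names = channel.of('italian_romance.txt', 'german_germanic.txt', 'notes.txt')

    parsed = names.map { name ->
        // =~ creates a java.util.regex.Matcher; matches() requires the whole string to match
        def matcher = (name =~ /([a-z]+)_([a-z]+)\.txt/)
        def meta = [language: 'unknown', family: 'unknown']
        if( matcher.matches() ) {
            // Capture groups 1 and 2 hold the two parts of the file name
            meta = [language: matcher.group(1), family: matcher.group(2)]
        }
        tuple(meta, name)
    }

    // multiMap: every input item is emitted once on each named output channel
    split = parsed.multiMap { meta, name ->
        only_language: tuple(meta.language, name)
        only_family: tuple(meta.family, name)
    }

    split.only_language.view { v -> "only_language: ${v}" }
    split.only_family.view { v -> "only_family: ${v}" }
}
```

Unlike `branch`, which routes each item to exactly one output, `multiMap` sends a (possibly different) value derived from every item to all of its outputs.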
Your Task
You will implement advanced mapping techniques:
- Use regular expressions to extract metadata from file paths
- Create multiple output channels using `multiMap`
- Transform data structures for different downstream processes
- Practice complex data parsing and restructuring
Expected Learning Outcomes
After completing this exercise, you should understand:
- Regular expression syntax in Groovy/Nextflow
- How to extract structured data from strings
- Creating multiple channels from single inputs
- Advanced data transformation patterns