Day 3 - section 4 - Groovy scripting inside Nextflow processes
This practical focuses on Groovy scripting inside a single Nextflow process.
You will start from a minimal working pipeline and progressively extend the same process, learning how Groovy is evaluated inside script: blocks and how it differs from Bash execution at runtime.
Two phases, two languages
A Nextflow process is evaluated in two distinct phases.
Phase 1 — Groovy (pipeline construction)
Before a task is executed, Nextflow evaluates the process definition, including its `script:` block, using Groovy. This happens once per task, with that task's inputs.
During this phase:
- `script:` blocks are parsed
- Groovy variables are resolved
- Closures are executed
- File object properties are accessed
- `task.cpus` and `task.memory` are known
Anything written as `${variable}` inside a triple double-quoted string (`"""`) is expanded by Groovy before execution.
By the time the task runs, Groovy variables no longer exist; only their values remain.
Phase 2 — Bash (task runtime)
After Groovy evaluation, Nextflow launches the task using the system shell (usually Bash).
During this phase:
- The command is executed line by line
- Shell variables are expanded
- Environment variables become available
- Files are created and modified
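Variables such as:
```bash
$HOME
$PATH
$PWD
```
are expanded by Bash at runtime, not by Groovy. Inside a triple double-quoted `script:` string, however, Groovy treats an unescaped `$` as the start of an interpolation. A Bash variable must therefore be escaped with a backslash to reach Bash unchanged:
```bash
\$HOME
```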
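You can imitate Phase 1 in a plain Groovy script, outside Nextflow, to see what Bash eventually receives. In this minimal sketch, `sample` is a hypothetical stand-in for a variable defined in a `script:` block. A triple single-quoted string is included for contrast, because Groovy never interpolates it.

```groovy
// Plain Groovy sketch, not Nextflow: 'sample' is a hypothetical stand-in
// for a Groovy variable defined inside a script: block.
def sample = 'sample1_R1'

// Triple double-quoted string (GString): ${sample} is replaced by Groovy,
// while the escaped \$HOME is kept as a literal $HOME for Bash.
def command = """
echo "Sample: ${sample}" > ${sample}.txt
echo "Home: \$HOME" >> ${sample}.txt
""".toString()

// Triple single-quoted string: no interpolation at all.
def literal = '''echo ${sample}'''

println command
println literal
```

The printed `command` already contains `sample1_R1`, while `$HOME` is still present for Bash to expand at runtime.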
Introduction to Ternary Operators
The ternary operator is a concise way to write simple conditional expressions in Groovy, the language used by Nextflow. It allows you to select between two values depending on a condition, all in a single line.
Syntax
```groovy
condition ? valueIfTrue : valueIfFalse
```
condition: the expression that is evaluated. Besides Booleans, Groovy treats values such as `null`, `0`, and empty strings as false.
valueIfTrue: the result returned if the condition is true.
valueIfFalse: the result returned if the condition is false.
This is equivalent to a simple if/else statement but written in a compact and readable form.
Example (generic)
```groovy
def result = condition ? "Yes" : "No"
```
If condition is true, result will be "Yes".
If condition is false, result will be "No".
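Here is a minimal Groovy sketch of the equivalence with if/else. It assumes a hypothetical file name as the source of the condition:

```groovy
// Hypothetical input used only for illustration.
def fileName = 'sample_tumor.fastq.gz'
def condition = fileName.contains('tumor')

// Compact form: ternary operator.
def result = condition ? 'Yes' : 'No'

// Equivalent if/else form.
def resultIfElse
if (condition) {
    resultIfElse = 'Yes'
} else {
    resultIfElse = 'No'
}

println result         // Yes
println resultIfElse   // Yes
```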
Why ternary operators are used in Nextflow
- Dynamic value assignment: compute values based on conditions during pipeline construction (Groovy evaluation).
- Simplify commands and options: generate flags, parameters, or metadata without verbose if/else blocks.
- Concise and readable: keeps pipelines maintainable, especially when multiple conditional values are needed.
Exercise 1 — Sample names and variable scope
Modify the main.nf so that:
- Each input file produces a distinct output file
- The output filename is derived safely from the input
- Variables inside script: are declared correctly using Groovy
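The initial pipeline always writes:
```text
result.txt
```
Each task runs in its own work directory. However, the process runs once per input file, so every task produces a file with the same name. When those files are published to the same results directory, they overwrite each other.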
Step 1 — Introduce a sample-specific variable (Groovy)
Choosing the right file property
Given the input file:
```text
sample1_R1.fastq.gz
```
Nextflow exposes several file properties that behave differently:
- **`read.name`** keeps all extensions (e.g. `sample1_R1.fastq.gz`)
- **`read.baseName`** removes only the **last** extension (e.g. `sample1_R1.fastq`)
- **`read.simpleName`** removes **all** extensions (e.g. `sample1_R1`)
Because output filenames should usually be **extension-free and stable**,
`read.simpleName` is the safest default when deriving sample names for outputs.
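You can reproduce the difference between the three properties with plain string operations. The helpers below (`nameOf`, `baseNameOf`, `simpleNameOf`) are hypothetical approximations for illustration, not Nextflow's own implementation. The input path is also made up.

```groovy
// Hypothetical helpers, not Nextflow's implementation: they approximate
// read.name, read.baseName and read.simpleName with string operations.
def nameOf = { String path -> path.substring(path.lastIndexOf('/') + 1) }

// Drop everything from the last dot onwards.
def baseNameOf = { String path ->
    String name = nameOf(path)
    int dot = name.lastIndexOf('.')
    dot == -1 ? name : name.substring(0, dot)
}

// Drop everything from the first dot onwards.
def simpleNameOf = { String path ->
    String name = nameOf(path)
    int dot = name.indexOf('.')
    dot == -1 ? name : name.substring(0, dot)
}

// Hypothetical path used only for illustration.
def input = '/path/to/sample1_R1.fastq.gz'

println nameOf(input)        // sample1_R1.fastq.gz
println baseNameOf(input)    // sample1_R1.fastq
println simpleNameOf(input)  // sample1_R1
```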
Inside the `script:` block, define a variable derived from the input file:
```groovy
script:
def sample = read.simpleName
```
Step 2 — Use the variable in the command
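Replace the command with:
```bash
echo "Processing file: ${read.name}" > ${sample}.txt
```
Note that:
- `${sample}` (like `${read.name}`) is expanded by Groovy before the task runs.
- `sample` does not exist as a Bash variable at runtime; Bash only sees its value (e.g. `sample1_R1.txt`).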
Step 3 — Fix the output declaration
Because the filename is now dynamic, update the output: block:
```groovy
output:
path "*.txt"
```
Expected solution
```groovy
process PROCESS_READ {
    publishDir "./results", mode: 'copy'

    input:
    path read

    output:
    path "*.txt"

    script:
    def sample = read.simpleName
    """
    echo "Processing file: ${read.name}" > ${sample}.txt
    """
}
```
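To see why this solution produces distinct outputs, the sketch below renders the command once per hypothetical input file, as Nextflow does once per task. It runs in plain Groovy. The `substring` call is an assumption that approximates `read.simpleName` for these particular names.

```groovy
// Plain Groovy sketch, not Nextflow: each element plays the role of one
// task's input file (hypothetical names).
def inputs = ['sample1_R1.fastq.gz', 'sample1_R2.fastq.gz', 'sample2_R1.fastq.gz']

// Render the command once per input, as if each were a separate task.
// The substring call approximates read.simpleName for these names.
def commands = inputs.collect { String name ->
    def sample = name.substring(0, name.indexOf('.'))
    """echo "Processing file: ${name}" > ${sample}.txt""".toString()
}

// For comparison: the initial pipeline's fixed output name.
def fixedNames = inputs.collect { 'result.txt' }

commands.each { println it }
```

Every rendered command writes a different file. A fixed `result.txt` gives the same name for every task.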
Questions
- Why must `sample` be declared with `def`?
Declaring `sample` with `def` makes it a local Groovy variable, scoped only to the `script:` block.
This prevents accidental overwriting of other variables and ensures predictable behavior.
- Why is `simpleName` preferable to `name` and `baseName` here?
`simpleName` removes all file extensions, producing a clean, stable sample name.
`name` keeps all extensions, which may lead to long filenames.
`baseName` removes only the last extension, which can leave residual extensions like `.fastq`.
For output files, `simpleName` is usually safest.
- Why would `> $sample.txt` not work?
Inside a triple double-quoted string, Groovy also interpolates the unbraced form `$sample`. However, it keeps reading dotted names as property accesses, so `$sample.txt` is interpreted as `${sample.txt}`, a property `txt` of the string `sample`.
That property does not exist, so Groovy evaluation fails before Bash ever runs.
Writing `${sample}.txt` delimits the variable name with braces, so only `sample` is replaced and `.txt` stays literal.
- At which phase is `${sample}` expanded?
`${sample}` is expanded during the Groovy phase (pipeline construction), before the task runs.
By the time Bash executes, the value is already inserted into the command.
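You can check the brace rule in a plain Groovy script. This sketch assumes `sample` holds a sample name. It shows that the unbraced form fails during Groovy evaluation, before any shell is involved.

```groovy
// Plain Groovy sketch, not Nextflow.
def sample = 'sample1_R1'

// Braces delimit the variable name: only 'sample' is interpolated.
def braced = "${sample}.txt".toString()

// Without braces, Groovy reads "$sample.txt" as the property access
// sample.txt, which does not exist on a String, so evaluation fails.
def unbracedFailed = false
try {
    def unbraced = "$sample.txt".toString()
} catch (MissingPropertyException e) {
    unbracedFailed = true
}

println "braced: ${braced}"
println "unbraced form failed: ${unbracedFailed}"
```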
Exercise 2 — Variable declaration inside the script: block
Practice
Modify the `script:` section and add:
```groovy
def sample = read.simpleName
suffix = "_processed"
```
Then change the command to:
```bash
echo "Processing file: ${read.name}" > ${sample}${suffix}.txt
```
Questions
- What is the difference between `def sample` and `suffix`?
The `def` keyword declares `sample` as a local Groovy variable that exists only inside the `script:` block.
`suffix`, assigned without `def`, becomes a Groovy binding variable: it is placed in the global script binding instead of a local scope.
- Why does the pipeline still work?
The pipeline still works because Groovy automatically creates a binding variable when an undeclared variable is assigned.
`suffix` is therefore resolved during Groovy evaluation, and its value is injected into the command before execution.
However, this behavior is implicit and unsafe. A misspelled assignment silently creates a new variable instead of raising an error, and the variable is no longer confined to the `script:` block.
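In a plain Groovy script (not Nextflow), you can observe the difference between the two declarations through the script's `binding`. This is a sketch under that assumption; how Nextflow scopes undeclared variables inside a `script:` block may differ in detail.

```groovy
// Plain Groovy script sketch, not Nextflow; Nextflow's handling of
// undeclared variables in a script: block may differ.
def sample = 'sample1_R1'   // local variable: not stored in the binding
suffix = '_processed'        // undeclared: stored in the script binding

println "sample in binding: ${binding.hasVariable('sample')}"
println "suffix in binding: ${binding.hasVariable('suffix')}"

// Both names resolve during interpolation, so the file name is built as expected.
def fileName = "${sample}${suffix}.txt".toString()
println fileName
```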
Exercise 3 — Conditional behavior from filenames #1
Inside the `script:` block, define a variable `readType` using a chained ternary expression (reusing `sample` from Exercise 1):
```groovy
def readType =
    sample.endsWith('_R1') ? 'forward' :
    sample.endsWith('_R2') ? 'reverse' :
    'single'
```
Then print it to the output:
```bash
echo "Read type: ${readType}" >> ${sample}.txt
```
What is happening
- `sample.endsWith('_R1') ? 'forward' : ...` is a chained ternary expression: the false branch of the first ternary is itself a second ternary.
- It checks the end of the sample name and returns `"forward"`, `"reverse"`, or `"single"`.
- This all happens during Groovy evaluation (pipeline construction), before Bash executes the command.
- The value of `readType` (a string) is then injected into the Bash command.
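You can run the chained ternary on its own in plain Groovy. `readTypeOf` is a hypothetical helper that wraps the expression from the exercise, and the sample names are made up.

```groovy
// Hypothetical helper wrapping the chained ternary from Exercise 3.
def readTypeOf = { String sample ->
    sample.endsWith('_R1') ? 'forward' :
    sample.endsWith('_R2') ? 'reverse' :
    'single'
}

// Hypothetical sample names: forward and reverse reads, then a single-end read.
def samples = ['sample1_R1', 'sample1_R2', 'sample3']
samples.each { s -> println "${s} -> ${readTypeOf(s)}" }
```

The checks use `endsWith`, so a name such as `sample1_R1_trimmed` falls through to `'single'`. If such names are expected, the conditions would need adjusting.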
Exercise 4 — Conditional behavior from filenames #2
Practice
Inside the script: block, define a variable flag based on the filename:
```groovy
def flag = sample.contains('tumor') ? '--tumor' : '--normal'
```
Then print the generated command:
```bash
echo "Command: mytool ${flag} -i ${read.name}" >> ${sample}.txt
```
Test with these files:
```text
sample_tumor.fastq.gz
sample_normal.fastq.gz
```
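Before running the pipeline, you can try the flag logic on the two test files in plain Groovy. In this sketch, `tokenize('.')[0]` is an assumption standing in for `read.simpleName`, and `mytool` remains the placeholder tool name from the exercise.

```groovy
// Plain Groovy sketch: the two test files from the exercise.
def files = ['sample_tumor.fastq.gz', 'sample_normal.fastq.gz']

// tokenize('.') splits on dots, so element 0 approximates read.simpleName here.
def commands = files.collect { String name ->
    def sample = name.tokenize('.')[0]
    def flag = sample.contains('tumor') ? '--tumor' : '--normal'
    "mytool ${flag} -i ${name}".toString()
}

commands.each { println it }
```

`contains` matches anywhere in the name, so a hypothetical `sample_nontumor` would also receive `--tumor`.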
What is happening
- `sample.contains('tumor') ? '--tumor' : '--normal'` is a ternary expression.
- It checks whether the sample name includes `"tumor"` and returns the appropriate flag.
- This computation happens during Groovy evaluation, so the final Bash command already includes the resolved value.
Exercise 5 — Using task.cpus (Groovy metadata)
Use all CPUs allocated to the task, falling back to a single thread when none are set:
- If `task.cpus` is set to 2, use 2 threads.
- If `task.cpus` is higher, use that many threads.
- If `task.cpus` is unset, default to 1.

Set the process header to specify available CPUs:
```groovy
cpus 2
```
Inside the `script:` block, define the number of threads dynamically:
```groovy
def threads = task.cpus ?: 1
```
Then use it in the command:
```bash
echo "Threads: ${threads}" >> ${sample}.txt
```
What is happening
- `task.cpus` gives the number of CPUs allocated to this task.
- `?:` is the Elvis operator, a shortened ternary: `task.cpus ?: 1` is equivalent to `task.cpus ? task.cpus : 1`, so the fallback is used whenever `task.cpus` is `null` or `0`.
- The Bash command can now safely use `${threads}` for multithreading tools.
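You can check the fallback behavior of `?:` in plain Groovy with a few hypothetical `task.cpus` values, including `null` for an unset value:

```groovy
// Hypothetical task.cpus values: set to 2, set higher, unset (null), and 0.
def cpuValues = [2, 8, null, 0]

// Elvis operator: keep the value when it is Groovy-true, otherwise use 1.
def threads = cpuValues.collect { cpus -> cpus ?: 1 }

// The same logic written as a full ternary expression.
def threadsTernary = cpuValues.collect { cpus -> cpus ? cpus : 1 }

println threads   // [2, 8, 1, 1]
```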
Question
Why is this preferable to hard-coding thread counts?
- Using `task.cpus` ensures that the process adapts to the resources allocated to it.
- Hard-coded thread counts can over- or under-use the allocated CPUs.
- The thread count is derived from the task's own configuration, so the command stays consistent with the resources actually requested.
Exercise 6 — Using task.memory (Groovy metadata)
Set the process header to specify memory:
```groovy
memory '8 GB'
```
Inside the `script:` block, define a memory option for your tool based on the allocated memory:
```groovy
def memOpt = "-m ${task.memory.toMega()}"
```
Then print it to the output:
```bash
echo "Memory option: ${memOpt}" >> ${sample}.txt
```
What is happening
- `task.memory` gives the memory allocated to the task.
- `task.memory.toMega()` converts the value to megabytes (8192 for `'8 GB'`), a unit many tools expect for command-line options.
- The GString `"-m ${task.memory.toMega()}"` builds the tool-specific memory string.
- The Bash command can now safely use `${memOpt}` when running memory-aware tools.
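Nextflow's memory objects are not available in plain Groovy, so this sketch models `task.memory` as a byte count. `memOptFor` is a hypothetical helper that mirrors the 1024-based conversion `toMega()` performs. Returning an empty option when no memory is set is a design choice of the sketch, not Nextflow behavior.

```groovy
// Hypothetical helper: builds the tool option from a memory size in bytes.
// Assumption: like task.memory.toMega(), it uses 1 MB = 1024 * 1024 bytes.
// Returning '' when no memory is set is a choice made for this sketch.
def memOptFor = { Long bytes ->
    bytes ? "-m ${bytes.intdiv(1024 * 1024)}".toString() : ''
}

// 8 GB expressed in bytes.
long eightGb = 8L * 1024 * 1024 * 1024

println memOptFor(eightGb)   // -m 8192
println memOptFor(null)      // prints an empty line
```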
Question
Why is this preferable to hard-coding memory values?
- Using `task.memory` ensures the process adapts to the allocated resources.
- Hard-coded memory options can exceed the memory allocated to the task, which may get the job killed by the scheduler, or leave part of it unused.
- Computing the option in Groovy keeps it consistent with the memory actually requested for the task.
Exercise 7 — Groovy vs Bash expansion
Inside the `script:` block, print two Bash environment variables to a separate file:
```bash
echo "Home: \$HOME" > bash_variables.txt
echo "Path: \$PATH" >> bash_variables.txt
```
What is happening
- `${sample}`, `${readType}`, `${flag}`, `${threads}` and `${memOpt}` are all Groovy variables.
- Their values are computed during Groovy evaluation (pipeline construction) and injected into the Bash commands.
- `$HOME` and `$PATH` are Bash environment variables; the backslash stops Groovy from interpolating them.
- They are expanded at task runtime.
- Saving them in a separate file (`bash_variables.txt`) makes the distinction explicit.
- Takeaway rule: always ask, "Is this evaluated by Groovy, or by Bash?"

Confusing the two is a very common source of bugs in advanced Nextflow scripting.
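You can also check the escaping rule in plain Groovy. The escaped form keeps `$HOME` for Bash. The unescaped form makes Groovy look for a Groovy variable called `HOME`, which fails. In Nextflow, an unescaped Bash variable is likewise expected to break Groovy evaluation of the script.

```groovy
// Plain Groovy sketch, not Nextflow.
// Escaped: Groovy leaves $HOME in the string for Bash to expand.
def escaped = "echo \"Home: \$HOME\" > bash_variables.txt".toString()
println escaped

// Unescaped: Groovy looks for a Groovy variable named HOME, finds none,
// and evaluation fails before any shell is involved.
def unescapedFailed = false
try {
    def unescaped = "echo \"Home: $HOME\"".toString()
} catch (MissingPropertyException e) {
    unescapedFailed = true
}
println "unescaped form failed: ${unescapedFailed}"
```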
Examples of expansion
| Variable | Expanded by | When |
|---|---|---|
| `${sample}` | Groovy | Before execution |
| `${threads}` | Groovy | Before execution |
| `${readType}` | Groovy | Before execution |
| `$HOME` | Bash | Runtime |
| `$PATH` | Bash | Runtime |
Key point
- Groovy phase: variables declared with `def`, closures, and `${var}` placeholders are evaluated before Bash runs.
- Bash phase: shell variables (written as `\$VAR` inside the `script:` string) and commands are executed at runtime.

By separating the Groovy and Bash outputs, you can see clearly which variables are evaluated when, and avoid the most common Nextflow scripting errors.