Nextflow can look intimidating from the outside: DSL2 syntax, channels, executors, and configs. But running an existing pipeline (especially an nf-core pipeline) is much simpler than writing one from scratch.
This guide walks you through running your first Nextflow pipeline from zero, using the nf-core/rnaseq pipeline as a practical example. By the end, you’ll have a working RNA-seq pipeline executing on your local machine or HPC cluster.
Prerequisites
Before starting, you need:
- A Linux or macOS machine (Windows users: use WSL2)
- Java 11 or higher installed (run java -version to check)
- Docker or Singularity installed (for containers)
- ~10 GB of free disk space for the test run
If you don’t have Java: install via sudo apt install default-jdk (Ubuntu) or brew install openjdk (macOS with Homebrew).
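You can check all of these at once with a short shell loop (this helper is just an illustration, not part of Nextflow):

```shell
# Report whether each prerequisite is on the PATH
for tool in java docker singularity; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

Note that you only need one of docker or singularity, not both.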
Step 1: Install Nextflow
Nextflow installation is a single command:
curl -s https://get.nextflow.io | bash
This downloads the Nextflow executable to your current directory. Move it to somewhere on your PATH:
sudo mv nextflow /usr/local/bin/
Verify the installation:
nextflow -version
You should see output like nextflow version 24.10.0.XXXX.
Step 2: Test your installation
Run the built-in hello-world pipeline to confirm everything works:
nextflow run hello
If this completes successfully, Nextflow is installed and working. You should see output like:
N E X T F L O W ~ version 24.10.0
Launching `https://github.com/nextflow-io/hello` [jovial_faraday] DSL2 - revision: ...
executor > local (4)
[aa/bb1234] process > sayHello (1) [100%] 4 of 4 ✔
Hello world!
Bonjour world!
Ciao world!
Hola world!
Step 3: Run an nf-core pipeline
nf-core pipelines are hosted on GitHub and can be run directly from the command line. For RNA-seq:
nextflow run nf-core/rnaseq \
-profile test,docker \
--outdir results
The -profile test flag uses nf-core’s built-in small test dataset so you don’t need to provide your own data. The docker profile tells Nextflow to use Docker containers; no manual software installation is needed.
This will:
- Pull the pipeline from GitHub (first run only)
- Download all required Docker containers
- Run a complete RNA-seq alignment and quantification workflow
- Write results to ./results/
On first run, expect this to take 15–30 minutes depending on your download speed (mostly pulling containers).
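If long command lines become unwieldy, the double-dash pipeline parameters can also live in a YAML file passed with Nextflow's -params-file option. A minimal sketch of the same test run (the file name params.yaml is arbitrary):

```yaml
# params.yaml: pipeline parameters (the double-dash flags)
outdir: results
```

Then launch with nextflow run nf-core/rnaseq -profile test,docker -params-file params.yaml. Single-dash options such as -profile and -resume are options to Nextflow itself and stay on the command line.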
Step 4: Understanding the output
After a successful run, your results/ directory will contain:
results/
├── fastqc/ # Raw read quality metrics
├── trimgalore/ # Trimmed reads and quality reports
├── star_salmon/ # Aligned reads and transcript quantification
│ └── salmon.merged.gene_counts.tsv # Gene count matrix
├── multiqc/ # Aggregated QC report (open this first)
└── pipeline_info/ # Execution report, timeline, software versions
Start with results/multiqc/multiqc_report.html — this gives you an overview of read quality, alignment rates, and quantification metrics for all samples.
Step 5: Run on your own data
To run on your own FASTQ files, you need to provide a samplesheet. Create a CSV file (samplesheet.csv):
sample,fastq_1,fastq_2,strandedness
sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,auto
sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,auto
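Typing paths by hand gets tedious with more than a few samples. A shell sketch that builds the samplesheet from a directory of paired files, assuming the common <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming convention (adjust FASTQ_DIR and the pattern to your data):

```shell
# Placeholder path: point this at your FASTQ directory
FASTQ_DIR=/path/to/fastq

echo "sample,fastq_1,fastq_2,strandedness" > samplesheet.csv
for r1 in "$FASTQ_DIR"/*_R1.fastq.gz; do
  [ -e "$r1" ] || continue                    # skip if the glob matched nothing
  r2=$(echo "$r1" | sed 's/_R1/_R2/')         # mate file, assuming _R1/_R2 naming
  sample=$(basename "$r1" _R1.fastq.gz)
  echo "$sample,$r1,$r2,auto" >> samplesheet.csv
done
```

Double-check the resulting CSV before launching; a missing mate file or an unexpected naming scheme will produce broken rows.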
Then run:
nextflow run nf-core/rnaseq \
-profile docker \
--input samplesheet.csv \
--outdir results \
--genome GRCh38
For mouse data, use --genome GRCm38. The pipeline will download the reference genome automatically on first use (this takes a while — ~5 GB).
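If your organism isn't covered by a built-in --genome key, nf-core/rnaseq also accepts explicit reference files via its --fasta and --gtf parameters. A command sketch (the paths are placeholders):

```
nextflow run nf-core/rnaseq \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --fasta /path/to/genome.fa \
    --gtf /path/to/annotation.gtf
```

The pipeline will build the required indices from these files on the first run.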
Common issues and fixes
“Nextflow is not in PATH”
Run which nextflow — if empty, the binary isn’t in a directory on your PATH. Move it to /usr/local/bin/ as shown above.
“Docker daemon not running”
Run sudo systemctl start docker (Linux) or open the Docker Desktop app (macOS).
Pipeline fails with memory errors
Add --max_memory 8.GB --max_cpus 4 to limit resource usage on a shared machine.
“WARN: Singularity cache directory has not been defined”
Set export NXF_SINGULARITY_CACHEDIR=/path/to/cache in your .bashrc. This prevents re-downloading containers on every run.
Running on an HPC cluster
If you’re on a SLURM cluster, use -profile singularity instead of -profile docker (most HPCs don’t allow Docker), and add a SLURM executor config:
nextflow run nf-core/rnaseq \
-profile singularity \
--input samplesheet.csv \
--outdir results \
--genome GRCh38 \
-c slurm.config
Where slurm.config contains:
process {
executor = 'slurm'
queue = 'your_queue_name'
}
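If jobs die with resource errors on the cluster, the same file can also cap what any single task requests. A sketch using the resourceLimits process directive (available in Nextflow 24.04 and later; the numbers below are placeholders for your site's actual limits):

```
process {
    executor = 'slurm'
    queue    = 'your_queue_name'
    // Cap per-task requests so no job exceeds the queue's limits
    resourceLimits = [ cpus: 16, memory: 64.GB, time: 24.h ]
}
```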
Check nf-core’s institutional configs — your institution may already have a config file ready to use.
What to do next
Once you’ve run a test pipeline successfully:
- Read the nf-core/rnaseq documentation — it’s excellent and covers all parameters in detail
- Browse other nf-core pipelines for your analysis type at nf-co.re
- When you’re ready to write your own pipelines, work through the Nextflow training materials
The biggest productivity gain in bioinformatics is running a production-quality, community-maintained pipeline instead of building and maintaining your own from scratch. Start there.
For a broader grounding in the Unix, Python, and R skills that underpin everything Nextflow pipelines produce, Bioinformatics Data Skills by Vince Buffalo covers reproducible workflows, file formats, and the command-line environment that makes Nextflow feel intuitive once you understand the foundations.