How to Run Your First Nextflow Pipeline: A Beginner's Guide

A step-by-step guide to running your first Nextflow pipeline, from installation to your first successful run, with a real RNA-seq example using nf-core/rnaseq.

Nextflow can look intimidating from the outside, with its DSL2 syntax, channels, executors, and configs. But running an existing pipeline, especially an nf-core pipeline, is much simpler than writing one from scratch.

This guide walks you through running your first Nextflow pipeline from zero, using the nf-core/rnaseq pipeline as a practical example. By the end, you’ll have a working RNA-seq pipeline executing on your local machine or HPC cluster.

Prerequisites

Before starting, you need:

  • A Linux or macOS machine (Windows users: use WSL2)
  • Java 11 or higher installed (run java -version to check; recent Nextflow releases require Java 17 or later)
  • Docker or Singularity installed (for containers)
  • ~10 GB of free disk space for the test run

If you don’t have Java: install via sudo apt install default-jdk (Ubuntu) or brew install openjdk (macOS with Homebrew).
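The prerequisites above can be checked in one pass. The following is a minimal sketch, not an official Nextflow tool: the helper names java_major and have, and the MIN_JAVA variable, are made up for illustration, and the version parsing assumes the usual java -version output with the version in double quotes.

```shell
#!/bin/sh
# Pre-flight check for the prerequisites listed above.
# java_major, have and MIN_JAVA are illustrative names, not part of Nextflow.

MIN_JAVA="${MIN_JAVA:-11}"   # the guide's minimum; newer Nextflow releases may need a higher value

# Reduce a Java version string to its major number:
# "17.0.9" -> 17, and the legacy "1.8.0_392" form -> 8.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  echo "${v%%[!0-9]*}"
}

# Succeed if a command is available on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

if have java; then
  # java -version writes to stderr; the version is the first quoted field.
  raw="$(java -version 2>&1 | awk -F'"' '/version/ && !seen {print $2; seen = 1}')"
  major="$(java_major "$raw")"
  if [ "${major:-0}" -ge "$MIN_JAVA" ] 2>/dev/null; then
    echo "java: OK (major version $major)"
  else
    echo "java: found, but version '$raw' looks older than $MIN_JAVA"
  fi
else
  echo "java: not found"
fi

if have docker || have singularity; then
  echo "containers: OK"
else
  echo "containers: neither docker nor singularity found"
fi

# Free space in the current directory, in whole gigabytes (POSIX df output).
free_gb="$(df -Pk . | awk 'NR == 2 {print int($4 / 1024 / 1024)}')"
echo "disk: ${free_gb} GB free here (the test run needs roughly 10 GB)"
```

The script only reports what it finds and never changes the system, so it is safe to rerun after each fix.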

Step 1: Install Nextflow

Nextflow installation is a single command:

curl -s https://get.nextflow.io | bash

This downloads the Nextflow launcher to your current directory. Make sure it is executable (chmod +x nextflow), then move it to a directory on your PATH:

sudo mv nextflow /usr/local/bin/

Verify the installation:

nextflow -version

You should see a short banner that includes the version number, for example version 24.10.0.
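On shared servers you may not have sudo rights. A per-user install is one alternative, sketched below. The target directory $HOME/.local/bin is a common convention and an assumption here, not a Nextflow requirement, and the helper names on_path and install_nextflow_to are hypothetical.

```shell
#!/bin/sh
# Per-user install sketch; helper names and the target directory are assumptions.

# Succeed if directory $1 is one of the entries in PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Download the launcher into the current directory and move it to $1.
install_nextflow_to() {
  target="$1"
  mkdir -p "$target" || return 1
  curl -s https://get.nextflow.io | bash || return 1
  chmod +x nextflow
  mv nextflow "$target/"
  if ! on_path "$target"; then
    echo "Add this line to your shell profile: export PATH=\"$target:\$PATH\""
  fi
}

# Example call (commented out so that sourcing this file has no side effects):
# install_nextflow_to "$HOME/.local/bin"
```

The on_path helper is also a quick way to diagnose a "command not found" error after installing: if it fails for the directory that holds the launcher, the PATH is the problem rather than the download.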

Step 2: Test your installation

Run the built-in hello-world pipeline to confirm everything works:

nextflow run hello

If this completes successfully, Nextflow is installed and working. You should see output like:

N E X T F L O W  ~  version 24.10.0
Launching `https://github.com/nextflow-io/hello` [jovial_faraday] DSL2 - revision: ...
executor >  local (4)
[aa/bb1234] process > sayHello (1) [100%] 4 of 4 ✔
Hello world!
Bonjour world!
Ciao world!
Hola world!

Step 3: Run an nf-core pipeline

nf-core pipelines are hosted on GitHub and can be run directly from the command line. For RNA-seq:

nextflow run nf-core/rnaseq \
  -profile test,docker \
  --outdir results

The test profile uses nf-core’s built-in small test dataset, so you don’t need to provide your own data. The docker profile tells Nextflow to run each step in a Docker container, so no manual software installation is needed.

This will:

  1. Pull the pipeline from GitHub (first run only)
  2. Download all required Docker containers
  3. Run a complete RNA-seq alignment and quantification workflow
  4. Write results to ./results/

On first run, expect this to take 15–30 minutes depending on your download speed (mostly pulling containers).
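Two standard Nextflow options are worth knowing at this point: -r pins a specific pipeline release so that reruns use the same code, and -resume reuses cached results if a run is interrupted. The sketch below shows both. The release number 3.14.0 is only an example, and the run helper with its DRY_RUN variable is a hypothetical convenience that prints the command instead of executing it.

```shell
#!/bin/sh
# Illustrative wrapper; DRY_RUN and run() are not Nextflow features.

PIPELINE_VERSION="${PIPELINE_VERSION:-3.14.0}"  # example release; check nf-co.re for current ones
DRY_RUN="${DRY_RUN:-1}"                         # 1 = only print the command

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# -r pins the pipeline release; -resume reuses cached task results, if any.
run nextflow run nf-core/rnaseq \
  -r "$PIPELINE_VERSION" \
  -profile test,docker \
  --outdir results \
  -resume
```

Setting DRY_RUN=0 in the environment would launch the pipeline for real. On a first run -resume has nothing to reuse and is harmless.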

Step 4: Understanding the output

After a successful run, your results/ directory will contain:

results/
├── fastqc/          # Raw read quality metrics
├── trimgalore/      # Trimmed reads and quality reports
├── star_salmon/     # Aligned reads and transcript quantification
│   └── salmon.merged.gene_counts.tsv  # Gene count matrix
├── multiqc/         # Aggregated QC report (open this first)
└── pipeline_info/   # Execution report, timeline, software versions

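Start with the MultiQC report, multiqc_report.html, under results/multiqc/ (depending on the pipeline version it may sit in a subdirectory such as star_salmon/). It gives you an overview of read quality, alignment rates, and quantification metrics for all samples.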
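For a quick look at the gene count matrix from the command line, something like the following sketch can help. It assumes a tab-separated file whose first two columns are gene_id and gene_name, followed by one column per sample; check the header of your own file before relying on that. The function name summarize_counts is made up for this example.

```shell
#!/bin/sh
# Quick look at the merged gene count matrix.
# Assumes a tab-separated file whose first two columns are gene_id and
# gene_name, followed by one column per sample.

# Print the number of gene rows and sample columns in file $1.
summarize_counts() {
  awk -F'\t' '
    NR == 1 { samples = NF - 2 }
    END     { printf "genes=%d samples=%d\n", NR - 1, samples }
  ' "$1"
}

counts="results/star_salmon/salmon.merged.gene_counts.tsv"
if [ -f "$counts" ]; then
  summarize_counts "$counts"
  head -n 5 "$counts" | cut -f 1-4   # header plus four genes, first two samples
else
  echo "No count matrix at $counts yet; run the pipeline first."
fi
```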

Step 5: Run on your own data

To run on your own FASTQ files, you need to provide a samplesheet. Create a CSV file (samplesheet.csv):

sample,fastq_1,fastq_2,strandedness
sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,auto
sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,auto
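With more than a handful of samples, writing the samplesheet by hand gets error-prone. The sketch below builds one from a directory, assuming the _R1.fastq.gz / _R2.fastq.gz naming used in the example above. The function name make_samplesheet is hypothetical, and files named differently would need a different pattern.

```shell
#!/bin/sh
# Build a samplesheet from a directory of FASTQ files.
# Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
# make_samplesheet is an illustrative helper, not part of nf-core.

make_samplesheet() {
  dir="$1"
  echo "sample,fastq_1,fastq_2,strandedness"
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # no matches: the glob stays literal
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    sample="$(basename "$r1" _R1.fastq.gz)"
    if [ -e "$r2" ]; then
      echo "$sample,$r1,$r2,auto"            # paired-end
    else
      echo "$sample,$r1,,auto"               # single-end: fastq_2 left empty
    fi
  done
}
```

Calling make_samplesheet /path/to/fastq > samplesheet.csv would write the file used in the next command. Passing an absolute directory keeps the paths valid wherever the pipeline is launched from.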

Then run:

nextflow run nf-core/rnaseq \
  -profile docker \
  --input samplesheet.csv \
  --outdir results \
  --genome GRCh38

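The --genome value selects a reference from the iGenomes collection that nf-core pipelines are configured to use; for mouse data, use --genome GRCm38. The pipeline will download the reference genome automatically on first use, which takes a while (about 5 GB).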
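A wrong path in the samplesheet is one of the most common reasons a first real run fails, so it can pay to check the file before launching. A minimal sketch, assuming the four-column layout shown above and no commas inside paths; check_samplesheet is an illustrative helper, and the pipeline performs its own validation as well.

```shell
#!/bin/sh
# Pre-launch sanity check; check_samplesheet is an illustrative helper.
# Assumes the four-column layout shown above and no commas inside paths.

# Report FASTQ paths in samplesheet $1 that do not exist.
# Returns success only if every listed file is present.
check_samplesheet() {
  sheet="$1"
  missing=0
  # The "|| [ -n ... ]" part handles a final line without a trailing newline.
  while IFS=, read -r sample fq1 fq2 _rest || [ -n "$sample" ]; do
    [ "$sample" = "sample" ] && continue     # header row
    for f in "$fq1" "$fq2"; do
      if [ -n "$f" ] && [ ! -f "$f" ]; then
        echo "missing file for $sample: $f"
        missing=$((missing + 1))
      fi
    done
  done < "$sheet"
  echo "$missing missing file(s)"
  [ "$missing" -eq 0 ]
}
```

One way to use it is check_samplesheet samplesheet.csv && nextflow run ..., so the pipeline only starts when every listed file is present.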

Common issues and fixes

“Nextflow is not in PATH”: run which nextflow. If the output is empty, the binary isn’t in a directory on your PATH. Move it to /usr/local/bin/ as shown above.

“Docker daemon not running”: run sudo systemctl start docker (Linux) or open the Docker Desktop app (macOS).

Pipeline fails with memory errors: add --max_memory 8.GB --max_cpus 4 to cap resource requests on a small or shared machine. Recent nf-core pipeline releases have dropped these parameters in favor of Nextflow’s process.resourceLimits setting in a config file, so check the parameter documentation for the version you are running.

“WARN: Singularity cache directory has not been defined”: set export NXF_SINGULARITY_CACHEDIR=/path/to/cache in your .bashrc. This prevents re-downloading containers on every run.
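For the Singularity cache fix, the export line should end up in your .bashrc exactly once, however many times you run your setup steps. A small sketch of one way to do that; the helper name append_once and the cache location in the example are assumptions.

```shell
#!/bin/sh
# Idempotent profile edit; append_once and the cache path are illustrative.

# Append line $2 to file $1 unless an identical line is already present.
append_once() {
  file="$1"
  line="$2"
  touch "$file"
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example (commented out so that sourcing this file changes nothing):
# mkdir -p "$HOME/nxf_singularity_cache"
# append_once "$HOME/.bashrc" "export NXF_SINGULARITY_CACHEDIR=$HOME/nxf_singularity_cache"
```

On clusters, home directories often have small quotas, so a project or scratch location may be a better place for the cache than the one in the example.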

Running on an HPC cluster

If you’re on a SLURM cluster, use -profile singularity instead of -profile docker (most HPCs don’t allow Docker), and add a SLURM executor config:

nextflow run nf-core/rnaseq \
  -profile singularity \
  --input samplesheet.csv \
  --outdir results \
  --genome GRCh38 \
  -c slurm.config

Where slurm.config contains:

process {
    executor = 'slurm'
    queue = 'your_queue_name'
}

Check nf-core’s institutional configs — your institution may already have a config file ready to use.
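If you are unsure which queue name to use, SLURM's sinfo command lists the available partitions. The sketch below prints them when sinfo is present and writes the same two-setting slurm.config shown above for a queue you choose. The helper name write_slurm_config is hypothetical, and real clusters often need further settings (accounts, time limits) that are site-specific.

```shell
#!/bin/sh
# Writes a minimal SLURM config; write_slurm_config is an illustrative helper.

# Write a config for queue $1 to file $2 (default: slurm.config).
write_slurm_config() {
  queue="$1"
  out="${2:-slurm.config}"
  cat > "$out" <<EOF
process {
    executor = 'slurm'
    queue = '$queue'
}
EOF
}

# List the partitions SLURM reports, if sinfo is available on this node.
if command -v sinfo >/dev/null 2>&1; then
  sinfo -h -o '%P' || true
fi

# Example (commented out so that sourcing this file writes nothing):
# write_slurm_config your_queue_name slurm.config
```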

What to do next

Once you’ve run a test pipeline successfully:

  1. Read the nf-core/rnaseq documentation — it’s excellent and covers all parameters in detail
  2. Browse other nf-core pipelines for your analysis type at nf-co.re
  3. When you’re ready to write your own pipelines, work through the Nextflow training materials

The biggest productivity gain in bioinformatics is running a production-quality, community-maintained pipeline instead of building and maintaining your own from scratch. Start there.

For a broader grounding in the Unix, Python, and R skills that working with Nextflow pipelines and their output relies on, Bioinformatics Data Skills by Vince Buffalo covers reproducible workflows, file formats, and the command-line environment. Nextflow feels far more intuitive once you understand those foundations.