The Problem
You finalized an RNA-seq analysis on your laptop. The pipeline runs flawlessly: FastQC, Trim Galore, STAR, and DESeq2 all execute in the right order with the output you expect. You email the code to a collaborator or upload it to a cluster, and their response is immediate: “It doesn’t work on my machine.”
The reasons are familiar: they’re running a different operating system. The versions of FastQC or STAR they installed differ from yours by a minor point release, which changed the output format. Python dependencies conflict. Conda helps with package management, but it doesn’t solve the OS problem, and it doesn’t ship compiled binaries in a way that survives unchanged across systems.
Containers solve this. Docker and Singularity ship your entire analysis environment as a sealed, reproducible unit. The code, the tools, the dependencies, the OS layer — all of it runs identically everywhere: your laptop, a collaborator’s MacBook, an HPC cluster, AWS, or a cloud notebook. This is the final step in reproducibility, and it’s not optional for serious bioinformatics work anymore.
This guide walks you through Docker for local development and Singularity (also called Apptainer) for HPC clusters. You’ll build working containers, run real bioinformatics tools, integrate them with Nextflow pipelines, and troubleshoot the most common errors. By the end, you will have the skills to containerize your own analyses.
Prerequisites
You should be comfortable at the command line and have basic familiarity with installing software using Conda or Mamba. Docker and Singularity are tools, not programming languages, so there’s no prior coding experience needed. If you haven’t worked with containers before, that’s fine — this guide starts from zero.
To follow along, you’ll need:
- A computer (Linux, macOS, or Windows with WSL2) or access to an HPC cluster
- 10 GB of free disk space for container images
- Administrator access to install Docker (local machine only)
For HPC work, you’ll need Singularity or Apptainer already installed on your cluster. If your cluster doesn’t have it, talk to your systems administrator.
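A quick way to check is to ask for the version directly; on clusters that use environment modules you may need to load it first (module names vary by site, so treat this as a sketch):
singularity --version || apptainer --version
# On module-based clusters, search for the module first:
module avail singularity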
Why Not Just Conda?
Conda solved an enormous problem: managing Python and R package dependencies across platforms. But Conda has real limits for reproducibility.
What Conda does well: It packages interpreted languages (Python, R) and their dependencies, and it does that job well. A Conda environment can be recreated on macOS, Linux, and Windows, as long as the same package versions and builds exist for each platform.
What Conda misses:
- Compiled binaries aren’t OS-agnostic. Tools like STAR, BWA, or SAMtools are compiled C/C++ binaries, built against specific system libraries. Conda works around this by shipping separate pre-built binaries per platform, but that only works when a build exists for your platform, and the binary still depends on the host’s libraries at run time.
- System library versions matter. Your system might have an older version of libssl or libc than the one the tool was compiled against. Conda doesn’t fully manage these system-level dependencies.
- It’s still the host OS underneath. Conda creates isolated environments, but your code still runs on the host operating system. If you’re on Ubuntu and your collaborator is on CentOS, underlying library differences can surface.
- GPU and hardware drivers. If your pipeline uses CUDA or GPU-accelerated tools, the host system has to have compatible drivers. Conda can’t ship drivers in the environment itself.
Containers fix all of these by shipping the entire OS layer, not just the application layer. Your code runs in the container’s Linux environment regardless of whether you’re on macOS, Windows, or a different Linux distribution. Containers do share the host kernel, but for almost all bioinformatics tools that detail never surfaces; in practice, this is as close to total reproducibility as you can get.
Docker Fundamentals
Installing Docker
On macOS and Windows, use Docker Desktop:
- Download Docker Desktop from the official Docker website.
- Follow the installer and start the Docker daemon.
- Verify installation:
docker --version
On Linux (Ubuntu/Debian):
sudo apt update
sudo apt install docker.io docker-compose -y
sudo usermod -aG docker $USER
newgrp docker
After adding your user to the docker group, log out and back in (or restart your terminal) for changes to take effect.
Verify:
docker run hello-world
This downloads a tiny test image and runs it. If you see Hello from Docker!, you’re good.
Running Your First Bioinformatics Container
The BioContainers project maintains Docker images for thousands of bioinformatics tools. These images are versioned, community-maintained, and built automatically from Bioconda recipes.
Let’s run FastQC on a test FASTQ file:
docker pull quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
This downloads a container image with FastQC pre-installed. The tag 0.12.1--hdfd78af_0 pins the exact version and build, guaranteeing reproducibility. (BioContainers images built from Bioconda recipes are hosted on the quay.io registry, which is why the image name carries that prefix.)
Create a test directory and a dummy FASTQ file:
mkdir -p ~/fastq_data
cd ~/fastq_data
cat > test.fastq << 'EOF'
@read1
ACGTACGTACGTACGT
+
IIIIIIIIIIIIIIII
@read2
ACGTACGTACGTACGT
+
IIIIIIIIIIIIIIII
EOF
Now run FastQC inside the container, mounting your local directory:
docker run --rm \
-v ~/fastq_data:/data \
quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0 \
fastqc /data/test.fastq -o /data/
Let’s break down this command:
- docker run: execute a container image
- --rm: remove the container after it exits (clean up)
- -v ~/fastq_data:/data: mount your local directory to /data inside the container (volume mounting)
- quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0: the image to run
- fastqc /data/test.fastq -o /data/: the command to execute inside the container
After FastQC finishes, you’ll see the output HTML file in ~/fastq_data/. Check:
ls ~/fastq_data/
You should see test_fastqc.html and a test_fastqc.zip. The file was created inside the container but written to your mounted directory, so it exists on your host filesystem.
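If you want to poke around interactively instead, start a shell in the same image with the same mount; most BioContainers images include a minimal bash:
docker run --rm -it \
  -v ~/fastq_data:/data \
  quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0 \
  /bin/bash
# Inside the container:
ls /data        # your mounted files
which fastqc    # where the tool lives
exit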
Building Your Own Dockerfile
BioContainers has thousands of images, but eventually you need to customize. Maybe you want FastQC, Trim Galore, and STAR all in one container. Or you need a specific version of an R package that isn’t in the base image.
A Dockerfile is a recipe for building a container. Here’s a simple example for an RNA-seq environment:
# Start from a base image with micromamba pre-installed
# (pin a specific tag rather than 'latest' for real projects)
FROM mambaorg/micromamba:latest

# Set a working directory
WORKDIR /work

# (You could COPY a Conda environment file here; we define packages inline instead)
# Install bioinformatics tools into the base Conda environment
RUN micromamba install -n base -c bioconda -c conda-forge -y \
        fastqc=0.12.1 \
        trim-galore=0.6.10 \
        star=2.7.11b \
        samtools=1.18 \
        subread=2.0.3 && \
    micromamba clean --all -y

# Install R and DESeq2
RUN micromamba install -n base -c bioconda -c conda-forge -y \
        r-base=4.3.0 \
        bioconductor-deseq2=1.40.0 && \
    micromamba clean --all -y

# Keep the base image's entrypoint (it activates the Conda environment);
# just default to a shell when the container starts
CMD ["/bin/bash"]

# Labels for documentation
LABEL maintainer="your.email@example.com"
LABEL description="RNA-seq analysis environment with FastQC, Trim Galore, STAR, and DESeq2"
Save this as Dockerfile (no extension):
mkdir -p ~/rnaseq_container
cd ~/rnaseq_container
cat > Dockerfile << 'EOF'
FROM mambaorg/micromamba:latest
WORKDIR /work
RUN micromamba install -n base -c bioconda -c conda-forge -y \
        fastqc=0.12.1 \
        trim-galore=0.6.10 \
        star=2.7.11b \
        samtools=1.18 \
        subread=2.0.3 && \
    micromamba clean --all -y
RUN micromamba install -n base -c bioconda -c conda-forge -y \
        r-base=4.3.0 \
        bioconductor-deseq2=1.40.0 && \
    micromamba clean --all -y
CMD ["/bin/bash"]
LABEL maintainer="your.email@example.com"
LABEL description="RNA-seq analysis environment"
EOF
Build the image:
docker build -t my-rnaseq:latest .
The . sets the build context to the current directory; Docker looks there for a file named Dockerfile by default. -t tags the image with a name you can reference later. The build will take several minutes as Docker downloads the base image and installs all the tools.
Once complete, verify:
docker images
You should see my-rnaseq listed with the latest tag.
Run a command in your custom container:
docker run --rm my-rnaseq fastqc --version
This should output: FastQC v0.12.1.
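To use the image on real data, start an interactive session with your data directory mounted, exactly as with the BioContainers image earlier:
docker run --rm -it -v ~/fastq_data:/data my-rnaseq:latest
# Inside the container, all the pinned tools are on the PATH:
fastqc /data/test.fastq -o /data/
STAR --version
exit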
Best Practices for Dockerfiles
Keep these principles in mind when building production Dockerfiles:
- Pin versions explicitly. Never use fastqc=*. Specify fastqc=0.12.1. Floating versions will change when you rebuild, breaking reproducibility.
- Use official, trusted base images. mambaorg/micromamba is maintained by the Mamba team. BioContainers images are built from community-reviewed Bioconda recipes. Avoid unknown Docker Hub images.
- Minimize layers. Each RUN command creates a new layer in the image. Combine commands with && to keep the image lean:
# Good
RUN apt update && apt install -y tool1 tool2 && rm -rf /var/lib/apt/lists/*
# Inefficient
RUN apt update
RUN apt install -y tool1
RUN apt install -y tool2
- Clean up after installs. Remove package manager caches to shrink the image:
RUN conda install ... && conda clean --all -y
- Document with labels. Future you (and collaborators) will want to know the image’s purpose, maintainer, and version:
LABEL version="1.0"
LABEL maintainer="your.email@example.com"
LABEL description="What this image does"
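Labels are machine-readable, so the metadata can be recovered later without digging up the Dockerfile:
docker inspect --format '{{ json .Config.Labels }}' my-rnaseq:latest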
Singularity / Apptainer for HPC Clusters
Why HPC Clusters Use Singularity
Most HPC clusters don’t allow Docker. Here’s why:
The Docker daemon runs as root, and access to the daemon is effectively root access. On a shared cluster with hundreds of users, handing everyone root is a security disaster: a user could escape the container and access other users’ data.
Singularity (whose open-source development now continues under the name Apptainer) solves this by design. Singularity containers run without root. You can build them on your laptop (or have someone with root build them for you), then run them on a cluster without any privilege escalation. Your cluster administrator is happy. Your data is safe.
Converting a Docker Image to Singularity
You don’t have to choose between Docker and Singularity. You can build your container in Docker, then convert it for use on your cluster.
First, verify Singularity is installed:
singularity --version
On your local machine (where Docker is running), pull a Docker image and convert it to Singularity:
singularity pull docker://quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
This creates a .sif file (Singularity Image Format):
ls *.sif
You should see fastqc_0.12.1--hdfd78af_0.sif. This is a single file containing the entire container. Copy it to your cluster:
scp fastqc_0.12.1--hdfd78af_0.sif username@cluster.university.edu:/home/username/containers/
On the cluster, run it:
singularity exec ~/containers/fastqc_0.12.1--hdfd78af_0.sif fastqc --version
That’s it. No Docker, no root, no permission issues.
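Three sub-commands cover most day-to-day use: exec runs a single command, shell drops you into an interactive session, and run invokes the container’s default runscript:
# Run one command inside the container
singularity exec ~/containers/fastqc_0.12.1--hdfd78af_0.sif fastqc --help
# Open an interactive shell inside the container
singularity shell ~/containers/fastqc_0.12.1--hdfd78af_0.sif
# Execute the container's default runscript, if it defines one
singularity run ~/containers/fastqc_0.12.1--hdfd78af_0.sif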
Building a Custom Singularity Image from a Dockerfile
You can convert your custom Docker image directly to Singularity:
singularity pull docker-daemon://my-rnaseq:latest
Note: This requires the Docker daemon to be running locally. After conversion, you get my-rnaseq_latest.sif.
Alternatively, you can write a Singularity Definition file directly (without Docker). Here’s the equivalent of the RNA-seq Dockerfile above:
Bootstrap: docker
From: mambaorg/micromamba:latest

%labels
    Author "Your Name"
    Version 1.0
    Description "RNA-seq analysis environment"

%post
    micromamba install -n base -c bioconda -c conda-forge -y \
        fastqc=0.12.1 \
        trim-galore=0.6.10 \
        star=2.7.11b \
        samtools=1.18 \
        subread=2.0.3
    micromamba install -n base -c bioconda -c conda-forge -y \
        r-base=4.3.0 \
        bioconductor-deseq2=1.40.0
    micromamba clean --all -y

%environment
    export LC_ALL=C
    export PATH=/opt/conda/bin:$PATH

%runscript
    exec /bin/bash "$@"
Save as rnaseq.def. Build on your local machine:
sudo singularity build rnaseq.sif rnaseq.def
Building a Singularity image this way requires root (hence sudo). Copy the resulting .sif to your cluster and run commands as above.
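If you can’t get root on any machine, recent Singularity and Apptainer releases can usually build without it; whether this works depends on your version and how your administrator configured user namespaces:
singularity build --fakeroot rnaseq.sif rnaseq.def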
Running Singularity Containers in HPC Job Scripts
Most HPC clusters use SLURM for job scheduling. Here’s a realistic example that runs FastQC on 1000 FASTQ files:
#!/bin/bash
#SBATCH --job-name=fastqc_batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
#SBATCH --array=0-999
# Load Singularity module (cluster-specific, ask your admin)
module load singularity
# Define paths
CONTAINER=/home/username/containers/fastqc_0.12.1--hdfd78af_0.sif
FASTQ_DIR=/group/shared/fastq
OUTPUT_DIR=/home/username/fastqc_results
# Create output directory if it doesn't exist
mkdir -p "$OUTPUT_DIR"
# Get the FASTQ file for this array task
FASTQ_FILES=("$FASTQ_DIR"/*.fastq)
FASTQ="${FASTQ_FILES[$SLURM_ARRAY_TASK_ID]}"
# Run FastQC inside the container
singularity exec \
  -B "$FASTQ_DIR":/data/input \
  -B "$OUTPUT_DIR":/data/output \
  "$CONTAINER" \
  fastqc /data/input/"$(basename "$FASTQ")" -o /data/output/
Key points:
- #SBATCH --array=0-999: run 1,000 array tasks, one per FASTQ file (adjust the range to match your file count)
- singularity exec: execute a command inside the container
- -B: bind (mount) directories. -B "$FASTQ_DIR":/data/input mounts your cluster's FASTQ directory as /data/input inside the container
- The container reads from and writes to the mounted paths, which persist on the cluster
Submit the job:
sbatch fastqc_batch.sh
Check status:
squeue -u username
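One sanity check before submitting: make sure the array range matches the number of input files, or the extra tasks will index past the end of the FASTQ_FILES array (the path matches the script above):
# --array=0-999 assumes exactly 1000 files
ls /group/shared/fastq/*.fastq | wc -l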
Handling Singularity Bind Mounts
Bind mounts are the counterpart to Docker’s -v. They’re essential for reading input files and writing output.
singularity exec \
-B /local/data:/data,/local/results:/results \
container.sif \
my_tool /data/input.txt -o /results/output.txt
The -B flag syntax is:
- -B /source:/destination (or -B /source to bind at the same path inside)
- Multiple binds: -B path1:/in1,path2:/in2
Common error: You bind a directory but the container can’t write to it. Check file permissions on the host system. The container runs as your user, so you need read/write permissions on the mounted directory.
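If you bind the same directories on every command, you can set them once per session through the SINGULARITY_BIND environment variable (Apptainer also reads APPTAINER_BIND) instead of repeating -B:
export SINGULARITY_BIND="/local/data:/data,/local/results:/results"
singularity exec container.sif my_tool /data/input.txt -o /results/output.txt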
Using Containers with Nextflow Pipelines
If you’re building Nextflow pipelines (and you should be), containers are the natural way to specify per-process environments.
Quick Recap: Why Nextflow and Containers Together
Nextflow is a workflow language that handles parallelization, error handling, and reproducibility. Containers specify the tool environments. Together, they’re unbeatable: Nextflow makes the workflow portable, containers make the tools portable.
Docker in Nextflow
Tell Nextflow to use Docker for each process:
process fastqc {
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'

    input:
    file(fastq)

    output:
    file("*_fastqc.zip")

    script:
    """
    fastqc ${fastq}
    """
}
In your Nextflow config file (nextflow.config):
docker.enabled = true
docker.runOptions = '-u $(id -u):$(id -g)'
When you run nextflow run, each process automatically pulls the specified container and runs inside it.
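If you’d rather keep container versions out of the pipeline code entirely, Nextflow’s config selectors let you assign them centrally; a minimal sketch using withName:
// nextflow.config
docker.enabled = true
process {
    withName: fastqc {
        container = 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'
    }
}
This keeps every pinned image in one file, which makes version bumps a one-line change.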
Singularity in Nextflow
On HPC clusters, use Singularity instead:
singularity.enabled = true
singularity.autoMounts = true
In your process definitions, the container directive works identically:
process fastqc {
    // Nextflow will pull/convert to .sif automatically
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'

    input:
    file(fastq)

    output:
    file("*_fastqc.zip")

    script:
    """
    fastqc ${fastq}
    """
}
Nextflow will pull the Docker image and convert it to Singularity on first run. After that, it uses the cached .sif file. The conversion happens transparently.
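On shared clusters it’s worth pointing that cache at a persistent location, so converted .sif files survive across runs and scratch cleanups; the path below is just an example:
singularity.enabled = true
singularity.autoMounts = true
singularity.cacheDir = '/home/username/singularity_cache'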
For more details on integrating containers with Nextflow, see our guide on running your first Nextflow pipeline — it covers container configuration in depth.
Common Errors and Troubleshooting
Error: “Cannot connect to Docker daemon”
Symptom: docker run fails with “Cannot connect to Docker daemon.”
Cause: The Docker daemon isn’t running.
Solution:
- On macOS: Open Docker Desktop and wait for it to start (check the menu bar icon).
- On Linux: sudo systemctl start docker
- If you added your user to the docker group but still see this error, restart your terminal session or run newgrp docker.
Error: “Permission denied” on Mounted Directories
Symptom: Inside a container, you try to write to a mounted directory and get “Permission denied.”
Cause: The host directory doesn’t have write permissions for your user, or the container is running as a different user.
Solution: Check permissions on the host:
ls -ld ~/fastq_data
If the directory is yours but lacks write permission, fix it (you need to own a directory to change its permissions):
chmod 755 ~/fastq_data
Or, in Docker, explicitly set the user:
docker run --rm -u $(id -u):$(id -g) \
-v ~/fastq_data:/data \
quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0 \
fastqc /data/test.fastq -o /data/
In Singularity, the container automatically runs as your user, so this is less common. If it still happens, check cluster file system permissions (you may have a quota issue or the directory may be on a read-only mount).
Error: “Image not found”
Symptom: docker pull or singularity pull fails saying the image doesn’t exist.
Cause: Typo in the image name, version tag, or registry.
Solution: Double-check the full image name, including the registry and tag. For example:
# Correct
docker pull quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
# Common mistake (missing tag; these images have no 'latest' tag, so the pull fails)
docker pull quay.io/biocontainers/fastqc
# Another mistake (wrong registry)
docker pull fastqc:0.12.1 # this looks in docker.io, not quay.io/biocontainers
For BioContainers images, browse quay.io/biocontainers to find exact tag names.
Error: “No space left on device”
Symptom: Docker or Singularity operations fail with “No space left on device.”
Cause: Your disk is full, or Docker’s image storage is full.
Solution: Check disk usage:
df -h
Docker stores images in /var/lib/docker/ (Linux) or ~/Library/Containers/com.docker.docker/ (macOS). Clean up old images:
docker system prune -a
This removes unused images, containers, and networks. Use -a cautiously on production machines; it’s aggressive.
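To see where the space is actually going before (or after) you prune:
docker system df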
For Singularity, cached images accumulate in ~/.singularity/cache/ (Apptainer uses ~/.apptainer/cache/). Clean them with the built-in command:
singularity cache clean
Error: “Mount point does not exist”
Symptom: Singularity fails with “Mount point … does not exist” or “Not found: /some/path inside the container.”
Cause: The directory inside the container doesn’t exist, or the bind mount path is wrong.
Solution: Check what’s inside the container:
singularity shell container.sif
ls /work # or whatever path you tried to mount to
If the path doesn’t exist, create it in the container definition or Dockerfile before binding, or bind to a path that does exist. Most containers include /tmp and /home by default, so those are safe targets.
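If you control the image, the cleanest fix is to create the mount point at build time; the path /mnt/project here is just an example. In a Dockerfile:
RUN mkdir -p /mnt/project
In a Singularity definition file, the same mkdir line goes in the %post section.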
Error: Module Not Found in Python/R
Symptom: A Python or R script inside a container fails with “ModuleNotFoundError” or “library(package) not found”.
Cause: The package wasn’t installed in the container when it was built.
Solution:
Rebuild the container with the package included. Update your Dockerfile or .def file:
RUN micromamba install -n base -c bioconda -c conda-forge -y \
    your-missing-package
Then rebuild:
docker build -t my-image:v1.1 .
Next Steps
If you want to fully understand Docker beyond bioinformatics use cases — networking, volumes, Docker Compose, and production patterns — Nigel Poulton’s Docker Deep Dive is the fastest way to get there. It’s concise, continuously updated for recent Docker releases, and rated the top Docker reference on BookAuthority. Worth keeping as a reference even after you’re comfortable with the basics.
You now have the skills to containerize your bioinformatics workflows. Here’s what comes next:
For local development: Start with Docker. Create Dockerfiles for your custom environments. Use BioContainers images for standard tools. Version your images clearly (tag them with dates or version numbers, never just latest).
For HPC work: Convert your Docker images to Singularity. Integrate them into SLURM job scripts. Once you’ve done this once, you’ll never struggle with dependency hell on a cluster again.
For reproducible pipelines: Pair containers with Nextflow. If you haven’t used Nextflow yet, read our guide on running your first Nextflow pipeline. Nextflow + containers is the gold standard for reproducible, shareable bioinformatics.
For environment management: Before you containerize, set up your development environment reproducibly with Conda or Mamba. This makes writing the Dockerfile easier and keeps your local work organized.
One more thing: documentation. When you build a container, document it:
- What does it do?
- What version of each tool is included?
- How do you run it?
- What input files does it expect?
- What’s the output format?
Include this in the Dockerfile’s LABEL directives or in a README.md alongside your Dockerfile. Future you will be grateful.
Containers aren’t optional anymore. They’re how reproducible science gets done. The investment in learning them now will compound across every project you work on from here forward.