Git and GitHub for Bioinformatics: A Practical Guide

Learn version control for reproducible research. Git and GitHub for PhD students and postdocs, from zero to collaborating on bioinformatics projects.

Introduction: The Problem Everyone Faces

You’re three months into your analysis. You run a script called analysis_FINAL.R, get results, and move forward. Then your advisor asks you to tweak one parameter and rerun everything. But which version of the script produced those figures? You have analysis_v2.R, analysis_v2_new.R, and analysis_FINAL_really_final.R in your directory, and you can’t remember what changed between them.

Or worse: you emailed yourself the script yesterday as a backup, lost a function by accident, and now you’re hunting through your sent folder to reconstruct 20 lines of code.

This is what version control solves. Git is a tool that tracks every change you make to your code, who made it, and why. GitHub is a platform where you store that history online and collaborate with others. Together, they’ve become essential infrastructure in biology. As of 2026, publishing an analysis without a linked GitHub repository is increasingly rare in top-tier journals, and many funding agencies now expect reproducibility through version-controlled code.

The good news: you don’t need to be a software engineer to use Git. This guide walks you through it step by step, assuming zero prior experience. By the end, you’ll have your first repository set up and pushed to GitHub, ready for real analysis work.

Why Git Matters for Reproducible Science

Reproducibility is the foundation of scientific integrity, and version control is how you practice it with code. When you use Git, every change is timestamped, linked to a commit message explaining why you changed it, and attributed to you. If someone reviews your published work and asks why you filtered out reads with low quality scores, you can show them the exact commit where that decision was made, the code before and after, and your reasoning.

Git also prevents the “file explosion” problem. Instead of analysis_v1.R, analysis_v2.R, analysis_backup.R, you have one file and a complete history of every version. You can jump back to any point in time without losing anything.

Collaboration becomes easier too. If you’re working with someone in another lab on a shared pipeline, Git lets you both make changes, merge them, and resolve conflicts without accidentally overwriting each other’s work. And if something breaks in your pipeline on Tuesday, you can revert to the version from Monday and figure out what went wrong in isolation.

Finally, when you publish, linking your GitHub repository alongside your paper is now table stakes. Reviewers can inspect your code. Users can download it and reproduce your figures. Other researchers can build on your work. Version control makes all of this possible.

Core Concepts, Explained Plainly

Git has a few foundational ideas. You only need to understand these five terms to get started.

Term | What It Means | Analogy
Repository | A folder that Git is tracking. Contains all your files and the complete history of changes. | A project folder with an invisible time machine built in.
Commit | A snapshot of your code at a point in time. You choose when to take snapshots, and you write a message explaining what changed. | A saved game file. You decide when to hit “save,” and you describe what you did since the last save.
Branch | A parallel copy of your work where you can make changes without affecting the main code. | A parallel universe. You can experiment here without breaking the original.
Push | Upload your commits from your computer to GitHub. | Syncing your game saves to the cloud.
Pull | Download commits from GitHub to your computer. | Downloading someone else’s game saves to see their progress.

That’s really it. Everything else builds on these ideas.

Installation and Setup

Installing Git

On macOS, open Terminal and run:

brew install git

If you don’t have Homebrew, install it first from brew.sh.

On Linux (Ubuntu/Debian):

sudo apt-get update
sudo apt-get install git

On Windows, download the installer from git-scm.com, run it, and accept the defaults.

To confirm Git is installed:

git --version

Configure Git with Your Name and Email

Git needs to know who you are for commit attribution. Run these commands once, replacing with your details:

git config --global user.name "Jane Smith"
git config --global user.email "jane.smith@institution.edu"

Check that it worked:

git config --global --list
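The --global flag stores your identity for every repository on the machine. If one project needs a different address (say, a personal side project), you can override it per repository by leaving the flag out. A minimal sketch in a scratch directory; the repository name and email address are made up for illustration:

```shell
# Sketch: per-repository identity. Paths and values here are hypothetical.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/side_project"
cd "$scratch/side_project"

# No --global flag: these values live in this repository's .git/config only
git config user.name "Jane Smith"
git config user.email "jane@example.org"

# List the identity settings stored in this repository
git config --local --list | grep '^user\.'

# Show the effective value and the file it comes from
git config --show-origin --get user.email
```

A repository-level setting takes precedence over the global one, so commits made inside this folder carry the overriding address.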

Generate an SSH Key and Connect to GitHub

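GitHub does not accept your account password for Git operations, so you need another way to authenticate when you push. SSH keys handle this securely, with nothing to type on each push unless you protect the key with a passphrase.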

Generate a key:

ssh-keygen -t ed25519 -C "jane.smith@institution.edu"

Press Enter to accept the default file location. When prompted for a passphrase, you can leave it blank or add one for extra security. Then display your public key:

cat ~/.ssh/id_ed25519.pub

Copy the output. Go to GitHub, log in, click your profile photo, select Settings, then SSH and GPG keys, and click New SSH key. Paste your key there.

Test the connection:

ssh -T git@github.com

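The first time you connect, SSH asks you to confirm GitHub’s host key; type yes. You should then see “Hi [your-username]! You’ve successfully authenticated, but GitHub does not provide shell access.” The second half of that message is expected.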

Your First Repository

Create a Local Repository

Navigate to a project folder or create one:

mkdir my_rna_analysis
cd my_rna_analysis

Initialize Git:

git init

This creates a hidden .git folder that tracks everything. That’s it: your repository now exists locally.

Add Your First File

Create a simple file:

echo "# RNA-seq Analysis for Study X" > README.md

Tell Git about it:

git add README.md

Take a snapshot (commit):

git commit -m "Initial commit: add README"

The -m flag lets you write your message inline. Commit messages are crucial: write them in the imperative mood and be specific (“Add normalization step”, not “Fixed stuff”). Check your work:

git log

You’ll see your commit with a timestamp and message.
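Between commits, three more commands tell you where you stand: git status, git diff, and git log --oneline. The sketch below runs the full edit, inspect, commit cycle in a scratch directory; the README edit is a made-up example:

```shell
# Sketch: inspect changes before committing. Runs in a throwaway directory.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/my_rna_analysis"
cd "$scratch/my_rna_analysis"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

echo "# RNA-seq Analysis for Study X" > README.md
git add README.md
git commit -q -m "Initial commit: add README"

# Hypothetical edit to an already committed file
echo "Samples: 6 treated, 6 control" >> README.md

git status --short     # lists README.md as modified but not yet staged
git diff               # line-by-line view of the unstaged change
git add README.md
git diff --staged      # exactly what the next commit will contain
git commit -q -m "Describe sample groups in README"

git log --oneline      # one line per commit, newest first
```

Reading git diff --staged before every commit is a cheap habit that catches stray debugging lines and accidental edits.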

Push to GitHub

On GitHub, click the + icon and select New repository. Name it my_rna_analysis, do NOT initialize with README (you already have one locally), and click Create repository.

You’ll see instructions. Run these commands in your terminal:

git branch -M main
git remote add origin git@github.com:[your-username]/my_rna_analysis.git
git push -u origin main

Replace [your-username] with your actual GitHub username. Your code is now on GitHub. You can verify by visiting your repository URL in the browser.
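You can also check the result from the terminal. The sketch below rehearses the same steps offline, using a local bare repository as a stand-in for GitHub, so it runs without a network connection or an account. The fake_github.git path is purely illustrative:

```shell
# Sketch: rehearse the push workflow with a local bare repo instead of GitHub.
set -eu

scratch="$(mktemp -d)"
git init -q --bare "$scratch/fake_github.git"   # plays the role of GitHub

git init -q "$scratch/my_rna_analysis"
cd "$scratch/my_rna_analysis"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

echo "# RNA-seq Analysis for Study X" > README.md
git add README.md
git commit -q -m "Initial commit: add README"

git branch -M main
git remote add origin "$scratch/fake_github.git"
git push -q -u origin main

git remote -v                    # where does 'origin' point?
git status -sb                   # first line shows main tracking origin/main
git rev-parse HEAD origin/main   # the two hashes match after a successful push
```

With a real GitHub remote, the same three checks at the end work unchanged.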

A Bioinformatics-Specific Workflow

Let’s walk through a realistic scenario: you’re starting a bulk RNA-seq analysis. Your project has reference genomes, raw reads, scripts, and results. Most of these are large; you don’t want Git tracking them.

Set Up .gitignore

Create a .gitignore file in your project root:

cat > .gitignore << 'EOF'
# Large data files
*.bam
*.bai
*.fastq
*.fastq.gz
*.fq
*.fq.gz
*.vcf
*.vcf.gz

# Large results
results/
counts/
alignments/

# System files
.DS_Store
*.Rhistory
.RData

# Temporary files
*.tmp
*.log
EOF

Add and commit it:

git add .gitignore
git commit -m "Add .gitignore for large files and system files"
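Before trusting a .gitignore, it helps to confirm that the patterns match what you expect. git check-ignore reports which rule, if any, applies to a path. A small sketch with a shortened ignore list and empty placeholder files (the sample file names are invented):

```shell
# Sketch: verify .gitignore patterns against placeholder files.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/my_rna_analysis"
cd "$scratch/my_rna_analysis"

printf '%s\n' '*.bam' '*.fastq.gz' 'results/' '.DS_Store' > .gitignore

mkdir -p data/fastq results scripts
touch data/fastq/sample1_R1.fastq.gz results/sample1.bam scripts/align.sh

# -v prints the .gitignore line responsible for each ignored path
git check-ignore -v data/fastq/sample1_R1.fastq.gz results/sample1.bam

# The exit status is non-zero for a path that is NOT ignored
if git check-ignore -q scripts/align.sh; then
  echo "scripts/align.sh is ignored"
else
  echo "scripts/align.sh will be tracked"
fi

# Overview: untracked files and ignored paths side by side
git status --short --ignored
```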

Create Your Analysis Structure

mkdir scripts results data
touch scripts/align.sh scripts/count.R scripts/deseq2.R

Write some initial code in each script. For example:

cat > scripts/align.sh << 'EOF'
#!/bin/bash
# Align reads using STAR
# This script aligns paired-end RNA-seq reads to GRCh38

FASTQ_DIR="data/fastq"
GENOME_INDEX="data/genome_index"
RESULTS_DIR="results/bam"

# Add alignment commands here
EOF
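One way the placeholder might eventually be filled in is sketched below. Treat it as an illustration under assumptions, not a tuned pipeline: it assumes paired files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, and it introduces THREADS, DRY_RUN, and the run and align_all helpers, none of which come from the original script. With DRY_RUN=1 (the default here) it only prints the STAR commands, so you can check the loop without STAR installed.

```shell
#!/bin/bash
# Sketch only. The file naming scheme, THREADS, DRY_RUN, run, and align_all
# are assumptions made for this example.
set -eu

FASTQ_DIR="${FASTQ_DIR:-data/fastq}"
GENOME_INDEX="${GENOME_INDEX:-data/genome_index}"
RESULTS_DIR="${RESULTS_DIR:-results/bam}"
THREADS="${THREADS:-4}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

align_all() {
  run mkdir -p "$RESULTS_DIR"
  for r1 in "$FASTQ_DIR"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # mate file of the pair
    sample="$(basename "$r1" _R1.fastq.gz)"
    run STAR --runThreadN "$THREADS" \
      --genomeDir "$GENOME_INDEX" \
      --readFilesIn "$r1" "$r2" \
      --readFilesCommand zcat \
      --outSAMtype BAM SortedByCoordinate \
      --outFileNamePrefix "$RESULTS_DIR/${sample}_"
  done
}

align_all
```

Keeping paths in variables at the top means a later commit that changes a directory or thread count shows up as a one-line diff. On macOS, zcat may need to be replaced with gunzip -c.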

Commit Incrementally

As you refine each step, commit:

git add scripts/align.sh
git commit -m "Add initial STAR alignment script"

Later, you optimize the alignment parameters:

# Edit scripts/align.sh
git add scripts/align.sh
git commit -m "Tune STAR parameters: set --outFilterMultimapNmax 1 to keep uniquely mapped reads"

Then you add the counting script:

git add scripts/count.R
git commit -m "Add featureCounts wrapper script for gene-level read counting"

Each commit is a waypoint. If you’re working on this for weeks, you’ll have dozens. This history is invaluable when you need to explain your methodology or trace when a bug was introduced.

Push your progress to GitHub regularly:

git push
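Once a file has a few commits behind it, you can ask Git for its history, compare versions, and bring an old version back. The sketch below does this in a scratch repository; the THREADS lines are stand-in file contents invented for the example:

```shell
# Sketch: trace and restore earlier versions of one file.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/my_rna_analysis"
cd "$scratch/my_rna_analysis"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

mkdir scripts
echo 'THREADS=4' > scripts/align.sh
git add scripts/align.sh
git commit -q -m "Add initial STAR alignment script"

echo 'THREADS=16' > scripts/align.sh
git commit -q -am "Use 16 threads for alignment"

# Every commit that touched this one file, newest first
git log --oneline -- scripts/align.sh

# What exactly changed in the most recent commit?
git diff HEAD~1 HEAD -- scripts/align.sh

# Print the file as it was one commit ago, without touching the disk
git show HEAD~1:scripts/align.sh

# Bring the older version back into the working tree (then commit as usual)
git checkout HEAD~1 -- scripts/align.sh
```

HEAD~1 means "one commit before the current one"; you can use any commit hash from git log in its place.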

Branches for Exploratory Analysis

Suppose you want to test a different alignment tool, maybe Bowtie2 instead of STAR, to compare. You don’t want to break your main code. Create a branch:

git branch test-bowtie2

Switch to it:

git checkout test-bowtie2

Or do both in one command:

git checkout -b test-bowtie2

Now you’re in a parallel universe. Edit your alignment script, add new scripts, commit freely:

# Edit scripts/align.sh to use Bowtie2 instead of STAR
git add scripts/align.sh
git commit -m "Test Bowtie2 alignment on sample data"

# Run the alignment, then compare the results with the STAR output
bash scripts/align.sh

If the results are better, merge the branch back to main:

git checkout main
git merge test-bowtie2

If they’re worse, delete the branch without merging:

git branch -D test-bowtie2

The main branch is safe. You can experiment without risk.
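Before merging, it is worth looking at what the branch would actually change. A sketch of that review step in a scratch repository, with one-line placeholder scripts standing in for real alignment code:

```shell
# Sketch: review a branch before merging it. File contents are placeholders.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/my_rna_analysis"
cd "$scratch/my_rna_analysis"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

mkdir scripts
echo 'ALIGNER=STAR' > scripts/align.sh
git add scripts/align.sh
git commit -q -m "Add initial STAR alignment script"
git branch -M main

git checkout -q -b test-bowtie2
echo 'ALIGNER=bowtie2' > scripts/align.sh
git commit -q -am "Test Bowtie2 alignment on sample data"

git checkout -q main
git branch                                       # '*' marks the current branch
git log --oneline main..test-bowtie2             # commits that main does not have yet
git diff main test-bowtie2 -- scripts/align.sh   # what the merge would change

git merge -q test-bowtie2      # fast-forward, since main has no new commits
git branch -d test-bowtie2     # -d only deletes merged branches; -D forces
```

Using -d rather than -D after a merge is a small safety net: Git refuses to delete the branch if its commits are not yet part of main.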

Common Mistakes and How to Fix Them

Oops: I Committed My Password

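Never commit passwords, API keys, or credential files. If you did by accident, treat the secret as compromised and rotate the password or key right away, especially if you already pushed. If the secret went into your most recent commit and you have not pushed yet, you can stop tracking the file and rewrite that commit:

git rm --cached config/secrets.txt
git commit --amend -m "Remove secrets file from history"

Note that git commit --amend only rewrites the latest commit. If the secret sits in an older commit, or you already pushed, you need a history-rewriting tool such as git filter-repo, followed by a force push. Either way, add the file to .gitignore so it is not committed again.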
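To see whether a file is reachable from any branch or tag, and to rehearse the amend-based fix safely, here is a sketch in a scratch repository. It covers only the simple case where the secret sits in the latest, unpushed commit; the key value is a placeholder:

```shell
# Sketch: find a committed secret and amend it away (latest commit only).
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/demo"
cd "$scratch/demo"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit: add README"

mkdir config
echo "API_KEY=not-a-real-key" > config/secrets.txt
git add config/secrets.txt
git commit -q -m "Add config"     # the mistake: a secret is now committed

# Which commits on any branch or tag contain the file?
git log --all --oneline -- config/secrets.txt

# Safe here only because the mistake is the latest commit and nothing was pushed
git rm -q --cached config/secrets.txt
echo "config/secrets.txt" >> .gitignore
git add .gitignore
git commit -q --amend -m "Add config (secrets kept out of Git)"

# Prints nothing once no branch or tag contains the file
git log --all --oneline -- config/secrets.txt
```

The file stays on disk, so your scripts keep working; only Git stops knowing about it.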

Oops: I Committed a 2GB BAM File

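You told Git to track it, and now the repository is huge. Stop tracking the file and ignore it from now on:

git rm --cached large_file.bam
echo "large_file.bam" >> .gitignore
git add .gitignore
git commit -m "Stop tracking large BAM file; add to .gitignore"

The file is no longer tracked but is still on your disk. This does not shrink the repository, though: the BAM file remains in the earlier commit, so every clone still downloads it. To remove it from history, amend the commit if it is your most recent one and has not been pushed, or use a tool such as git filter-repo for older or already-pushed commits.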
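To check whether a large file is still stored somewhere in history after you stop tracking it, you can list the biggest objects Git knows about. A sketch in a scratch repository, with a 1 MiB file of random bytes standing in for a real BAM:

```shell
# Sketch: list the largest blobs in history. The "BAM" here is a placeholder.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/demo"
cd "$scratch/demo"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit: add README"

head -c 1048576 /dev/urandom > large_file.bam   # 1 MiB stand-in
git add large_file.bam
git commit -q -m "Add alignment output"

git rm -q --cached large_file.bam
echo "large_file.bam" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking large BAM file; add to .gitignore"

# Five largest blobs anywhere in history: size in bytes, then path
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn | head -n 5
```

If the file still appears in that list, it is still part of history and still travels with every clone.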

Oops: I Have Merge Conflicts

You’re on a branch and commit changes to lines 10-15 of script.R. Meanwhile, someone else commits different changes to the same lines on main. You try to merge:

git merge main

Git stops and says there’s a conflict. Open the file:

cat scripts/script.R

You’ll see something like:

<<<<<<< HEAD
your version of lines 10-15
=======
their version of lines 10-15
>>>>>>> main

Edit the file to keep the version you want (or combine both), then remove the conflict markers. Add and commit:

git add scripts/script.R
git commit -m "Resolve merge conflict in script.R"

Most merge conflicts are straightforward. When in doubt, talk to your collaborator about which version is correct.
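If you would like to practice on something harmless first, the sketch below manufactures a conflict in a scratch repository, lists the conflicted files, and then backs out with git merge --abort. The branch name my-branch and the alpha values are invented for the exercise:

```shell
# Sketch: create a conflict on purpose, inspect it, and back out.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/demo"
cd "$scratch/demo"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

mkdir scripts
echo 'alpha <- 0.05' > scripts/script.R
git add scripts/script.R
git commit -q -m "Add script"
git branch -M main

git checkout -q -b my-branch
echo 'alpha <- 0.01' > scripts/script.R
git commit -q -am "Use stricter alpha"

git checkout -q main
echo 'alpha <- 0.10' > scripts/script.R
git commit -q -am "Use looser alpha"

git checkout -q my-branch
git merge main || true     # a conflict is expected; keep the script running

# Files that still contain unresolved conflicts
conflicted="$(git diff --name-only --diff-filter=U)"
echo "Conflicted: $conflicted"

# Not ready to resolve it? Return to the state before the merge
git merge --abort
```

git merge --abort is the escape hatch: nothing is lost, and you can retry the merge after talking to your collaborator.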

Next Steps and Verdict

If you want to go beyond the basics and understand Git deeply (branching strategies, rebasing, submodules, hooks, and the object model underneath it all), Scott Chacon and Ben Straub’s Pro Git is the definitive reference. It’s also free online, but the print edition is useful to have open on your desk when you’re working through a tricky rebase or bisect operation.

You now have everything you need to version-control a small bioinformatics project. As you grow, you’ll want to learn more:

GitHub Actions automates checks on your code. Every time you push, GitHub can run your unit tests, a linter, or your Nextflow pipeline on a small test dataset, so you find out quickly when a change breaks something. Hosted runners have limited time and disk space, so full-size alignments still belong on your cluster or workstation.

Zenodo lets you archive your repository alongside your published paper. It assigns a DOI so people can cite your exact code version. Many journals now expect this for methods transparency.
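Zenodo’s GitHub integration archives releases, and a release is built on a Git tag, so the first step is marking the exact commit behind your paper. A minimal sketch of tagging in a scratch repository; the version number and tag message are examples, and the push line is commented out because the scratch repository has no remote:

```shell
# Sketch: tag the commit that produced your published results.
set -eu

scratch="$(mktemp -d)"
git init -q "$scratch/my_rna_analysis"
cd "$scratch/my_rna_analysis"
git config user.name "Jane Smith"
git config user.email "jane.smith@institution.edu"

echo "# RNA-seq Analysis for Study X" > README.md
git add README.md
git commit -q -m "Initial commit: add README"

# An annotated tag records who tagged, when, and why
git tag -a v1.0.0 -m "Code used for the submitted manuscript"

git tag --list          # all tags in the repository
git describe --tags     # nearest tag to the current commit

# Tags are not pushed by default; with a real remote you would run:
# git push origin v1.0.0
```

Quoting the tag name in your methods section lets readers check out exactly the code you ran.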

Collaborative workflows like pull requests turn GitHub into a code review platform. Instead of merging your own branches, you request review, discuss changes, and merge only after approval.

My verdict: Use Git from day one of every analysis project. You won’t regret it. The overhead of git add, git commit, and git push is minutes per project, and the benefit, when someone asks you to reproduce a figure or explain a decision two years later, is immeasurable. Start with the basics here, and the rest will follow naturally as you encounter them.

If you’re setting up your first analysis pipeline, you might also find our guide on how to set up a reproducible bioinformatics environment with Conda and Mamba helpful. And once your pipeline is in Git, our tutorial on running your first Nextflow pipeline shows how to automate the workflows you’ve version-controlled.
