R for Wet Lab Data Analysis: A Practical Guide for Non-Programmers

Learn 15 essential R commands to analyze qPCR, ELISA, flow cytometry, and microscopy data without coding experience.

You don’t need to be a programmer to use R. You need to know about 15 commands to handle 90% of wet lab data analysis. This tutorial covers those commands and gets you from an Excel spreadsheet to a publication-quality figure. This is not a “learn to code” course. It’s a practical tutorial for scientists who work with qPCR data, ELISA results, microscopy quantification, flow cytometry statistics, cell viability assays, and similar measurements.

Installation and Setup

Install R and RStudio

Go to cran.r-project.org and download R for your operating system (Windows, macOS, or Linux). Run the installer with default settings.

Then go to posit.co and download RStudio (Desktop version, free). Install it. RStudio is an interface that makes R much easier to use.

Orienting to RStudio

When you open RStudio, you’ll see four panels:

  1. Console (bottom-left). This is where commands run and output appears
  2. Script Editor (top-left). Write scripts here and run them. You can save your analysis as a .R file
  3. Environment (top-right). Shows variables and data you’ve loaded
  4. Plots/Help (bottom-right). Shows figures you create and help documentation

You can type commands directly in the Console, but it’s better to write in the Script Editor so you have a record of what you did.

Install the Packages You’ll Need

Open the Console and type this exactly:

install.packages("tidyverse")

Press Enter. R will download and install the tidyverse, a collection of packages that includes readxl, dplyr, and ggplot2 (everything this tutorial uses), so there's no need to install them separately. This takes a minute or two. You only install packages once.

Now, in your Script Editor, type:

library(readxl)
library(dplyr)
library(ggplot2)

Press Ctrl+Enter (or Cmd+Enter on Mac) to run these lines. This loads the packages. You need to do this every time you start a new R session.

Getting Your Data Into R

Your wet lab data is probably in Excel. Here’s how to load it:

From Excel

Suppose you have a file called “my_data.xlsx” with one sheet containing your data. In your Script Editor, type:

library(readxl)
data <- read_excel("my_data.xlsx", sheet = 1)

Replace the filename with your actual filename, then press Ctrl+Enter to run. R looks for the file in your working directory (shown at the top of the Console), so either save the file there or give a full path. The data is now in a variable called "data".

If your data is on a different sheet (e.g., the second sheet), change sheet = 1 to sheet = 2.
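If you're not sure what a sheet is called, readxl can list the sheet names, and you can load a sheet by name instead of by number. A minimal sketch, using the example workbook that ships with readxl (swap in your own filename and sheet name):

```r
library(readxl)

# readxl ships an example workbook; replace this path with your own file
path <- readxl_example("datasets.xlsx")

excel_sheets(path)                        # list all sheet names in the workbook
data <- read_excel(path, sheet = "iris")  # load a sheet by name
```

Loading by name is safer than by position: if someone reorders the sheets in Excel, your script still reads the right one.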

From CSV

If your data is a CSV file:

data <- read.csv("my_data.csv")

Examining Your Data

Now that data is loaded, look at it:

head(data)

This shows the first six rows. You’ll see column names and values. Check that the columns are what you expect.

str(data)

This shows the structure: column names, data types (numbers, text, etc.), and sample values. You’re checking that numeric columns are recognized as numbers, not text.

summary(data)

This gives summary statistics (mean, median, min, max) for each numeric column. Scan for obvious errors (e.g., a cell viability column showing values above 100 or below 0).

Common Issues and How to Fix Them

Issue: Column names with spaces. If your Excel file has a column called “Cell Count”, the space makes the name awkward in R: you’d have to wrap it in backticks (`Cell Count`) every time you use it. Solution: Rename columns to use underscores instead (“Cell_Count”), either in Excel before importing or in R afterward.
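Renaming in R looks like this; a small sketch with a made-up data frame standing in for your imported data:

```r
library(dplyr)

# Toy data frame with a space in a column name, as Excel imports often have
df <- data.frame(`Cell Count` = c(120, 98, 143), check.names = FALSE)

# Backticks let you refer to the awkward name; rename() is new_name = old_name
df <- rename(df, Cell_Count = `Cell Count`)
names(df)  # the column is now called "Cell_Count"
```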

Issue: Missing values showing as “NA”. If your Excel sheet has blank cells, R imports them as NA (Not Available). This is fine; R handles NAs. If you want to see how many NAs you have, use sum(is.na(data$ColumnName)) to count missing values in a specific column.
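One NA-related gotcha worth knowing now: mean(), sd(), median(), and similar functions return NA if any input value is missing. Add na.rm = TRUE to compute the statistic over the observed values only. A quick illustration with made-up viability numbers:

```r
# mean() returns NA when any value is missing
viability <- c(92.5, 88.1, NA, 95.0)
mean(viability)                # NA
mean(viability, na.rm = TRUE)  # mean of the three observed values
```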

Issue: Extra header rows in Excel. If your Excel file has title rows above the actual column headers, skip them on import: read_excel("my_data.xlsx", skip = 2) ignores the first 2 rows, so the third row is read as the column names. Adjust the number to match your file.

Basic Data Manipulation

You now have data loaded. Most analyses require filtering (keeping only certain rows) and creating new columns (like calculating fold change). Here are the commands you need:

Filter Rows: Select Only Certain Samples

Suppose your data has a “Group” column with values “Control” and “Treatment”. You want only the Treatment group:

library(dplyr)
treatment_data <- filter(data, Group == "Treatment")

This creates a new data frame called “treatment_data” containing only rows where Group equals “Treatment”. Note the double equals sign (==); this means “is equal to”.

More examples:

# Keep rows where Value is greater than 10
high_values <- filter(data, Value > 10)

# Keep rows where Group is Treatment AND Value is greater than 5
# (avoid calling this "subset"; that name belongs to a base R function)
treated_high <- filter(data, Group == "Treatment" & Value > 5)

# Keep rows where Value is not equal to 0
nonzero <- filter(data, Value != 0)

Select Columns: Keep Only Certain Columns

If your data has 20 columns and you only need 3:

library(dplyr)
small_data <- select(data, Group, Value, SampleID)

This creates a new data frame with only those three columns.

Create New Columns: Calculate Derived Values

Most analyses require normalizing data to a control. Here’s how:

# Calculate the mean value of the Control group
control_mean <- mean(data$Value[data$Group == "Control"])

# Create a new column with fold-change relative to that control
data$FoldChange <- data$Value / control_mean

Now “data” has a new column called “FoldChange” where each value is divided by the control mean.

More examples:

# Create a log-transformed column
data$log_Value <- log10(data$Value)

# Create a column with centered values (subtract the mean)
data$Centered <- data$Value - mean(data$Value)

# Create a column that categorizes values as "High" or "Low"
data$Category <- ifelse(data$Value > median(data$Value), "High", "Low")

Summary Statistics: Mean, SEM, and SD by Group

Now you have clean, manipulated data. Extract summary statistics by group:

library(dplyr)

summary_stats <- data %>%
  group_by(Group) %>%
  summarize(
    Mean = mean(Value),
    SD = sd(Value),
    SEM = sd(Value) / sqrt(n()),
    N = n()
  )

Let me break this down:

  • data %>% means “take the data and then…”
  • group_by(Group) means “for each unique value in the Group column…”
  • summarize(...) means “calculate these statistics”
  • mean(Value) calculates the mean of the Value column
  • sd(Value) calculates the standard deviation
  • sd(Value) / sqrt(n()) calculates the standard error of the mean (SEM)
  • n() counts how many observations are in each group

The result is a small table showing Mean, SD, SEM, and N for each group. This is what you’ll plot.

Making a Bar Graph with Error Bars in ggplot2

Now you have summary statistics. Plot them:

library(ggplot2)

ggplot(summary_stats, aes(x = Group, y = Mean, fill = Group)) +
  geom_bar(stat = "identity", width = 0.6) +
  geom_errorbar(aes(ymin = Mean - SEM, ymax = Mean + SEM), width = 0.2) +
  theme_classic() +
  labs(x = "Treatment Group", y = "Mean Value ± SEM") +
  theme(legend.position = "none")

Again, breaking this down layer by layer:

ggplot(summary_stats, aes(x = Group, y = Mean, fill = Group)) creates the base plot. “aes” means “aesthetics.” You’re saying: x-axis is Group, y-axis is Mean, and fill each bar by Group (so each group gets a different color).

geom_bar(stat = "identity", width = 0.6) draws bars. stat = "identity" means “use the values as-is, don’t count or summarize.” (geom_col() is a shortcut for exactly this.) width = 0.6 makes bars 60% of the available space, leaving gaps between them.

geom_errorbar(...) adds error bars. ymin = Mean - SEM and ymax = Mean + SEM set the top and bottom of each error bar.

theme_classic() uses a clean, publication-ready style (no grid, minimal decorations).

labs(x = "Treatment Group", y = "Mean Value ± SEM") labels the axes.

theme(legend.position = "none") removes the legend (since the colors already indicate Group, the legend is redundant).

Run this code, and a figure appears in the Plots panel (bottom-right). Perfect for a manuscript.

Customizing Your Plot

Change the colors:

ggplot(summary_stats, aes(x = Group, y = Mean, fill = Group)) +
  geom_bar(stat = "identity", width = 0.6) +
  geom_errorbar(aes(ymin = Mean - SEM, ymax = Mean + SEM), width = 0.2) +
  scale_fill_manual(values = c("Control" = "#0173B2", "Treatment" = "#DE8F05")) +
  theme_classic() +
  labs(x = "Treatment Group", y = "Mean Value ± SEM") +
  theme(legend.position = "none")

The colors are hex codes: #0173B2 is blue, #DE8F05 is orange. Both come from a colorblind-safe palette. Change these to any hex colors you prefer.

Change axis limits:

ggplot(summary_stats, aes(x = Group, y = Mean, fill = Group)) +
  geom_bar(stat = "identity", width = 0.6) +
  geom_errorbar(aes(ymin = Mean - SEM, ymax = Mean + SEM), width = 0.2) +
  ylim(0, 100) +
  theme_classic() +
  labs(x = "Treatment Group", y = "Mean Value ± SEM") +
  theme(legend.position = "none")

ylim(0, 100) sets the y-axis to go from 0 to 100. Adjust to match your data.
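One caution with ylim() on bar charts: if the limits exclude any data, ggplot2 drops those bars entirely rather than clipping them. To zoom the view without discarding data, use coord_cartesian() instead. A sketch with a toy summary table standing in for summary_stats:

```r
library(ggplot2)

# Toy summary table standing in for summary_stats above
stats <- data.frame(Group = c("Control", "Treatment"),
                    Mean  = c(42, 87))

# coord_cartesian() zooms the axes; ylim() would silently remove
# any bar whose value falls outside the limits
ggplot(stats, aes(x = Group, y = Mean, fill = Group)) +
  geom_bar(stat = "identity", width = 0.6) +
  coord_cartesian(ylim = c(0, 100)) +
  theme_classic() +
  theme(legend.position = "none")
```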

Basic Statistical Tests in R

Two-Group T-Test

Comparing Control vs. Treatment:

t.test(Value ~ Group, data = data, var.equal = FALSE)

This tests whether the mean Value differs between the two groups in Group. var.equal = FALSE uses Welch’s t-test (which doesn’t assume equal variances; this is safer). Press Ctrl+Enter and read the output. The p-value appears as “p-value = 0.0034” or similar.

The formula notation Value ~ Group means “Value as a function of Group.” This is standard in R.
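The test result is an ordinary R object, so you can store it in a variable and pull out individual pieces, which is handy for reporting exact values in a figure legend. A sketch with made-up numbers (the group names and effect sizes are invented for illustration):

```r
# Toy two-group data with made-up numbers
set.seed(1)  # make the random numbers reproducible
toy <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 6),
  Value = c(rnorm(6, mean = 10), rnorm(6, mean = 12))
)

res <- t.test(Value ~ Group, data = toy, var.equal = FALSE)

res$p.value   # the p-value as a plain number
res$estimate  # the two group means
```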

One-Way ANOVA (3+ Groups)

If you have Control, Treatment1, and Treatment2:

model <- aov(Value ~ Group, data = data)
summary(model)

This fits an ANOVA model and prints a table. Look for the p-value in the “Pr(>F)” column. If p < 0.05, there’s a significant difference somewhere among your groups.

Post-Hoc Test: Which Groups Differ?

ANOVA tells you whether groups differ overall, but not which specific groups differ. Use Tukey’s HSD test:

TukeyHSD(model)

This shows pairwise comparisons (Control vs. Treatment1, Control vs. Treatment2, Treatment1 vs. Treatment2) with p-values for each pair. Any p-value less than 0.05 indicates a significant difference between those two groups.
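If you want the Tukey results as a table (for example, to paste into a supplementary file), convert them to a data frame. A self-contained sketch with made-up three-group data; in your own analysis you'd apply the last two lines to the model you already fitted (this assumes your grouping column is named Group, as above):

```r
# Toy three-group data with made-up numbers
set.seed(1)
toy <- data.frame(
  Group = rep(c("Control", "Treatment1", "Treatment2"), each = 5),
  Value = c(rnorm(5, 10), rnorm(5, 12), rnorm(5, 15))
)
model <- aov(Value ~ Group, data = toy)

# One row per pairwise comparison; the "p adj" column holds adjusted p-values
tukey_table <- as.data.frame(TukeyHSD(model)$Group)
tukey_table
```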

Adding Significance Brackets to Plots

Once you’ve calculated p-values, you might want to show them on your figure. This requires the ggpubr package:

install.packages("ggpubr")
library(ggpubr)

Then modify your plot. One catch: stat_compare_means() computes the test from the data behind the plot, so it needs the raw per-sample values, not the summary table. Build the plot from the full data frame and let ggplot2 calculate the bars and error bars with stat_summary():

ggplot(data, aes(x = Group, y = Value, fill = Group)) +
  stat_summary(fun = mean, geom = "bar", width = 0.6) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  theme_classic() +
  labs(x = "Treatment Group", y = "Mean Value ± SEM") +
  theme(legend.position = "none") +
  stat_compare_means(comparisons = list(c("Control", "Treatment")),
                     method = "t.test",
                     label = "p.format")

This adds a bracket between Control and Treatment with the p-value displayed. mean_se() is built into ggplot2 and gives mean ± SEM, so the bars and error bars match the summary plot above. For more complex comparisons (ANOVA with multiple groups), ggpubr’s documentation has examples.

Saving Your Figures: Publication Quality

Export your figure at high resolution for a journal:

ggsave("figure1.pdf", width = 3.5, height = 4, units = "in")
ggsave("figure1.tiff", width = 3.5, height = 4, units = "in", dpi = 300)

The first line saves as PDF (good for vector graphics, which scale without pixelation). The second saves as TIFF at 300 DPI, which most journals accept.

Adjust width and height to match your journal’s requirements. Most journals want single-column figures around 3.5 inches wide.

Both files appear in your working directory (the folder where your R script is saved). You can now insert these into your manuscript.
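By default, ggsave() saves whichever plot was displayed most recently. If your script builds several figures, it's safer to store each plot in a variable and pass it explicitly via the plot argument. A sketch with a toy figure standing in for the bar graph above:

```r
library(ggplot2)

# Toy figure standing in for the bar graph built earlier
stats <- data.frame(Group = c("Control", "Treatment"), Mean = c(1.0, 2.3))
p <- ggplot(stats, aes(x = Group, y = Mean)) +
  geom_bar(stat = "identity", width = 0.6) +
  theme_classic()

# Save this specific plot, not just whatever was drawn last
ggsave("figure1.pdf", plot = p, width = 3.5, height = 4, units = "in")
```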

Getting Unstuck: Reading Error Messages and Finding Help

R error messages look cryptic at first but contain useful information. Example:

Error: `path` does not exist: 'data.xlsx'

This tells you the file doesn’t exist in your working directory. Check the filename, and check which folder R is looking in: getwd() prints the current working directory.

Another example:

Error: object 'Value' not found

This usually means you referred to Value on its own (e.g., mean(Value)) instead of data$Value, or the column isn’t actually named “Value”. Run colnames(data) to see the real column names and match them exactly, including capitalization.

Where to Find Help

Stack Overflow. Google your error message and add “R” or “ggplot2”. Stack Overflow usually has the answer.

R for Data Science. r4ds.had.co.nz is available free online and is the gold standard for learning data manipulation and visualization in R. The chapters on “Data Transformation” and “Data Visualization” cover 80% of what you need. If you prefer a physical copy to work from, the print edition is available on Amazon.

RStudio’s built-in Help. Type ?mean in the Console to see documentation for the mean function. Replace “mean” with any function name.

Ask a good question. If Stack Overflow doesn’t have your answer, post a minimal, reproducible example (a small sample of your data and the exact code that fails). This is much more likely to get a helpful response than a vague question.

Next Steps

This tutorial covers the basics: loading data, manipulating it, calculating statistics, and making publication-quality plots. This handles most wet lab analyses (qPCR comparisons, ELISA dose-response curves, flow cytometry population frequencies, microscopy quantification, etc.).

For more advanced workflows, explore these topics:

Mixed-effects models. If your data has repeated measures (the same animal measured multiple times) or nested structure (multiple cells from one animal), mixed-effects models account for this dependence. The “lme4” package in R handles this.

Survival analysis. If you’re analyzing time-to-event data (days until tumor recurrence, time to death), use survival curves with the “survival” package.

Multi-omics integration. If you’re combining transcriptomics, proteomics, and metabolomics, specialized packages like “mixOmics” exist.

For now, the 15 commands in this tutorial will take you very far. Master these, and you’ll handle 90% of wet lab data analyses elegantly and reproducibly. Once you’re comfortable, expand into more specialized techniques.