01 - Welcome & Setup
Instructor and Agenda Introduction
Welcome to Introduction to Programming and Plotting with R! This course is designed specifically for researchers who want to harness the power of R for data analysis, visualization, and statistical computing. Whether you're working with experimental data, conducting statistical analyses, or creating publication-quality figures, R will become an indispensable tool in your research arsenal.
My name is Victor Gambarini. I have been working with R for more than ten years. I got my PhD from The University of Auckland, where I extensively used R for statistical analysis and data visualization in my bioinformatics research. One of my greatest research outputs is an online database of microorganisms that can biodegrade plastics called PlasticDB. While the web interface was built with Python, all the statistical analyses and data processing behind PlasticDB were done in R using packages like dplyr and ggplot2. We will be covering both of these essential packages in this course!
About This Course
Over the next three days, we'll build your R skills from the ground up, focusing on practical applications that researchers use daily. You'll learn not just the syntax, but how to think statistically and visually about research problems. The course also includes a lot of hands-on practice. Regular practice matters in programming because it builds what is often called muscle memory, and we will build plenty of it in this course!
Course Structure
- Welcome & Setup - Getting started with R and RStudio
- Basic Syntax & Variables - Foundation of R programming
- Data Structures - Vectors, lists, and data frames
- Control Flow - Making decisions and loops in code
- Functions - Writing reusable code
- File IO - Reading and writing data files
- DataFrames 101 - Introduction to data manipulation
- Data Manipulation - Cleaning and transforming datasets with dplyr
- Basic Plotting - Creating visualizations with base R
- Advanced Plotting - Professional figures with ggplot2
- Version Control - Git and GitHub for researchers
- Final Mini-Project - Apply your skills to a research problem
- Additional Resources - Where to go from here
What You'll Accomplish
By the end of this course, you'll be able to:
- Write R scripts to automate statistical analyses
- Import, clean, and manipulate datasets efficiently
- Create publication-quality visualizations with ggplot2
- Perform common statistical tests and interpretations
- Apply the tidyverse workflow to research problems
- Share your code and ensure reproducible research
Prerequisites
No prior programming experience is required! We'll start from the very beginning. All you need is:
- A computer (Windows, Mac, or Linux)
- Willingness to install R and RStudio
- Curiosity and enthusiasm for data analysis
Why Use RStudio
What is RStudio?
RStudio is an Integrated Development Environment (IDE) specifically designed for R. Think of it as a powerful workspace that makes writing, running, and debugging R code much easier and more efficient. It's like having a Swiss Army knife for data analysis.
Why RStudio is Perfect for Researchers
1. Designed for Data Science
- Built specifically for statistical computing and data analysis
- Intuitive interface with four main panes for different tasks
- Integrated help system and documentation viewer
2. Powerful Data Visualization
- Built-in plot viewer with zoom and export capabilities
- Seamless integration with ggplot2 and other visualization packages
- Interactive plotting capabilities
3. Reproducible Research Features
- R Markdown for combining code, text, and results
- Easy creation of reports, presentations, and publications
- Version control integration with Git
4. Efficient Coding Environment
- Syntax highlighting and auto-completion
- Code debugging tools and error highlighting
- Integrated file browser and project management
5. Package Management Made Easy
- Simple package installation and loading
- Automatic dependency management
- Access to CRAN repository with thousands of packages
RStudio vs. Other Options
Feature | RStudio | Base R GUI | VS Code |
---|---|---|---|
R Integration | Excellent | Basic | Good |
Data Viewer | Built-in | None | Limited |
Plot Viewer | Excellent | Basic | Via extension |
Package Management | Easy | Manual | Manual |
R Markdown | Native | None | Extension |
Learning Curve | Gentle | Steep | Moderate |
Installing R and RStudio
Step 1: Install R
R must be installed first, as RStudio requires it to function.
For Windows:
- Go to r-project.org
- Click "Download R"
- Choose any CRAN mirror (e.g., "0-Cloud")
- Click "Download R for Windows"
- Click "base"
- Click "Download R-4.x.x for Windows" (latest version)
- Run the installer with default settings
For Mac:
- Go to r-project.org
- Click "Download R"
- Choose any CRAN mirror
- Click "Download R for macOS"
- Download the appropriate .pkg file for your Mac (Intel or Apple Silicon)
- Run the installer
For Linux (Ubuntu/Debian):
```bash
sudo apt update
sudo apt install r-base r-base-dev
```
Step 2: Install RStudio
- Go to rstudio.com
- Click "Download RStudio"
- Choose "RStudio Desktop" (free version)
- Download the installer for your operating system
- Run the installer with default settings
Step 3: Verify Installation
- Open RStudio (not R directly)
- You should see the RStudio interface with multiple panes
- In the Console pane, type R.version.string and press Enter
- You should see output like: "R version 4.x.x (2024-xx-xx)"
RStudio Interface Overview
The Four Panes
When you first open RStudio, you'll see up to four main panes:
1. Source Pane (Top Left)
- Purpose: Write and edit R scripts and R Markdown files
- File Types: .R scripts, .Rmd files, data files
- Features: Syntax highlighting, auto-completion, debugging
2. Console Pane (Bottom Left)
- Purpose: Interactive R command line
- Usage: Run code directly, see output and error messages
- Prompt: The > symbol indicates R is ready for input
3. Environment/History Pane (Top Right)
- Environment Tab: Shows all objects in your current workspace
- History Tab: Shows previously executed commands
- Connections Tab: Database connections (advanced)
4. Files/Plots/Packages/Help Pane (Bottom Right)
- Files Tab: File browser for your computer
- Plots Tab: Displays generated graphs and charts
- Packages Tab: Manage installed R packages
- Help Tab: Documentation and help files
- Viewer Tab: For viewing web content and interactive plots
Customizing Your Layout
- View > Panes > Pane Layout: Rearrange panes to your preference
- Tools > Global Options > Appearance: Change themes and fonts
- Zoom: Use Ctrl + (plus) and Ctrl - (minus) to adjust text size
Creating Your First R Script
Step 1: Create a New Script
- Click File > New File > R Script
- Alternatively, use the keyboard shortcut: Ctrl + Shift + N (Windows/Linux) or Cmd + Shift + N (Mac)
- A new tab will open in the Source pane
Step 2: Write Your First Code
Type the following in your new script:
```r
# My First R Script
# Author: [Your Name]
# Date: [Today's Date]

# Print a welcome message
print("Welcome to R!")

# Perform a simple calculation
result <- 2 + 2
print(paste("2 + 2 =", result))

# Create a simple vector
numbers <- c(1, 2, 3, 4, 5)
print(paste("The sum of 1 to 5 is:", sum(numbers)))
```
Step 3: Save Your Script
- Click File > Save or use Ctrl + S (Windows/Linux) or Cmd + S (Mac)
- Choose a location and filename (e.g., "my_first_script.R")
- Make sure the file extension is ".R"
Step 4: Run Your Code
Option 1: Run Entire Script
- Click the "Source" button in the Source pane
- Or use Ctrl + Shift + Enter
Option 2: Run Selected Lines
- Highlight the code you want to run
- Click "Run" or use Ctrl + Enter
Option 3: Run Line by Line
- Place cursor on a line
- Press Ctrl + Enter to run that line
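Running a whole script can also be done from the Console with source(), which is roughly what the "Source" button does for you. The sketch below is only an illustration: it writes a two-line script to a temporary file (a stand-in for your saved my_first_script.R) so that it runs anywhere.

```r
# Write a tiny script to a temporary file; this stands in for your saved script.
script_path <- tempfile(fileext = ".R")
writeLines(c("result <- 2 + 2", "print(result)"), script_path)

# source() runs every line of the file; echo = TRUE also prints each command.
source(script_path, echo = TRUE)
```

With your own script, you would pass its path instead, for example source("my_first_script.R").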
Working Directory and File Management
Understanding Working Directory
The working directory is the folder where R looks for files by default.
Check your current working directory:
```r
getwd()
```
Change working directory:
```r
setwd("/path/to/your/folder")
```
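If you only need to change the working directory temporarily, one possible pattern is to save the previous directory and switch back afterwards. This is a minimal sketch; tempdir() is used only so the example runs on any machine, and you would substitute your own folder.

```r
# setwd() invisibly returns the previous working directory,
# so it can be saved and restored afterwards.
old_dir <- setwd(tempdir())  # tempdir() is a placeholder for your own folder
print(getwd())               # now inside the temporary folder

setwd(old_dir)               # switch back to where we started
print(getwd())
```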
Best Practices for File Organization
Create a project folder structure:
My_R_Project/
├── data/ # Raw data files
├── scripts/ # R scripts
├── output/ # Results and figures
├── docs/ # Documentation
└── README.txt # Project description
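The folder layout above can be created by hand in your file browser, or from R. The following is a sketch under one assumption: it builds the project inside tempdir() so it is safe to run anywhere, and you would replace that with the location you actually want.

```r
# Create the suggested folders under a project root.
# tempdir() is used so the sketch runs anywhere; replace it with your own location.
project_root <- file.path(tempdir(), "My_R_Project")
subfolders <- c("data", "scripts", "output", "docs")

for (folder in subfolders) {
  # recursive = TRUE also creates project_root itself if it does not exist yet
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Add an empty README and list what was created.
file.create(file.path(project_root, "README.txt"))
print(list.files(project_root))
```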
Loading Data Files
We'll download the famous penguins dataset for practice:
- Download the dataset: penguins.csv
- Save it in your project's data/ folder
- Load it in R:
```r
# Read the penguins dataset
penguins <- read.csv("data/penguins.csv")

# View the first few rows
head(penguins)

# Get basic information about the dataset
str(penguins)
```
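When read.csv() fails with a "cannot open file" error, the usual cause is a path that does not match the current working directory. A small guard can make that easier to diagnose. In this sketch, load_csv_checked is a hypothetical helper name chosen for illustration; it is not part of base R or any package.

```r
# Hypothetical helper: stop with a clearer message when the file is missing.
load_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  read.csv(path)
}

# Usage, assuming the file was saved in the data/ folder as described above:
# penguins <- load_csv_checked("data/penguins.csv")
```

The error message includes the working directory, which is often enough to spot why a relative path such as "data/penguins.csv" did not resolve.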
Installing and Loading Packages
What are R Packages?
R packages are collections of functions, data, and documentation that extend R's capabilities. Think of them as apps for your data analysis toolkit.
Installing Packages
From CRAN (most common):
```r
install.packages("ggplot2")
install.packages("dplyr")
```
Install multiple packages at once:
```r
install.packages(c("ggplot2", "dplyr", "readr", "tidyr"))
```
Loading Packages
Load a package for use:
```r
library(ggplot2)
library(dplyr)
```
Check whether a package is installed, and install it if it is missing:
```r
if (!require(ggplot2)) {
  install.packages("ggplot2")
  library(ggplot2)
}
```
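The pattern above handles one package at a time. To check several packages without loading or installing anything, one option is to compare the names you want against installed.packages(). This is a sketch, and the package names are simply the ones from the install example earlier in this section.

```r
# Packages we want to have available.
wanted <- c("ggplot2", "dplyr", "readr", "tidyr")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(wanted, installed)

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All packages are installed.")
}
```

The install line is left commented out so that running the check never changes your library by accident.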
Essential Packages for This Course
Install these packages before our next session:
```r
# Core tidyverse packages
install.packages("tidyverse")

# Additional useful packages
install.packages(c("here", "palmerpenguins", "knitr"))
```
The tidyverse is a collection of packages including:
- ggplot2: Data visualization
- dplyr: Data manipulation
- readr: Reading data files
- tidyr: Data tidying
- stringr: String manipulation
- forcats: Working with factors
R Projects for Better Organization
What are R Projects?
R Projects help organize your work by:
- Setting the working directory automatically
- Keeping related files together
- Making your work more portable and reproducible
Creating an R Project
- File > New Project
- Choose one of:
  - New Directory: Start fresh
  - Existing Directory: Use an existing folder
  - Version Control: Clone from Git (advanced)
- Select "New Project"
- Choose a folder name (e.g., "R_Data_Analysis_Course")
- Select where to create it
- Click "Create Project"
Benefits of Using Projects
- Automatic working directory: No need for setwd()
- Isolated environments: Each project has its own workspace
- Easy switching: Switch between projects easily
- Version control ready: Integrates well with Git
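Because a project sets the working directory to the project folder, file paths can be written relative to it. The sketch below assumes the data/ folder and penguins.csv filename used elsewhere in this session; it only builds and inspects the path, so it runs even if the file does not exist yet.

```r
# Build a path relative to the current working directory.
# Inside an RStudio Project, the working directory is the project folder.
data_path <- file.path("data", "penguins.csv")
print(data_path)

# normalizePath() shows the absolute location the relative path points to.
# mustWork = FALSE avoids an error if the file has not been downloaded yet.
print(normalizePath(data_path, mustWork = FALSE))
```

Relative paths like this keep working when the whole project folder is moved or shared, which is not true of an absolute path passed to setwd().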
Getting Help in R
Built-in Help System
Get help for a function:
```r
help(mean)
?mean
```
Search for functions:
```r
help.search("regression")
??regression
```
See examples:
```r
example(plot)
```
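Two other base R functions can help when you only half remember a function name or its arguments. This is a brief sketch; the search pattern and the choice of read.csv are just examples.

```r
# apropos() lists objects on the search path whose names match a regular expression.
read_functions <- apropos("^read\\.")
print(read_functions)

# args() shows a function's arguments without opening the help page.
print(args(read.csv))
```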
Finding Help Online
- R Documentation: rdocumentation.org
- Stack Overflow: stackoverflow.com/questions/tagged/r
- R-bloggers: r-bloggers.com
- RStudio Community: community.rstudio.com
Getting Help in RStudio
- Help tab: Search documentation in the bottom-right pane
- Auto-completion: Press Tab while typing function names
- Function tooltips: Hover over functions to see help
Quick Reference
Essential RStudio Shortcuts
- Ctrl + Enter: Run current line/selection
- Ctrl + Shift + Enter: Run entire script
- Ctrl + 1: Focus on Source pane
- Ctrl + 2: Focus on Console pane
- Ctrl + L: Clear console
- Tab: Auto-complete function names
- Ctrl + Shift + C: Comment/uncomment lines
Basic R Commands to Remember
```r
# Assignment
x <- 5
y = 10  # Also works, but <- is preferred
# Getting help
?function_name
help(function_name)
# Viewing data
head(data)     # First 6 rows
tail(data)     # Last 6 rows
str(data)      # Structure
summary(data)  # Summary statistics
# Working directory
getwd()             # Get current directory
setwd("path/here")  # Set directory
list.files()        # List files in directory
```
What's Next?
In our next session, we'll dive into R basics - variables, data types, and fundamental operations. Make sure you have R and RStudio installed, can create projects and scripts, and have the essential packages installed before we continue.
Homework for next session:
1. Create an R Project called "R_Course_Practice"
2. Install the tidyverse package
3. Download and save the penguins.csv file in your project folder
4. Try running the code examples from this session