R Programming Plotting

01 - Welcome & Setup

Instructor and Agenda Introduction

Welcome to Introduction to Programming and Plotting with R! This course is designed specifically for researchers who want to harness the power of R for data analysis, visualization, and statistical computing. Whether you're working with experimental data, conducting statistical analyses, or creating publication-quality figures, R will become an indispensable tool in your research arsenal.

My name is Victor Gambarini. I have been working with R for more than ten years. I got my PhD from The University of Auckland, where I extensively used R for statistical analysis and data visualization in my bioinformatics research. One of my greatest research outputs is PlasticDB, an online database of microorganisms that can biodegrade plastics. While the web interface was built with Python, all the statistical analyses and data processing behind PlasticDB were done in R using packages like dplyr and ggplot2. We will be covering both of these essential packages in this course!

About This Course

Over the next three days, we'll build your R skills from the ground up, focusing on practical applications that researchers use daily. You'll learn not just the syntax, but how to think statistically and visually about research problems. This course also includes a lot of hands-on practice. Practice matters in programming because repetition builds what people call muscle memory, and we will build plenty of it in this course!

Course Structure

  1. Welcome & Setup - Getting started with R and RStudio
  2. Basic Syntax & Variables - Foundation of R programming
  3. Data Structures - Vectors, lists, and data frames
  4. Control Flow - Making decisions and loops in code
  5. Functions - Writing reusable code
  6. File IO - Reading and writing data files
  7. DataFrames 101 - Introduction to data manipulation
  8. Data Manipulation - Cleaning and transforming datasets with dplyr
  9. Basic Plotting - Creating visualizations with base R
  10. Advanced Plotting - Professional figures with ggplot2
  11. Version Control - Git and GitHub for researchers
  12. Final Mini-Project - Apply your skills to a research problem
  13. Additional Resources - Where to go from here

What You'll Accomplish

By the end of this course, you'll be able to:

  • Write R scripts to automate statistical analyses
  • Import, clean, and manipulate datasets efficiently
  • Create publication-quality visualizations with ggplot2
  • Perform common statistical tests and interpretations
  • Apply the tidyverse workflow to research problems
  • Share your code and ensure reproducible research

Prerequisites

No prior programming experience is required! We'll start from the very beginning. All you need is:

  • A computer (Windows, Mac, or Linux)
  • Willingness to install R and RStudio
  • Curiosity and enthusiasm for data analysis

Why Use RStudio

What is RStudio?

RStudio is an Integrated Development Environment (IDE) specifically designed for R. Think of it as a powerful workspace that makes writing, running, and debugging R code much easier and more efficient. It's like having a Swiss Army knife for data analysis.

Why RStudio is Perfect for Researchers

1. Designed for Data Science

  • Built specifically for statistical computing and data analysis
  • Intuitive interface with four main panes for different tasks
  • Integrated help system and documentation viewer

2. Powerful Data Visualization

  • Built-in plot viewer with zoom and export capabilities
  • Seamless integration with ggplot2 and other visualization packages
  • Interactive plotting capabilities

3. Reproducible Research Features

  • R Markdown for combining code, text, and results
  • Easy creation of reports, presentations, and publications
  • Version control integration with Git

4. Efficient Coding Environment

  • Syntax highlighting and auto-completion
  • Code debugging tools and error highlighting
  • Integrated file browser and project management

5. Package Management Made Easy

  • Simple package installation and loading
  • Automatic dependency management
  • Access to CRAN repository with thousands of packages

RStudio vs. Other Options

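| Feature            | RStudio   | Base R GUI | VS Code   |
|--------------------|-----------|------------|-----------|
| R Integration      | Excellent | Basic      | Good      |
| Data Viewer        | Built-in  | None       | Limited   |
| Plot Viewer        | Excellent | Basic      | None      |
| Package Management | Easy      | Manual     | Manual    |
| R Markdown         | Native    | None       | Extension |
| Learning Curve     | Gentle    | Steep      | Moderate  |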

Installing R and RStudio

Step 1: Install R

R must be installed first, as RStudio requires it to function.

For Windows:

  1. Go to r-project.org
  2. Click "Download R"
  3. Choose any CRAN mirror (e.g., "0-Cloud")
  4. Click "Download R for Windows"
  5. Click "base"
  6. Click "Download R-4.x.x for Windows" (latest version)
  7. Run the installer with default settings

For Mac:

  1. Go to r-project.org
  2. Click "Download R"
  3. Choose any CRAN mirror
  4. Click "Download R for macOS"
  5. Download the appropriate .pkg file for your Mac (Intel or Apple Silicon)
  6. Run the installer

For Linux (Ubuntu/Debian):

    sudo apt update
    sudo apt install r-base r-base-dev

Step 2: Install RStudio

  1. Go to rstudio.com
  2. Click "Download RStudio"
  3. Choose "RStudio Desktop" (free version)
  4. Download the installer for your operating system
  5. Run the installer with default settings

Step 3: Verify Installation

  1. Open RStudio (not R directly)
  2. You should see the RStudio interface with multiple panes
  3. In the Console pane, type: R.version.string and press Enter
  4. You should see output like: "R version 4.x.x (2024-xx-xx)"
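
If you'd like to check the version from code as well, here is a minimal sketch. The 4.0.0 threshold is only an assumption based on the "R-4.x.x" installer name above, not an official course requirement.

```r
# Print the full version string, e.g. "R version 4.x.x (...)"
print(R.version.string)

# getRversion() returns the version in a form that can be compared
r_version <- getRversion()

# Assumed threshold: any R 4.x release
if (r_version >= "4.0.0") {
  message("R ", r_version, " looks recent enough for this course.")
} else {
  message("R ", r_version, " is older than 4.0.0; consider updating.")
}
```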

RStudio Interface Overview

The Four Panes

When you first open RStudio, you'll see up to four main panes:

1. Source Pane (Top Left)

  • Purpose: Write and edit R scripts and R Markdown files
  • File Types: .R scripts, .Rmd files, data files
  • Features: Syntax highlighting, auto-completion, debugging

2. Console Pane (Bottom Left)

  • Purpose: Interactive R command line
  • Usage: Run code directly, see output and error messages
  • Prompt: The > symbol indicates R is ready for input

3. Environment/History Pane (Top Right)

  • Environment Tab: Shows all objects in your current workspace
  • History Tab: Shows previously executed commands
  • Connections Tab: Database connections (advanced)

4. Files/Plots/Packages/Help Pane (Bottom Right)

  • Files Tab: File browser for your computer
  • Plots Tab: Displays generated graphs and charts
  • Packages Tab: Manage installed R packages
  • Help Tab: Documentation and help files
  • Viewer Tab: For viewing web content and interactive plots

Customizing Your Layout

  • View > Panes > Pane Layout: Rearrange panes to your preference
  • Tools > Global Options > Appearance: Change themes and fonts
  • Zoom: Use Ctrl + Plus (+) and Ctrl + Minus (-) to adjust text size

Creating Your First R Script

Step 1: Create a New Script

  1. Click File > New File > R Script
  2. Alternatively, use the keyboard shortcut: Ctrl + Shift + N (Windows/Linux) or Cmd + Shift + N (Mac)
  3. A new tab will open in the Source pane

Step 2: Write Your First Code

Type the following in your new script:

```r
# My First R Script
# Author: [Your Name]
# Date: [Today's Date]

# Print a welcome message
print("Welcome to R!")

# Perform a simple calculation
result <- 2 + 2
print(paste("2 + 2 =", result))

# Create a simple vector
numbers <- c(1, 2, 3, 4, 5)
print(paste("The sum of 1 to 5 is:", sum(numbers)))
```

Step 3: Save Your Script

  1. Click File > Save or use Ctrl + S (Windows/Linux) or Cmd + S (Mac)
  2. Choose a location and filename (e.g., "my_first_script.R")
  3. Make sure the file extension is ".R"

Step 4: Run Your Code

Option 1: Run Entire Script

  • Click the "Source" button in the Source pane
  • Or use Ctrl + Shift + Enter

Option 2: Run Selected Lines

  • Highlight the code you want to run
  • Click "Run" or use Ctrl + Enter

Option 3: Run Line by Line

  • Place cursor on a line
  • Press Ctrl + Enter to run that line
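
Running an entire script corresponds to R's source() function, which you can also call yourself from the Console. The sketch below writes a tiny two-line script to a temporary folder (an assumption, so the example runs anywhere) and then runs it; with your own script you would pass its path instead.

```r
# Write a small script to a temporary location (stand-in for your own .R file)
script_path <- file.path(tempdir(), "my_first_script.R")
writeLines(
  c(
    "result <- 2 + 2",
    "print(paste(\"2 + 2 =\", result))"
  ),
  script_path
)

# Run every line of the script, top to bottom
source(script_path)
```

After source() finishes, objects created by the script (here, result) appear in the Environment tab.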


Working Directory and File Management

Understanding Working Directory

The working directory is the folder where R looks for files by default.

Check your current working directory:

    getwd()

Change working directory:

    setwd("/path/to/your/folder")
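
To see what "looks for files by default" means in practice, here is a small sketch. It uses a temporary folder as a stand-in for "/path/to/your/folder" (an assumption, so nothing on your computer is changed) and switches back at the end.

```r
# Remember where we started so we can return afterwards
original_dir <- getwd()

# Use a temporary folder as the practice working directory
practice_dir <- tempdir()
setwd(practice_dir)

# A relative file name is now resolved inside practice_dir
writeLines("hello", "note.txt")
file_was_created <- file.exists(file.path(practice_dir, "note.txt"))
print(file_was_created)

# Clean up and go back to the original working directory
unlink("note.txt")
setwd(original_dir)
```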

Best Practices for File Organization

Create a project folder structure:

    My_R_Project/
    ├── data/        # Raw data files
    ├── scripts/     # R scripts
    ├── output/      # Results and figures
    ├── docs/        # Documentation
    └── README.txt   # Project description
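
You can create these folders by hand, or let R do it. The following is one possible sketch; it builds the structure inside a temporary folder (an assumption for safe experimenting), so replace project_root with your own location once you are comfortable.

```r
# Where the project should live (temporary folder used here for practice)
project_root <- file.path(tempdir(), "My_R_Project")

# The sub-folders from the structure above
subfolders <- c("data", "scripts", "output", "docs")

# Create each folder; recursive = TRUE also creates project_root itself
for (folder in subfolders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Add a placeholder project description
writeLines("Project description goes here.",
           file.path(project_root, "README.txt"))

# Show what was created
print(list.files(project_root))
```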

Loading Data Files

We'll download the famous penguins dataset for practice:

  1. Download the dataset: penguins.csv
  2. Save it in your project's data/ folder
  3. Load it in R:

```r
# Read the penguins dataset
penguins <- read.csv("data/penguins.csv")

# View the first few rows
head(penguins)

# Get basic information about the dataset
str(penguins)
```
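
A common beginner stumbling block is a "cannot open file" error, which usually means the file is not where R is looking. As a rough sketch, the helper below (read_if_present is a hypothetical name, not part of R) checks for the file first and reports the working directory if it is missing.

```r
# Hypothetical helper: read a CSV only if the file can be found
read_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path,
            " (working directory: ", getwd(), ")")
    return(NULL)
  }
  read.csv(path)
}

# Try to load the course dataset from the data/ folder
penguins <- read_if_present(file.path("data", "penguins.csv"))

# Only inspect the data if loading succeeded
if (!is.null(penguins)) {
  print(dim(penguins))  # number of rows and columns
}
```

If the message appears, compare the reported working directory with the folder that actually contains data/penguins.csv.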


Installing and Loading Packages

What are R Packages?

R packages are collections of functions, data, and documentation that extend R's capabilities. Think of them as apps for your data analysis toolkit.

Installing Packages

From CRAN (most common):

    install.packages("ggplot2")
    install.packages("dplyr")

Install multiple packages at once:

    install.packages(c("ggplot2", "dplyr", "readr", "tidyr"))

Loading Packages

Load a package for use:

    library(ggplot2)
    library(dplyr)

Check if a package is installed, and install it only if it is missing:

    if (!require(ggplot2)) {
      install.packages("ggplot2")
      library(ggplot2)
    }
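
When you need several packages, the same idea can be applied to a whole list. The sketch below is one way to do it; missing_packages is a hypothetical helper name, and the install line is commented out because it needs internet access.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Packages used in the examples above
course_packages <- c("ggplot2", "dplyr")
to_install <- missing_packages(course_packages)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing ones
} else {
  message("All packages are already installed.")
}
```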

Essential Packages for This Course

Install these packages before our next session:

```r
# Core tidyverse packages
install.packages("tidyverse")

# Additional useful packages
install.packages(c("here", "palmerpenguins", "knitr"))
```

The tidyverse is a collection of packages including:

  • ggplot2: Data visualization
  • dplyr: Data manipulation
  • readr: Reading data files
  • tidyr: Data tidying
  • stringr: String manipulation
  • forcats: Working with factors


R Projects for Better Organization

What are R Projects?

R Projects help organize your work by:

  • Setting the working directory automatically
  • Keeping related files together
  • Making your work more portable and reproducible

Creating an R Project

  1. File > New Project
  2. Choose one of:
       • New Directory: Start fresh
       • Existing Directory: Use an existing folder
       • Version Control: Clone from Git (advanced)
  3. Select "New Project"
  4. Choose a folder name (e.g., "R_Data_Analysis_Course")
  5. Select where to create it
  6. Click "Create Project"

Benefits of Using Projects

  • Automatic working directory: No need for setwd()
  • Isolated environments: Each project has its own workspace
  • Easy switching: Switch between projects easily
  • Version control ready: Integrates well with Git
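
Because a project sets the working directory for you, file paths can be written relative to the project folder instead of spelling out where it lives on your computer. Here is a minimal sketch, assuming the data/penguins.csv layout used earlier. It uses here::here() from the here package (one of the additional packages installed in this session) and falls back to base R's file.path() if that package is not available.

```r
# Build a path to data/penguins.csv relative to the project folder
if (requireNamespace("here", quietly = TRUE)) {
  # here::here() starts from the project root
  data_file <- here::here("data", "penguins.csv")
} else {
  # Fallback: start from the current working directory
  data_file <- file.path(getwd(), "data", "penguins.csv")
}

# The same code works on any computer that has the project folder
print(data_file)
```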

Getting Help in R

Built-in Help System

Get help for a function:

    help(mean)
    ?mean

Search for functions:

    help.search("regression")
    ??regression

See examples:

    example(plot)
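
Two more built-in helpers are handy when you only half-remember a function name or its arguments. The short sketch below uses apropos() and args(); the search terms are just illustrations.

```r
# Find functions whose names contain "mean"
mean_functions <- apropos("mean")
print(head(mean_functions))

# Find functions whose names start with "read." (such as read.csv)
read_functions <- apropos("^read\\.")
print(read_functions)

# Show the arguments a function accepts without opening its help page
args(sd)
```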

Finding Help Online

Getting Help in RStudio

  • Help tab: Search documentation in the bottom-right pane
  • Auto-completion: Press Tab while typing function names
  • Function tooltips: Hover over functions to see help

Quick Reference

Essential RStudio Shortcuts

  • Ctrl + Enter: Run current line/selection
  • Ctrl + Shift + Enter: Run entire script
  • Ctrl + 1: Focus on Source pane
  • Ctrl + 2: Focus on Console pane
  • Ctrl + L: Clear console
  • Tab: Auto-complete function names
  • Ctrl + Shift + C: Comment/uncomment lines

Basic R Commands to Remember

```r
# Assignment
x <- 5
y = 10  # Also works, but <- is preferred

# Getting help
?function_name
help(function_name)

# Viewing data
head(data)     # First 6 rows
tail(data)     # Last 6 rows
str(data)      # Structure
summary(data)  # Summary statistics

# Working directory
getwd()             # Get current directory
setwd("path/here")  # Set directory
list.files()        # List files in directory
```
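
To try the data-viewing commands right away, you can use a dataset that ships with R. In this sketch the built-in mtcars dataset stands in for data (that choice is only an assumption for practice; any data frame works).

```r
# Use a built-in dataset as practice data
data <- mtcars

# First 6 rows and last 3 rows
first_rows <- head(data)
last_rows <- tail(data, 3)
print(first_rows)
print(last_rows)

# Column names, types, and a preview of the values
str(data)

# Summary statistics for one column (miles per gallon)
summary(data$mpg)
```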

What's Next?

In our next session, we'll dive into R basics - variables, data types, and fundamental operations. Make sure you have R and RStudio installed, can create projects and scripts, and have the essential packages installed before we continue.

Homework for next session:

  1. Create an R Project called "R_Course_Practice"
  2. Install the tidyverse package
  3. Download the penguins.csv file and save it in your project's data/ folder
  4. Try running the code examples from this session
