Lab 2: Install R and Set Up Your IDE

Today's workflow

Complete this lab in a local RStudio project first. You can do all core exercises without git/GitHub, then connect to git/GitHub at the end.

This session introduces R and RStudio, then builds your core R skills.

Why learn R?

  • You will need it for your final report (if you choose the report option).
  • It supports your psychology coursework.
  • It builds coding skills that transfer to many domains of work, including working with AI tools.

Installing R

  1. Visit CRAN at https://cran.r-project.org/.
  2. Select the version for your operating system (Windows, Mac, or Linux).
  3. Download and install by following the on-screen instructions.

Installing RStudio

Step 1: Install RStudio

  1. Go to https://posit.co/download/rstudio-desktop/.
  2. Choose the free version of RStudio Desktop and download it for your operating system.
  3. Install RStudio Desktop.
  4. Open RStudio to begin setting up your project environment.

Step 2: Choose your working folder and create lab folders

Use any folder you like for this lab. If you already have labs-YOUR-USERNAME from Lab 1, you can use that. If not, create a new folder:

mkdir -p ~/Documents/psy434/lab-02
cd ~/Documents/psy434/lab-02
mkdir -p diaries data R
pwd
ls -a

If you chose a different location, use that path instead.
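
If you would rather not use the terminal (for example, on Windows without a Unix-style shell), the same folders can be created from the R console. This is a minimal sketch: the base path is a temporary directory chosen only for illustration, so swap in your own location, such as ~/Documents/psy434/lab-02.

```r
# hypothetical location: a temporary directory, so the sketch leaves your files alone
lab_dir <- file.path(tempdir(), "psy434", "lab-02")

# create the three lab subfolders; recursive = TRUE also creates missing parent folders
for (sub in c("diaries", "data", "R")) {
  dir.create(file.path(lab_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# confirm that each folder now exists
dir.exists(file.path(lab_dir, c("diaries", "data", "R")))
```

dir.exists() returns one TRUE or FALSE per folder, so three TRUE values mean the setup worked.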

Step 3: Open your folder as an RStudio project

  1. In RStudio, go to File > New Project.
  2. Choose Existing Directory.
  3. Browse to your folder (e.g., ~/Documents/psy434/lab-02).
  4. Click Create Project.

RStudio creates a .Rproj file in your folder.

File naming

Use clear labels that anyone could understand. That "anyone" will be your future self. Prefer lowercase with hyphens: lab-02-intro.R, not Lab 2 Intro.R.

Step 4: Create your first R script

Now that RStudio is installed, download the starter script:

  1. Download the R script for this lab (right-click → Save As).
  2. Save it as R/lab-02.R inside your project folder.
  3. Open R/lab-02.R in RStudio.

If downloading is inconvenient, create your own script via File > New File > R Script and save it as R/lab-02.R.

Step 5: Working with R scripts

  1. Write your R code in the script editor. Execute code by selecting lines and pressing Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac).
  2. Use comments (preceded by #) to document your code.
  3. Save your scripts regularly (Ctrl + S or Cmd + S).

Step 6: When you exit RStudio

Before you finish, restart R (Session > Restart R) to clear your session. When you quit RStudio and are asked whether to save the workspace image, choose not to save it.

Workflow habits

  • Use clearly defined script names.
  • Annotate your code.
  • Save your scripts often (Ctrl + S or Cmd + S).

Running R from the terminal

You can run R without opening RStudio.

Interactive console

Open a terminal and type:

R

You will see the R prompt (>). Try a quick calculation:

1 + 1

Type q() to quit. When asked to save the workspace, type n.

Running a script

If you have an R script saved in your project folder, run it directly:

Rscript R/lab-02.R

Output prints to the terminal. This is useful for running code without opening RStudio, and is how R scripts are run on servers and in automated pipelines.

RStudio vs terminal

RStudio is easier for interactive exploration (viewing plots, inspecting data). The terminal is useful for running finished scripts and for working on remote machines. Both use the same R installation. Use whichever suits the task.


Basic R Commands

How to copy code from this page

  • Open File > New File > R Script in RStudio.
  • Name and save your new R script.
  • Copy the code blocks below into your script.
  • Save: Ctrl + S or Cmd + S.

Assignment (<-)

Assignment in R uses the <- operator:

x <- 10 # assigns the value 10 to x
y <- 5  # assigns the value 5 to y

RStudio shortcut for <-

  • macOS: Option + - (minus key)
  • Windows/Linux: Alt + - (minus key)

Concatenation (c())

The c() function combines multiple elements into a vector:

numbers <- c(1, 2, 3, 4, 5)
print(numbers)

Operations (+, -)

x <- 10
y <- 5

total <- x + y
print(total)

difference <- x - y
difference

Executing code

Run the current line or the selected lines with Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac).

Multiplication (*) and Division (/)

product <- x * y
product

quotient <- x / y
quotient

# element-wise operations on vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
vector_product <- vector1 * vector2
vector_product

vector_division <- vector1 / vector2
vector_division

Be mindful of division by zero: 10 / 0 returns Inf, and 0 / 0 returns NaN.
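
Because Inf and NaN are returned silently, it can help to check for them before summarising. A small sketch using base R's is.finite() and is.nan(); the vector name ratios is illustrative.

```r
ratios <- c(10 / 0, 0 / 0, 10 / 2)

is.finite(ratios)  # FALSE FALSE  TRUE
is.nan(ratios)     # FALSE  TRUE FALSE

# keep only the usable values before summarising
usable <- ratios[is.finite(ratios)]
mean(usable)       # 5
```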

# integer division and modulo
integer_division <- 10 %/% 3  # 3
remainder <- 10 %% 3          # 1
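
A common use of %% is testing whether numbers are even. The sketch below also shows how the quotient and remainder rebuild the original number; the variable names are illustrative.

```r
values <- 1:6

# a number is even when dividing by 2 leaves no remainder
is_even <- values %% 2 == 0
values[is_even]  # 2 4 6

# quotient times divisor, plus remainder, gives back the original number
rebuilt <- (10 %/% 3) * 3 + (10 %% 3)
rebuilt  # 10
```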

rm() Remove Object

devil_number <- 666
devil_number
rm(devil_number)

Logic (!, !=, ==)

x_not_y <- x != y            # TRUE: 10 is not equal to 5
x_equal_10 <- x == 10        # TRUE
not_equal_10 <- !x_equal_10  # FALSE: ! negates a logical value

OR (| and ||)

# element-wise OR
vector_or <- c(TRUE, FALSE) | c(FALSE, TRUE) # c(TRUE, TRUE)

# scalar OR: each side must be a single TRUE/FALSE (longer vectors are an error in R >= 4.3)
single_or <- TRUE || FALSE # TRUE

AND (& and &&)

# element-wise AND
vector_and <- c(TRUE, FALSE) & c(FALSE, TRUE) # c(FALSE, FALSE)

# scalar AND: each side must be a single TRUE/FALSE (longer vectors are an error in R >= 4.3)
single_and <- TRUE && FALSE # FALSE
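
In practice, && and || appear mostly inside if () conditions, where a single TRUE or FALSE is needed. To test a whole vector, reduce it first with any() or all(). A minimal sketch; the scores and the pass mark of 70 are made up for illustration.

```r
scores <- c(72, 91, 85)

any(scores > 90)  # TRUE: at least one score is above 90
all(scores > 70)  # TRUE: every score is above 70

# && needs single values, so all() collapses the vector comparison first
if (length(scores) > 0 && all(scores > 70)) {
  status <- "everyone passed"
} else {
  status <- "at least one fail"
}
status  # "everyone passed"
```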

RStudio workflow shortcuts

  • Execute code line: Cmd + Enter (Mac) or Ctrl + Enter (Win/Linux)
  • Insert section heading: Cmd + Shift + R (Mac) or Ctrl + Shift + R
  • Reformat selected code: Cmd + Shift + A (Mac) or Ctrl + Shift + A
  • Comment/uncomment: Cmd/Ctrl + Shift + C
  • Save all: Cmd + Option + S (Mac) or Ctrl + Alt + S
  • Find/replace: Cmd/Ctrl + F
  • New file: Cmd/Ctrl + Shift + N
  • Auto-complete: Tab

For more, explore Tools > Show Command Palette or press Shift + Cmd/Ctrl + P.


Data Types in R

Integers

x <- 42L
str(x)         # int 42

y <- as.numeric(x)
str(y)         # num 42

Integers are useful for counts or indices that do not require fractional values.
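
The L suffix is what makes a number an integer; without it, R stores a double. A short sketch of the difference, with illustrative variable names:

```r
count_int <- 42L   # the L suffix stores an integer
count_dbl <- 42    # without L, R stores a double

typeof(count_int)                # "integer"
typeof(count_dbl)                # "double"
count_int == count_dbl           # TRUE: the values are equal
identical(count_int, count_dbl)  # FALSE: the storage types differ

# arithmetic with a double promotes the result to double
typeof(count_int + 1L)  # "integer"
typeof(count_int + 1)   # "double"
```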

Characters

name <- "Alice"

Characters represent text: names, labels, descriptions.

Factors

colours <- factor(c("red", "blue", "green"))

Factors represent categorical data with a limited set of levels.
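
Behind the scenes, a factor stores integer codes that point to its levels. The sketch below repeats "red" to make that visible; the data are illustrative.

```r
colours <- factor(c("red", "blue", "green", "red"))

levels(colours)      # "blue" "green" "red" (sorted alphabetically by default)
as.integer(colours)  # 3 1 2 3: each value's position in the levels
table(colours)       # counts per level
```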

Ordered factors

education_levels <- c("high school", "bachelor", "master", "ph.d.")

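# unordered: levels are categories with no ranking
education_factor_no_order <- factor(education_levels, levels = education_levels, ordered = FALSE)

# ordered: pass levels explicitly, otherwise R sorts them alphabetically
education_factor <- factor(education_levels, levels = education_levels, ordered = TRUE)
education_factor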

Ordered factors support logical comparisons based on level order:

edu1 <- ordered("bachelor", levels = education_levels)
edu2 <- ordered("master", levels = education_levels)
edu2 > edu1  # TRUE
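
The ranking of an ordered factor comes from its levels. When the levels argument is omitted, R sorts the levels alphabetically, which is rarely the intended ranking. A short sketch of the difference; the names alphabetical and ranked are illustrative.

```r
education_levels <- c("high school", "bachelor", "master", "ph.d.")

# no levels argument: alphabetical order becomes the ranking
alphabetical <- factor(education_levels, ordered = TRUE)
levels(alphabetical)  # "bachelor" "high school" "master" "ph.d."

# explicit levels: the ranking follows the order supplied
ranked <- factor(education_levels, levels = education_levels, ordered = TRUE)
levels(ranked)        # "high school" "bachelor" "master" "ph.d."

# is "high school" below "bachelor"?
alphabetical[1] < alphabetical[2]  # FALSE
ranked[1] < ranked[2]              # TRUE
```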

Strings

you <- "world!"
greeting <- paste("hello,", you)
greeting  # "hello, world!"

Vectors

Vectors are homogeneous: all elements must be of the same type.

numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE, FALSE)
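
If you mix types inside c(), R does not raise an error. It converts every element to the most flexible type present. A brief sketch with illustrative names:

```r
# a single character value turns the whole vector into character
mixed <- c(1, "a", TRUE)
mixed         # "1" "a" "TRUE"
class(mixed)  # "character"

# logicals mixed with numbers become 1 (TRUE) and 0 (FALSE)
numbers_and_logicals <- c(1, TRUE, FALSE)
numbers_and_logicals         # 1 1 0
class(numbers_and_logicals)  # "numeric"
```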

Manipulating vectors

vector_sum <- numeric_vector + 10
vector_multiplication <- numeric_vector * 2
vector_greater_than_three <- numeric_vector > 3

Accessing elements:

first_element <- numeric_vector[1]
some_elements <- numeric_vector[c(2, 4)]
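
Square brackets also accept negative positions, ranges, and logical conditions. A small sketch that redefines numeric_vector so it runs on its own; the other names are illustrative.

```r
numeric_vector <- c(1, 2, 3, 4, 5)

all_but_first <- numeric_vector[-1]                 # 2 3 4 5: a negative index drops that position
middle_three <- numeric_vector[2:4]                 # 2 3 4: a range of positions
above_three <- numeric_vector[numeric_vector > 3]   # 4 5: keep elements where the condition is TRUE
```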

Functions with vectors

mean(numeric_vector)
sum(numeric_vector)
sort(numeric_vector)
unique(character_vector)

Data Frames

Creating data frames

df <- data.frame(
  name = c("alice", "bob", "charlie"),
  age = c(25, 30, 35),
  gender = c("female", "male", "male")
)
head(df)
str(df)

Accessing elements

# by column name
name_column <- df$name

# by row and column
second_person <- df[2, ]
age_column <- df[, "age"]

# by condition
very_old_people <- subset(df, age > 25)
mean(very_old_people$age)
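
The same filtering can be written with square brackets, which is the form used in the solutions at the end of this lab. A sketch that rebuilds df so it runs on its own; over_25 and older_males are illustrative names.

```r
df <- data.frame(
  name = c("alice", "bob", "charlie"),
  age = c(25, 30, 35),
  gender = c("female", "male", "male")
)

# rows where the condition is TRUE, all columns (note the trailing comma)
over_25 <- df[df$age > 25, ]
over_25$name  # "bob" "charlie"

# combine conditions with & and pick columns by name
older_males <- df[df$age > 25 & df$gender == "male", c("name", "age")]
older_males
```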

Exploring data frames

head(df)    # first six rows
tail(df)    # last six rows
str(df)     # structure
summary(df) # summary statistics

Manipulating data frames

# adding columns
df$employed <- c(TRUE, TRUE, FALSE)

# adding rows
new_person <- data.frame(name = "diana", age = 28, gender = "female", employed = TRUE)
df <- rbind(df, new_person)

# modifying values
df[4, "age"] <- 26

# removing columns
df$employed <- NULL

# removing rows
df <- df[-4, ]

rbind() and cbind()

rbind() combines data frames by rows; cbind() combines by columns. When using these functions, column names (for rbind) or row counts (for cbind) must match. We will use dplyr for more flexible joining in later weeks.
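
A minimal sketch of both functions, including what happens when column names do not match. The data frames here are made up for illustration, and tryCatch() is used only so the error message can be inspected without stopping the script.

```r
left <- data.frame(id = 1:2, name = c("alice", "bob"))
right <- data.frame(score = c(88, 92))

# cbind: same number of rows, columns are placed side by side
combined <- cbind(left, right)
dim(combined)  # 2 3

# rbind: same column names, rows are stacked
more_rows <- data.frame(id = 3, name = "charlie")
stacked <- rbind(left, more_rows)
nrow(stacked)  # 3

# rbind with different column names fails; capture the message instead of stopping
mismatch <- tryCatch(
  rbind(left, data.frame(id = 4, label = "x")),
  error = function(e) conditionMessage(e)
)
mismatch
```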


Summary statistics

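set.seed(12345)  # fix the seed so the random draws are reproducible
sim_values <- rnorm(n = 40, mean = 0, sd = 1)  # avoid the name "vector", which is also a base R function
mean(sim_values)
sd(sim_values)
min(sim_values)
max(sim_values)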
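
set.seed() is what makes simulated numbers reproducible: the same seed gives the same draws. A short sketch, with illustrative names:

```r
set.seed(12345)
first_draw <- rnorm(n = 5, mean = 0, sd = 1)

set.seed(12345)  # resetting the seed restarts the random number stream
second_draw <- rnorm(n = 5, mean = 0, sd = 1)

identical(first_draw, second_draw)  # TRUE

# without resetting the seed, the stream continues and the numbers differ
third_draw <- rnorm(n = 5, mean = 0, sd = 1)
identical(first_draw, third_draw)   # FALSE
```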

table() for categorical data

set.seed(12345)
gender <- sample(c("male", "female"), size = 100, replace = TRUE, prob = c(0.5, 0.5))
education_level <- sample(c("high school", "bachelor", "master"), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2))
df_table <- data.frame(gender, education_level)
table(df_table)
table(df_table$gender, df_table$education_level)  # cross-tabulation
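
Counts are often easier to read as proportions. Base R's prop.table() converts a table of counts into shares. A small sketch with a made-up, deterministic vector so the result is easy to check by hand:

```r
gender <- c("female", "male", "female", "female")

counts <- table(gender)
counts  # female 3, male 1

proportions <- prop.table(counts)
proportions[["female"]]  # 0.75
proportions[["male"]]    # 0.25
```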

First Data Visualisation with ggplot2

ggplot2 is based on the Grammar of Graphics: you build plots layer by layer. Install it once if needed: install.packages("ggplot2").

library(ggplot2)

set.seed(12345)
student_data <- data.frame(
  name = c("alice", "bob", "charlie", "diana", "ethan", "fiona", "george", "hannah"),
  score = sample(80:100, 8, replace = TRUE),
  stringsAsFactors = FALSE
)
student_data$passed <- ifelse(student_data$score >= 90, "passed", "failed")
student_data$passed <- factor(student_data$passed, levels = c("failed", "passed"))
student_data$study_hours <- sample(5:15, 8, replace = TRUE)

Bar plot

ggplot(student_data, aes(x = name, y = score, fill = passed)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("failed" = "red", "passed" = "blue")) +
  labs(title = "student scores", x = "student name", y = "score") +
  theme_minimal()

Scatter plot

ggplot(student_data, aes(x = study_hours, y = score, color = passed)) +
  geom_point(size = 4) +
  scale_color_manual(values = c("failed" = "red", "passed" = "blue")) +
  labs(title = "scores vs. study hours", x = "study hours", y = "score") +
  theme_minimal()

Box plot

ggplot(student_data, aes(x = passed, y = score, fill = passed)) +
  geom_boxplot() +
  scale_fill_manual(values = c("failed" = "red", "passed" = "blue")) +
  labs(title = "score distribution by pass/fail status", x = "status", y = "score") +
  theme_minimal()

Histogram

ggplot(student_data, aes(x = score, fill = passed)) +
  geom_histogram(binwidth = 5, color = "black", alpha = 0.7) +
  scale_fill_manual(values = c("failed" = "red", "passed" = "blue")) +
  labs(title = "histogram of scores", x = "score", y = "count") +
  theme_minimal()

Line plot (time series)

months <- factor(month.abb[1:8], levels = month.abb[1:8])
study_hours <- c(0, 3, 15, 30, 35, 120, 18, 15)
study_data <- data.frame(month = months, study_hours = study_hours)

ggplot(study_data, aes(x = month, y = study_hours, group = 1)) +
  geom_line(linewidth = 1, color = "blue") +
  geom_point(color = "red", size = 1) +
  labs(title = "monthly study hours", x = "month", y = "study hours") +
  theme_minimal()

Summary of today's lab

This session covered:

  • Installing and setting up R and RStudio
  • Basic arithmetic operations
  • Data structures: vectors, factors, data frames
  • Data visualisation with ggplot2

End-of-lab git/GitHub check-in

After you finish the R tasks above, you can save this work to GitHub.

If you already completed Lab 1: Git and GitHub, run:

git status
git add R/lab-02.R
git commit -m "add lab 2 r script and setup"
git push

If git/GitHub is still new for you, stop here and return to this section after your Lab 1 workflow is working.

Where to get help

  1. Large language models: LLMs are effective coding tutors. Help from LLMs for coding does not constitute a breach of academic integrity in this course. For your final report, cite all sources including LLMs.
  2. Stack Overflow: outstanding community resource.
  3. Cross Validated: best place for statistics advice.
  4. Developer websites: Tidyverse.
  5. Your tutors and course coordinator.

Appendix A: At-home exercises

Exercise 1: Install the tidyverse package

  1. Open RStudio.
  2. Go to Tools > Install Packages....
  3. Type tidyverse and check Install dependencies.
  4. Click Install.
  5. Load with library(tidyverse).

Exercise 2: Install the parameters and report packages

  1. Go to Tools > Install Packages....
  2. Type parameters, report.
  3. Check Install dependencies and click Install.

Exercise 2b: Install the causalworkshop package

The causalworkshop package provides simulated data for your research report. Run the following in your R console:

install.packages("remotes")
remotes::install_github("go-bayes/causalworkshop@v0.2.1")

Verify the installation:

library(causalworkshop)
d <- simulate_nzavs_data(n = 100, seed = 2026)
head(d)

Exercise 3: Basic operations

  1. Create vector_a <- c(2, 4, 6, 8) and vector_b <- c(1, 3, 5, 7).
  2. Add them, subtract them, multiply vector_a by 2, divide vector_b by 2.
  3. Calculate the mean and standard deviation of both.

Exercise 4: Working with data frames

  1. Create a data frame with columns id (1–4), name, score (88, 92, 85, 95).
  2. Add a passed column (pass mark = 90).
  3. Extract name and score of students who passed.
  4. Explore with summary(), head(), str().

Exercise 5: Logical operations and subsetting

  1. Subset student_data (the data frame from Exercise 4) to find students who scored above the mean.
  2. Create an attendance vector and add it as a column.
  3. Subset to select only rows where students were present.

Exercise 6: Cross-tabulation

  1. Create factor variables fruit and colour.
  2. Make a data frame and use table() for cross-tabulation.
  3. Which fruit has the most colour variety?

Exercise 7: Visualisation with ggplot2

  1. Using student_data, create a bar plot of scores by name.
  2. Add a title, axis labels, and colour by pass/fail status.

Appendix B: Solutions

Solution 3: Basic operations

vector_a <- c(2, 4, 6, 8)
vector_b <- c(1, 3, 5, 7)

sum_vector <- vector_a + vector_b
diff_vector <- vector_a - vector_b
double_vector_a <- vector_a * 2
half_vector_b <- vector_b / 2

mean(vector_a); sd(vector_a)
mean(vector_b); sd(vector_b)

Solution 4: Working with data frames

student_data <- data.frame(
  id = 1:4,
  name = c("alice", "bob", "charlie", "diana"),
  score = c(88, 92, 85, 95),
  stringsAsFactors = FALSE
)
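# pass mark = 90
student_data$passed <- student_data$score >= 90

# name and score of the students who passed
passed_students <- student_data[student_data$passed, c("name", "score")]
passed_students

summary(student_data)
head(student_data)
str(student_data)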

Solution 5: Logical operations and subsetting

mean_score <- mean(student_data$score)
students_above_mean <- student_data[student_data$score > mean_score, ]

attendance <- c("present", "absent", "present", "present")
student_data$attendance <- attendance
present_students <- student_data[student_data$attendance == "present", ]

Solution 6: Cross-tabulation

fruit <- factor(c("apple", "banana", "apple", "orange", "banana"))
colour <- factor(c("red", "yellow", "green", "orange", "green"))
fruit_data <- data.frame(fruit, colour)
table(fruit_data$fruit, fruit_data$colour)
# apple (red, green) and banana (yellow, green) tie for the most colour variety

Solution 7: Visualisation

library(ggplot2)
ggplot(student_data, aes(x = name, y = score, fill = passed)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("TRUE" = "blue", "FALSE" = "red")) +
  labs(title = "student scores", x = "name", y = "score") +
  theme_minimal()