Lab 2: Install R and Set Up Your IDE
Today's workflow
Complete this lab in a local RStudio project first. You can do all core exercises without git/GitHub, then connect to git/GitHub at the end.
This session introduces R and RStudio, then builds your core R skills.
Why learn R?
- You will need it for your final report (if you choose the report option).
- It supports your psychology coursework.
- It enhances your coding skills, which will help you in many domains of work, including utilising AI (!).
Installing R
- Visit CRAN at https://cran.r-project.org/.
- Select the version for your operating system (Windows, Mac, or Linux).
- Download and install by following the on-screen instructions.
Installing RStudio
Step 1: Install RStudio
- Go to https://posit.co/download/rstudio-desktop/.
- Choose the free version of RStudio Desktop and download it for your operating system.
- Install RStudio Desktop.
- Open RStudio to begin setting up your project environment.
Step 2: Choose your working folder and create lab folders
Use any folder you like for this lab. If you already have
labs-YOUR-USERNAME from Lab 1, you can use that. If not, create a new
folder:
mkdir -p ~/Documents/psy434/lab-02
cd ~/Documents/psy434/lab-02
mkdir -p diaries data R
pwd
ls -a
If you chose a different location, use that path instead.
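If you prefer to stay inside R, base R's dir.create() can make the same folders. This is a minimal sketch that assumes the same ~/Documents/psy434/lab-02 location as the shell commands above. Change lab_dir if you use another folder.

```r
# assumed location: same as the shell commands; edit this path to match your own folder
lab_dir <- path.expand("~/Documents/psy434/lab-02")

# create each subfolder; recursive = TRUE also creates missing parent folders (like mkdir -p)
for (folder in c("diaries", "data", "R")) {
  dir.create(file.path(lab_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# list what is now inside the lab folder (roughly the R equivalent of ls)
list.files(lab_dir)
```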
Step 3: Open your folder as an RStudio project
- In RStudio, go to File > New Project.
- Choose Existing Directory.
- Browse to your folder (e.g., ~/Documents/psy434/lab-02).
- Click Create Project.
RStudio creates a .Rproj file in your folder.
File naming
Use clear labels that anyone could understand. That "anyone" will be your future self. Prefer lowercase with hyphens:
lab-02-intro.R, not Lab 2 Intro.R.
Step 4: Create your first R script
Now that RStudio is installed, download the starter script:
- Download the R script for this lab (right-click → Save As).
- Save it as R/lab-02.R inside your project folder.
- Open R/lab-02.R in RStudio.
If downloading is inconvenient, create your own script via
File > New File > R Script and save it as R/lab-02.R.
Step 5: Working with R scripts
- Write your R code in the script editor. Execute code by selecting lines and pressing Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac).
- Use comments (preceded by #) to document your code.
- Save your scripts regularly (Ctrl + S or Cmd + S).
Step 6: When you exit RStudio
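Before you finish, restart R (Session > Restart R) and choose not to save the workspace image when prompted. A clean start means your results come from your script, not from objects left over in memory. To stop RStudio from asking, set Tools > Global Options > General > Save workspace to .RData on exit to Never.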
Workflow habits
- Use clearly defined script names.
- Annotate your code.
- Save your scripts often (Ctrl + S or Cmd + S).
Running R from the terminal
You can run R without opening RStudio.
Interactive console
Open a terminal and type:
R
You will see the R prompt (>). Try a quick calculation:
1 + 1
Type q() to quit. When asked to save the workspace, type n.
Running a script
If you have an R script saved in your project folder, run it directly:
Rscript R/lab-02.R
Output prints to the terminal. This is useful for running code without opening RStudio, and is how R scripts are run on servers and in automated pipelines.
RStudio vs terminal
RStudio is easier for interactive exploration (viewing plots, inspecting data). The terminal is useful for running finished scripts and for working on remote machines. Both use the same R installation. Use whichever suits the task.
Basic R Commands
How to copy code from this page
- Open File > New File > R Script in RStudio.
- Name and save your new R script.
- Copy the code blocks below into your script.
- Save: Ctrl + S or Cmd + S.
Assignment (<-)
Assignment in R uses the <- operator:
x <- 10 # assigns the value 10 to x
y <- 5 # assigns the value 5 to y
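Assigning to a name that already exists replaces its old value. R evaluates the right-hand side first, so a variable can be updated from its own value. Here is a small illustration; the name count is only an example.

```r
count <- 10          # create count
count <- count + 5   # the right side (10 + 5) is evaluated first, then stored back in count
count                # 15
```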
RStudio shortcut for <-
- macOS: Option + - (minus key)
- Windows/Linux: Alt + - (minus key)
Concatenation (c())
The c() function combines multiple elements into a vector:
numbers <- c(1, 2, 3, 4, 5)
print(numbers)
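c() is not limited to single values. It can also join an existing vector with new values. This is a short sketch; more_numbers is an illustrative name.

```r
numbers <- c(1, 2, 3, 4, 5)
# c() joins the existing vector and the new values into one longer vector
more_numbers <- c(numbers, 6, 7)
more_numbers          # 1 2 3 4 5 6 7
length(more_numbers)  # 7
```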
Operations (+, -)
x <- 10
y <- 5
total <- x + y
print(total)
difference <- x - y
difference
Executing code
Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac).
Multiplication (*) and Division (/)
product <- x * y
product
quotient <- x / y
quotient
# element-wise operations on vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
vector_product <- vector1 * vector2
vector_product
vector_division <- vector1 / vector2
vector_division
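Element-wise operations pair up positions. If one vector is shorter, R recycles (repeats) it to match the longer one. A single number is just a vector of length one, which is why multiplying a vector by 2 doubles every element. The vector names below are illustrative.

```r
long_vector <- c(1, 2, 3, 4)
short_vector <- c(10, 20)
# short_vector is recycled to c(10, 20, 10, 20) before adding
long_vector + short_vector   # 11 22 13 24
# a single value is recycled across every element
long_vector * 2              # 2 4 6 8
```

If the longer length is not a multiple of the shorter one (for example, lengths 3 and 2), R still recycles but warns you.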
Be mindful of division by zero: 10 / 0 returns Inf, and 0 / 0
returns NaN.
# integer division and modulo
integer_division <- 10 %/% 3 # 3
remainder <- 10 %% 3 # 1
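Two quick checks tie these ideas together. First, is.infinite() and is.nan() detect the special values produced by dividing by zero. Second, integer division and the remainder always rebuild the original number. This is a short sketch; the variable names are illustrative.

```r
is.infinite(10 / 0)  # TRUE
is.nan(0 / 0)        # TRUE

# quotient * divisor + remainder gives back the original number
quotient_part <- 10 %/% 3   # 3
remainder_part <- 10 %% 3   # 1
quotient_part * 3 + remainder_part == 10  # TRUE
```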
Remove objects (rm())
devil_number <- 666
devil_number
rm(devil_number)
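To confirm that rm() worked, exists() reports whether an object with a given name is still in your workspace. This is a minimal check:

```r
devil_number <- 666
exists("devil_number")  # TRUE: the object is in your workspace
rm(devil_number)
exists("devil_number")  # FALSE: typing devil_number now gives an "object not found" error
```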
Logic (!, !=, ==)
x_not_y <- x != y # TRUE
x_equal_10 <- x == 10 # TRUE
x_not_10 <- !(x == 10) # FALSE: ! flips TRUE to FALSE
OR (| and ||)
# element-wise OR
vector_or <- c(TRUE, FALSE) | c(FALSE, TRUE) # c(TRUE, TRUE)
# scalar OR: one TRUE/FALSE on each side (e.g., inside if ()); since R 4.3.0, longer vectors give an error
single_or <- TRUE || FALSE # TRUE
AND (& and &&)
# element-wise AND
vector_and <- c(TRUE, FALSE) & c(FALSE, TRUE) # c(FALSE, FALSE)
# scalar AND: one TRUE/FALSE on each side (e.g., inside if ()); since R 4.3.0, longer vectors give an error
single_and <- TRUE && FALSE # FALSE
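Because && and || expect single TRUE/FALSE values, you need to collapse a logical vector to one value before using them. any() and all() do this. This is a minimal sketch; scores is a made-up example vector.

```r
scores <- c(88, 92, 85, 95)
any(scores >= 90)   # TRUE: at least one score is 90 or more
all(scores >= 80)   # TRUE: every score is at least 80

# && combines two single TRUE/FALSE values, so it fits inside if ()
if (all(scores >= 80) && any(scores >= 90)) {
  message("everyone scored 80+, and someone scored 90+")
}
```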
RStudio workflow shortcuts
- Execute code line: Cmd + Return (Mac) or Ctrl + Enter (Win/Linux)
- Insert section heading: Cmd + Shift + R (Mac) or Ctrl + Shift + R (Win/Linux)
- Reformat code: Cmd + Shift + A (Mac) or Ctrl + Shift + A (Win/Linux)
- Comment/uncomment: Cmd/Ctrl + Shift + C
- Save all: Cmd + Option + S (Mac) or Ctrl + Alt + S (Win/Linux)
- Find/replace: Cmd/Ctrl + F
- New file: Cmd/Ctrl + Shift + N
- Auto-complete: Tab

For more, explore Tools > Command Palette or Shift + Cmd/Ctrl + P.
Data Types in R
Integers
x <- 42L
str(x) # int 42
y <- as.numeric(x)
str(y) # num 42
Integers are useful for counts or indices that do not require fractional values.
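Plain numbers such as 5 are stored as doubles ("numeric"), not integers. Some operations on integers also return doubles. A few quick checks:

```r
is.integer(5)    # FALSE: plain numbers are doubles
is.integer(5L)   # TRUE: the L suffix makes an integer
5L / 2L          # 2.5: ordinary division returns a double
5L %/% 2L        # 2: integer division keeps the integer type
```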
Characters
name <- "Alice"
Characters represent text: names, labels, descriptions.
Factors
colours <- factor(c("red", "blue", "green"))
Factors represent categorical data with a limited set of levels.
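By default, factor() sorts levels alphabetically and stores each value as an integer code that points to a level. This matters whenever level order is meaningful, as with the ordered factors below. You can inspect it like this:

```r
colours <- factor(c("red", "blue", "green"))
levels(colours)       # "blue" "green" "red" (alphabetical)
as.integer(colours)   # 3 1 2: red is level 3, blue is level 1, green is level 2
nlevels(colours)      # 3
```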
Ordered factors
education_levels <- c("high school", "bachelor", "master", "ph.d.")
# unordered
education_factor_no_order <- factor(education_levels, ordered = FALSE)
# ordered: supply levels explicitly, otherwise they are sorted alphabetically
education_factor <- factor(education_levels, levels = education_levels, ordered = TRUE)
education_factor
Ordered factors support logical comparisons based on level order:
edu1 <- ordered("bachelor", levels = education_levels)
edu2 <- ordered("master", levels = education_levels)
edu2 > edu1 # TRUE
Strings
you <- "world!"
greeting <- paste("hello,", you)
greeting # "hello, world!"
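paste() separates its pieces with a space by default. You can change this with sep, or use paste0() to join with no separator, which is handy for building file names.

```r
paste("lab", "02")               # "lab 02": space by default
paste("lab", "02", sep = "-")    # "lab-02": your own separator
paste0("lab-", "02", ".R")       # "lab-02.R": no separator
```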
Vectors
Vectors are homogeneous: all elements must be of the same type.
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE, FALSE)
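Because vectors are homogeneous, combining different types does not give an error. Instead, R silently converts (coerces) every element to the most general type. This short illustration uses example names only:

```r
mixed <- c(1, "two", TRUE)
mixed          # "1" "two" "TRUE": everything became character
class(mixed)   # "character"

numbers_and_logicals <- c(1, TRUE, FALSE)
numbers_and_logicals  # 1 1 0: TRUE becomes 1 and FALSE becomes 0
```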
Manipulating vectors
vector_sum <- numeric_vector + 10
vector_multiplication <- numeric_vector * 2
vector_greater_than_three <- numeric_vector > 3
Accessing elements:
first_element <- numeric_vector[1]
some_elements <- numeric_vector[c(2, 4)]
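Square brackets also accept negative positions (to drop elements) and logical vectors (to keep matching elements). Logical indexing is the basis of subsetting data frames later in this lab.

```r
numeric_vector <- c(1, 2, 3, 4, 5)
numeric_vector[-1]                      # 2 3 4 5: a negative index drops that position
numeric_vector[numeric_vector > 3]      # 4 5: keeps positions where the condition is TRUE
numeric_vector[length(numeric_vector)] # 5: the last element
```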
Functions with vectors
mean(numeric_vector)
sum(numeric_vector)
sort(numeric_vector)
unique(character_vector)
Data Frames
Creating data frames
df <- data.frame(
name = c("alice", "bob", "charlie"),
age = c(25, 30, 35),
gender = c("female", "male", "male")
)
head(df)
str(df)
Accessing elements
# by column name
name_column <- df$name
# by row and column
second_person <- df[2, ]
age_column <- df[, "age"]
# by condition
very_old_people <- subset(df, age > 25)
mean(very_old_people$age)
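subset() is convenient for interactive work. You can write the same filter with square brackets, which is the form used in the solutions in Appendix B. This minimal equivalent re-creates df so that it runs on its own; older_people is an illustrative name.

```r
df <- data.frame(
  name = c("alice", "bob", "charlie"),
  age = c(25, 30, 35),
  gender = c("female", "male", "male")
)
# rows where age > 25; the blank after the comma means "all columns"
older_people <- df[df$age > 25, ]
mean(older_people$age)   # 32.5
```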
Exploring data frames
head(df) # first six rows
tail(df) # last six rows
str(df) # structure
summary(df) # summary statistics
Manipulating data frames
# adding columns
df$employed <- c(TRUE, TRUE, FALSE)
# adding rows
new_person <- data.frame(name = "diana", age = 28, gender = "female", employed = TRUE)
df <- rbind(df, new_person)
# modifying values
df[4, "age"] <- 26
# removing columns
df$employed <- NULL
# removing rows
df <- df[-4, ]
rbind() and cbind()
rbind() combines data frames by rows; cbind() combines by columns.
When using these functions, column names (for rbind) or row counts
(for cbind) must match. We will use dplyr for more flexible joining
in later weeks.
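Here is a small sketch of both functions. rbind() needs matching column names, and cbind() needs matching row counts. The data frames are made up for illustration.

```r
people <- data.frame(name = c("alice", "bob"), age = c(25, 30))
new_row <- data.frame(name = "charlie", age = 35)

# rbind(): same column names, stacks rows -> 3 rows
people <- rbind(people, new_row)

# cbind(): same number of rows (3), adds columns side by side
scores <- data.frame(score = c(88, 92, 85))
people_scores <- cbind(people, scores)
dim(people_scores)   # 3 3
```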
Summary statistics
set.seed(12345)
vector <- rnorm(n = 40, mean = 0, sd = 1)
mean(vector)
sd(vector)
min(vector)
max(vector)
table() for categorical data
set.seed(12345)
gender <- sample(c("male", "female"), size = 100, replace = TRUE, prob = c(0.5, 0.5))
education_level <- sample(c("high school", "bachelor", "master"), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2))
df_table <- data.frame(gender, education_level)
table(df_table)
table(df_table$gender, df_table$education_level) # cross-tabulation
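table() gives counts. prop.table() turns a table of counts into proportions by dividing each count by the total. This is a short sketch that re-simulates the gender variable from above; round() is only there for readability.

```r
set.seed(12345)
gender <- sample(c("male", "female"), size = 100, replace = TRUE, prob = c(0.5, 0.5))
gender_counts <- table(gender)
gender_props <- prop.table(gender_counts)  # each count divided by the total (100)
round(gender_props, 2)
```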
First Data Visualisation with ggplot2
ggplot2 is based on the Grammar of Graphics: you build plots layer by
layer. Install it once if needed: install.packages("ggplot2").
library(ggplot2)
set.seed(12345)
student_data <- data.frame(
name = c("alice", "bob", "charlie", "diana", "ethan", "fiona", "george", "hannah"),
score = sample(80:100, 8, replace = TRUE),
stringsAsFactors = FALSE
)
student_data$passed <- ifelse(student_data$score >= 90, "passed", "failed")
student_data$passed <- factor(student_data$passed, levels = c("failed", "passed"))
student_data$study_hours <- sample(5:15, 8, replace = TRUE)
Bar plot
ggplot(student_data, aes(x = name, y = score, fill = passed)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("failed" = "red", "passed" = "blue")) +
labs(title = "student scores", x = "student name", y = "score") +
theme_minimal()
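Because ggplot2 builds plots in layers, you can store a plot in an object and add layers to it step by step. Nothing is drawn until the object is printed. This is a minimal sketch with a small made-up data frame; scores_df, base_plot, and bar_plot are illustrative names. It uses geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

scores_df <- data.frame(name = c("alice", "bob", "charlie"), score = c(88, 92, 85))

# start with the data and aesthetic mappings only
base_plot <- ggplot(scores_df, aes(x = name, y = score))

# add a geometry, labels, and a theme, one layer at a time
bar_plot <- base_plot +
  geom_col() +
  labs(title = "student scores", x = "student name", y = "score") +
  theme_minimal()

print(bar_plot)  # draw the finished plot
```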
Scatter plot
ggplot(student_data, aes(x = study_hours, y = score, color = passed)) +
geom_point(size = 4) +
scale_color_manual(values = c("failed" = "red", "passed" = "blue")) +
labs(title = "scores vs. study hours", x = "study hours", y = "score") +
theme_minimal()
Box plot
ggplot(student_data, aes(x = passed, y = score, fill = passed)) +
geom_boxplot() +
scale_fill_manual(values = c("failed" = "red", "passed" = "blue")) +
labs(title = "score distribution by pass/fail status", x = "status", y = "score") +
theme_minimal()
Histogram
ggplot(student_data, aes(x = score, fill = passed)) +
geom_histogram(binwidth = 5, color = "black", alpha = 0.7) +
scale_fill_manual(values = c("failed" = "red", "passed" = "blue")) +
labs(title = "histogram of scores", x = "score", y = "count") +
theme_minimal()
Line plot (time series)
months <- factor(month.abb[1:8], levels = month.abb[1:8])
study_hours <- c(0, 3, 15, 30, 35, 120, 18, 15)
study_data <- data.frame(month = months, study_hours = study_hours)
ggplot(study_data, aes(x = month, y = study_hours, group = 1)) +
geom_line(linewidth = 1, color = "blue") +
geom_point(color = "red", size = 1) +
labs(title = "monthly study hours", x = "month", y = "study hours") +
theme_minimal()
Summary of today's lab
This session covered:
- Installing and setting up R and RStudio
- Basic arithmetic and logical operations
- Data structures: vectors, factors, data frames
- Data visualisation with ggplot2
End-of-lab git/GitHub check-in
After you finish the R tasks above, you can save this work to GitHub.
If you already completed Lab 1: Git and GitHub, run:
git status
git add R/lab-02.R
git commit -m "add lab 2 r script and setup"
git push
If git/GitHub is still new for you, stop here and return to this section after your Lab 1 workflow is working.
Where to get help
- Large language models: LLMs are effective coding tutors. Help from LLMs for coding does not constitute a breach of academic integrity in this course. For your final report, cite all sources including LLMs.
- Stack Overflow: outstanding community resource.
- Cross Validated: best place for statistics advice.
- Developer websites: Tidyverse.
- Your tutors and course coordinator.
Recommended reading
- Wickham, H., & Grolemund, G. (2016). R for Data Science. O'Reilly Media. Available online
- Meghan Hall's lecture: https://meghan.rbind.io/talk/neair/
- RStudio learning materials: https://posit.cloud/learn/primers
Appendix A: At-home exercises
Exercise 1: Install the tidyverse package
- Open RStudio.
- Go to Tools > Install Packages....
- Type tidyverse and check Install dependencies.
- Click Install.
- Load with library(tidyverse).
Exercise 2: Install the parameters and report packages
- Go to Tools > Install Packages....
- Type parameters, report.
- Check Install dependencies and click Install.
Exercise 2b: Install the causalworkshop package
The causalworkshop package provides simulated data for your research report. Run the following in your R console:
install.packages("remotes")
remotes::install_github("go-bayes/causalworkshop@v0.2.1")
Verify the installation:
library(causalworkshop)
d <- simulate_nzavs_data(n = 100, seed = 2026)
head(d)
Exercise 3: Basic operations
- Create vector_a <- c(2, 4, 6, 8) and vector_b <- c(1, 3, 5, 7).
- Add them, subtract them, multiply vector_a by 2, and divide vector_b by 2.
- Calculate the mean and standard deviation of both.
Exercise 4: Working with data frames
- Create a data frame with columns id (1–4), name, and score (88, 92, 85, 95).
- Add a passed column (pass mark = 90).
- Extract the name and score of students who passed.
- Explore with summary(), head(), and str().
Exercise 5: Logical operations and subsetting
- Subset student_data to find students who scored above the mean.
- Create an attendance vector and add it as a column.
- Subset to select only rows where students were present.
Exercise 6: Cross-tabulation
- Create factor variables fruit and colour.
- Make a data frame and use table() for cross-tabulation.
- Which fruit has the most colour variety?
Exercise 7: Visualisation with ggplot2
- Using student_data, create a bar plot of scores by name.
- Add a title, axis labels, and colour by pass/fail status.
Appendix B: Solutions
Solution 3: Basic operations
vector_a <- c(2, 4, 6, 8)
vector_b <- c(1, 3, 5, 7)
sum_vector <- vector_a + vector_b
diff_vector <- vector_a - vector_b
double_vector_a <- vector_a * 2
half_vector_b <- vector_b / 2
mean(vector_a); sd(vector_a)
mean(vector_b); sd(vector_b)
Solution 4: Working with data frames
student_data <- data.frame(
id = 1:4,
name = c("alice", "bob", "charlie", "diana"),
score = c(88, 92, 85, 95),
stringsAsFactors = FALSE
)
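student_data$passed <- student_data$score >= 90
# name and score of students who passed (passed is already TRUE/FALSE)
passed_students <- student_data[student_data$passed, c("name", "score")]
summary(student_data)
head(student_data)
str(student_data)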
Solution 5: Logical operations and subsetting
mean_score <- mean(student_data$score)
students_above_mean <- student_data[student_data$score > mean_score, ]
attendance <- c("present", "absent", "present", "present")
student_data$attendance <- attendance
present_students <- student_data[student_data$attendance == "present", ]
Solution 6: Cross-tabulation
fruit <- factor(c("apple", "banana", "apple", "orange", "banana"))
colour <- factor(c("red", "yellow", "green", "orange", "green"))
fruit_data <- data.frame(fruit, colour)
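table(fruit_data$fruit, fruit_data$colour)
# count the distinct colours for each fruit
rowSums(table(fruit_data$fruit, fruit_data$colour) > 0)
# apple (red, green) and banana (yellow, green) tie with two colours each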
Solution 7: Visualisation
library(ggplot2)
ggplot(student_data, aes(x = name, y = score, fill = passed)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("TRUE" = "blue", "FALSE" = "red")) +
labs(title = "student scores", x = "name", y = "score") +
theme_minimal()