Psych 447: Week 2 Workbook Solutions

Joseph Bulbulia

Create Repos (if you have not already)

Create a repository for 447-journals (if you don’t have one already)
Create a repository for your 447-workbooks.
Initialise each repository with a README and a .gitignore file (make sure to select the R .gitignore template).

Solution

Students should confirm that they’ve done this

Create R projects

Create R-projects in each of these repositories.
In your workbooks folder, using the + New Folder command, create a folder called data (or similar). This is where you’ll store your data.
Create a folder called figs, where you’ll store your figures.
Create a folder called workbooks (or similar). This is where you’ll store your weekly workbooks.

Solution

If students did not create a data folder, then the here package task won’t work.
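
As a minimal sketch of how the missing folders could be created from R rather than by hand: the `root` path below is a hypothetical stand-in that points at a temporary directory, so the example is safe to run anywhere. Inside a real RStudio project you would presumably build the paths with `here::here()` instead.

```r
# Hypothetical stand-in for the project directory (an assumption for this sketch);
# in an RStudio project you might use here::here("data") etc. instead.
root <- file.path(tempdir(), "447-workbooks")
folders <- file.path(root, c("data", "figs", "workbooks"))

for (f in folders) {
  # recursive = TRUE also creates `root`; showWarnings = FALSE keeps reruns quiet
  dir.create(f, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every folder now exists
dir.exists(folders)
```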

Workbook 2

Q1. Open an R Markdown document and fill in your name.

Q2. The first code chunk is for your preferences.

knitr::opts_chunk$set(echo = TRUE)

Q3. Find the keyboard shortcuts menu.

Q4. Memorise the keyboard shortcut for insert code chunk which is: ⌥⌘I

Q5. Use this shortcut to insert a code chunk.

Q6. Memorise the shortcut for evaluating code, which is ⌘-return. Use it for all tasks in this workbook.

Q7. Use R to do 10 mathematical calculations

Solution

Students should confirm that they’ve done this, e.g.:

4 + 5

[1] 9

3 ^ 3

[1] 27

Show your work

Q8. Using R, find the square root of 324

Solution

Students should show their work.

JB: Students should confirm that they’ve done this
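
One way the work for Q8 might be shown, using base R's `sqrt()`; the second line is only a sanity check that squaring the answer returns the original number.

```r
# Square root of 324
sqrt(324)

# Check: squaring the result should give back 324
sqrt(324)^2
```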

In the iris dataset, find the flower with the longest Sepal.Length. (hint, use the max function.) Show the column details for this flower only.

# answer:
max(iris$Sepal.Length)

[1] 7.9
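
The question also asks for the column details of that flower only. A possible sketch, assuming base R indexing is acceptable: `which.max()` returns the row position of the maximum, which can then be used to pull out the whole row. The object name `longest` is just illustrative.

```r
# Row index of the flower with the longest sepal
longest <- which.max(iris$Sepal.Length)

# All columns for that flower only
iris[longest, ]

# Equivalent logical filter; this would return several rows if there were ties
iris[iris$Sepal.Length == max(iris$Sepal.Length), ]
```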

In the iris dataset, find the flower with the shortest Sepal.Length. (hint, use the min function.)

# answer:
min(iris$Sepal.Length)

[1] 4.3

Calculate the difference between the flower with the longest and shortest Sepal.Length

# answers
max(iris$Sepal.Length) - min(iris$Sepal.Length)

[1] 3.6

# or store the two values first, then subtract
a <- max(iris$Sepal.Length)
b <- min(iris$Sepal.Length)
a - b

Make a 7 column x 100 row data frame using c(), :, and data.frame()

# one possible solution
df <- data.frame(
  col1 = sample(1000, 100, replace = TRUE),
  col2 = rnorm(n = 100, mean = 0, sd = 1),
  col3 = 1:100,
  col4 = rbinom(n = 100, size = 1, prob = .5),
  col5 = rpois(n = 100, lambda = 10),
  col6 = sample(c("a", "b", "c", "d"), 100, replace = TRUE),
  col7 =  rep(1:10, 10)
)

Rename the columns of the data frame using the names of Snow White’s dwarfs, or any names.

# using base R
names(df) <-
  c(
    "sleepy",
    "grumpy",
    "happy",
    "smelly",
    "lovely",
    "daggy",
    "crazy"
  )

Using the dataset women, write a linear model for height ~ weight

# basic linear model
m1 <- lm(height ~ weight, data = women)

Create a table and coefficient plot for this model

# coefficient plot
sjPlot::plot_model(m1)

# table
sjPlot::tab_model(m1)

	height
Predictors	Estimates	CI	p
(Intercept)	25.72	23.47 – 27.98	<0.001
weight	0.29	0.27 – 0.30	<0.001
Observations	15
R² / R² adjusted	0.991 / 0.990

Using ggeffects, create a prediction plot for this model.

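plot(
  # `terms` names the predictor to plot; the predictor in this model is weight
  ggeffects::ggpredict(m1, terms = "weight"),
  # newer ggeffects releases may call this argument `show_data`
  add.data = TRUE
)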

Next week we’ll be creating plots like this

library(ggplot2)
ggplot(data = women, mapping = aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "loess") +
  theme_classic()

Explain what is being calculated here

# this is the proportion of women who weigh more than 140 (i.e. the number of women over 140 / the total number of women)
sum(women$weight > 140) / length(women$weight)

[1] 0.4
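
Because R treats TRUE as 1 and FALSE as 0, the same proportion can also be obtained by taking the mean of the logical vector. A short sketch of the equivalence (the name `over_140` is only illustrative):

```r
# Logical vector: TRUE where a woman weighs more than 140
over_140 <- women$weight > 140

# Count of TRUE values divided by the number of cases ...
sum(over_140) / length(over_140)

# ... is the same as the mean of the logical vector
mean(over_140)
```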

Calculate the proportion of women who are over the mean weight of women

# This is one approach
# Find the mean weight of the women
mw <- mean(women$weight)

# calculate the proportion of women who are over that weight
sum(women$weight > mw) / length(women$weight)

[1] 0.4666667

What are the advantages and disadvantages of creating more breaks in the Petal.Length indicator? Clarify your answer in a paragraph.

# default breaks
# advantages: the major pattern in the data is evident; there are clearly (at least) two clusters; no temptation to over-interpret patterns in the graph
# disadvantages: it looks as if there is a cluster at zero; only two modes in the dataset are discernible
hist(iris$Petal.Length)

# more breaks
# advantages: we can find more than one mode; the distribution is not clustered at zero
# disadvantages: with such a small sample, it is tempting to read too much into all the modes
hist(iris$Petal.Length, breaks = 100)
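
To make the trade-off concrete without relying on the pictures, the bin counts can be inspected directly. This is a rough sketch: `plot = FALSE` returns the histogram object instead of drawing it, and the names `h_default` and `h_fine` are illustrative.

```r
# Bin counts only; plot = FALSE skips drawing the histogram
h_default <- hist(iris$Petal.Length, plot = FALSE)
h_fine <- hist(iris$Petal.Length, breaks = 100, plot = FALSE)

# Number of bins under each setting
# (a single number for `breaks` is only a suggestion to hist())
length(h_default$counts)
length(h_fine$counts)

# Share of bins that contain no flowers at all
mean(h_default$counts == 0)
mean(h_fine$counts == 0)
```

With many breaks, a large share of bins holds zero or one flower, which is why small bumps in the finer histogram should be read cautiously.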

Spot the error in this code

# here is one method.
mh <- mean(women$height)
sum(women$weight > mh) / length(women$height)

# error: the comparison uses weight instead of height
# it should be `sum(women$height > mh)`
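
For reference, a corrected version might look like the sketch below, which compares each height with the mean height. The name `prop_taller` is only illustrative.

```r
# Mean height of the women
mh <- mean(women$height)

# Proportion of women who are taller than the mean height
prop_taller <- sum(women$height > mh) / length(women$height)
prop_taller
```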

Reorder the columns of the women dataset so that weight comes before height. Then rename the columns “w” and “h”.

# here is one method
# Bind columns as data frame
dfa <- cbind.data.frame(women$weight, women$height)
# change names
names(dfa) <- c("w", "h")
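
Another possible approach, sketched here with base R only, is to select the columns by name in the order you want and then rename them. The object name `dfb` is illustrative.

```r
# Select the columns by name, weight first
dfb <- women[, c("weight", "height")]

# Rename the reordered columns
names(dfb) <- c("w", "h")

head(dfb)
```

Selecting by name rather than by position keeps the code readable if the column order of the source data ever changes.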

Read data into R using the following method:

# read data from file
library("readr")
testdata<- readr::read_csv(url("https://raw.githubusercontent.com/go-bayes/psych-447/main/data/testdata1.csv"))
str(testdata)

spec_tbl_df [100 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ id    : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
 $ weight: num [1:100] 80.5 87.8 95.7 74.6 68.3 ...
 $ height: num [1:100] 153 182 188 142 137 ...
 - attr(*, "spec")=
  .. cols(
  ..   id = col_double(),
  ..   weight = col_double(),
  ..   height = col_double()
  .. )

Save data into your data folder using the following method

library("here")
# save data using here
saveRDS(testdata, here::here("data", "td.RDS"))

Read data back into R

td <- readRDS(here::here("data", "td.RDS"))
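
A quick way to check that saving and reading preserve the data is a round trip. The sketch below is self-contained, so it uses a small made-up data frame and a temporary file rather than `testdata` and the project data folder; `example_data`, `path`, and `restored` are hypothetical names.

```r
# Made-up stand-in data (not the course dataset)
example_data <- data.frame(id = 1:3, weight = c(70.1, 82.4, 65.9))

# Temporary file used instead of here::here("data", "td.RDS")
path <- file.path(tempdir(), "td_example.RDS")

# Save, then read back
saveRDS(example_data, path)
restored <- readRDS(path)

# Compare the original with the restored object
identical(example_data, restored)
```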

Using the td dataset, write a linear model for height ~ weight as above

td <- readRDS(here::here("data", "td.RDS"))
# note: use a new object name (e.g. `m2`) so that the earlier model `m1` is not overwritten
m2 <- lm(height ~ weight, data = td)

sjPlot::tab_model(m2)

	height
Predictors	Estimates	CI	p
(Intercept)	12.74	5.51 – 19.98	0.001
weight	1.87	1.78 – 1.96	<0.001
Observations	100
R² / R² adjusted	0.948 / 0.947

Create a coefficient plot

sjPlot::plot_model(m2)

Create a prediction plot

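plot(
  # `terms` names the predictor to plot; the predictor in this model is weight
  ggeffects::ggpredict(m2, terms = "weight"),
  # newer ggeffects releases may call this argument `show_data`
  add.data = TRUE
)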

Extra credit: how would you interpret the intercept in this model?

# the intercept is the expected height when weight is zero; this is not meaningful
# here because, biologically speaking, weight can't be zero
summary(m2)


Call:
lm(formula = height ~ weight, data = td)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.4761  -6.2215  -0.1467   4.8392  23.9751 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.74337    3.64557   3.496 0.000712 ***
weight       1.87029    0.04432  42.201  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.683 on 98 degrees of freedom
Multiple R-squared:  0.9478,    Adjusted R-squared:  0.9473 
F-statistic:  1781 on 1 and 98 DF,  p-value: < 2.2e-16
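
One common way to obtain an interpretable intercept is to centre the predictor, so that zero corresponds to the average weight. The sketch below uses the built-in `women` data as a stand-in (an assumption made so the example runs without downloading `td`); `women_c`, `weight_c`, and `m_c` are hypothetical names.

```r
# Centre weight at its mean: weight_c = 0 now means "average weight"
women_c <- transform(women, weight_c = weight - mean(weight))

# Refit the model with the centred predictor
m_c <- lm(height ~ weight_c, data = women_c)

# The intercept is now the expected height at the average weight ...
coef(m_c)

# ... which, for simple linear regression, equals the mean height
mean(women$height)
```

The slope is unchanged by centring; only the meaning of the intercept changes.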
