Week 2 Workbook Solutions

Joseph Bulbulia https://josephbulbulia.netlify.app (Victoria University of Wellington) https://www.wgtn.ac.nz
03-17-2021

Create Repos (if you have not already)

  1. Create a repository for 447-journals (if you don’t have one already)
  2. Create a repository for your 447-workbooks.
  3. Initialise each repository with a README and a .gitignore file (make sure to select the R .gitignore template).

Solution

Students should confirm that they’ve done this

Create R projects

  1. Create R-projects in each of these repositories.
  2. In your workbooks folder, using the + New Folder command, create a folder called data (or similar). This is where you’ll store your data.
  3. Create a folder called figs (where you’ll store your figures).
  4. Create a folder called workbooks, or similar. This is where you’ll store your weekly workbooks.

Solution

If students did not create a data folder, then the here package task won’t work.

Workbook 2

Q1. Open an R Markdown document and fill in your name.

Q2. The first code chunk is for your preferences.

knitr::opts_chunk$set(echo = TRUE)

Q3. Find the keyboard shortcuts menu.

Q4. Memorise the keyboard shortcut for insert code chunk which is: ⌥⌘I

Q5. Use the keyboard shortcut to insert a code chunk.

Q6. Memorise the shortcut for evaluate, which is ⌘-return. Use it for all tasks in this workbook.

Q7. Use R to do 10 mathematical calculations

Solution

Students should confirm that they’ve done this, e.g.:

4 + 5
[1] 9
3 ^ 3
[1] 27

Show your work

Q8. Using R, find the square root of 324

Solution

Students should show their work.

JB: Students should confirm that they’ve done this
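
As a minimal sketch of what a shown answer might look like, base R's `sqrt()` returns the square root directly, and raising to the power 0.5 is an equivalent check:

```r
# square root of 324 using the built-in function
sqrt(324)

# equivalent check: raising to the power one half
324^0.5
```

Both expressions should return 18, since 18 * 18 = 324.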

  1. In the iris dataset, find the flower with the longest Sepal.Length. (hint, use the max function.) Show the column details for this flower only.
# answer:
max(iris$Sepal.Length)
[1] 7.9
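
The call above returns only the maximum value. To show the column details for that flower, as the question asks, one possible approach is to index the data frame with `which.max()`, which returns the position of the first maximum. This is a sketch; the names `i` and `longest` are illustrative only.

```r
# position of the first flower with the largest Sepal.Length
i <- which.max(iris$Sepal.Length)

# all columns for that flower only
longest <- iris[i, ]
longest
```

The same pattern with `which.min()` would apply to the shortest flower in the next question.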
  1. In the iris dataset, find the flower with the shortest Sepal.Length. (hint, use the min function.)
# answer:
min(iris$Sepal.Length)
[1] 4.3
  1. Calculate the difference between the flowers with the longest and shortest Sepal.Length
# answers
max(iris$Sepal.Length) - min(iris$Sepal.Length)
[1] 3.6
# or
a <- max(iris$Sepal.Length)
b <- min(iris$Sepal.Length)
a - b
[1] 3.6
  1. Make a 7 column x 100 row dataframe using c, :, data.frame
# one quick approach
df <- data.frame(
  col1 = sample(1000, 100, replace = TRUE),
  col2 = rnorm(n = 100, mean = 0, sd = 1),
  col3 = 1:100,
  col4 = rbinom(n = 100, size = 1, prob = .5),
  col5 = rpois(n = 100, lambda = 10),
  col6 = sample(c("a", "b", "c", "d"), 100, replace = TRUE),
  col7 =  rep(1:10, 10)
)
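
Because the solution above draws random values, its contents change on every run. The sketch below is one possible variant that fixes a seed, uses `c` and `:` explicitly, and then checks the shape with `dim()`. The seed value and the name `df_check` are arbitrary choices for illustration, not part of the original solution.

```r
# fix the random seed so the simulated columns are reproducible
set.seed(447)  # arbitrary seed, chosen only for illustration

df_check <- data.frame(
  col1 = 1:100,                                   # `:` builds an integer sequence
  col2 = rep(c("a", "b", "c", "d"), times = 25),  # `c` builds a vector, repeated to length 100
  col3 = rnorm(n = 100),
  col4 = runif(n = 100),
  col5 = rep(1:10, each = 10),
  col6 = 100:1,
  col7 = sample(c(TRUE, FALSE), 100, replace = TRUE)
)

# confirm the shape: rows first, then columns
dim(df_check)
```

Checking `dim()` is a quick way to confirm that every column has the same length before moving on.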
  1. Rename the columns of the dataframe using the names of Snow White's dwarfs, or any names.
# using base R: assign a character vector with one name per column
names(df) <-
  c(
    "sleepy",
    "grumpy",
    "happy",
    "smelly",
    "lovely",
    "daggy",
    "crazy"
  )
  1. Using the dataset women, write a linear model for height ~ weight
# basic linear model

m1 <- lm(height ~ weight, data = women)
  1. Create a table and coefficient plot for this model
# coefficient plot
sjPlot::plot_model(m1)

# table
sjPlot::tab_model(m1)
                      height
Predictors            Estimates   CI              p
(Intercept)           25.72       23.47 – 27.98   <0.001
weight                0.29        0.27 – 0.30     <0.001
Observations          15
R2 / R2 adjusted      0.991 / 0.990
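
The quantities in the table can also be recovered with base R alone. The following is a sketch for checking the table by hand, not a replacement for `sjPlot::tab_model()`:

```r
m1 <- lm(height ~ weight, data = women)

# point estimates for the intercept and slope
coef(m1)

# 95% confidence intervals for the same coefficients
confint(m1, level = 0.95)

# R-squared and adjusted R-squared
summary(m1)$r.squared
summary(m1)$adj.r.squared
```

Rounded to two decimal places, the estimates should agree with the Estimates column above.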
  1. Using ggeffects, create a prediction plot for this model.
# the predictor is selected with the `terms` argument; here the predictor is weight
plot(
  ggeffects::ggpredict(m1, terms = "weight"),
  add.data = TRUE
)
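
A prediction plot shows the model's expected height across values of weight. To see the numbers behind such a plot, one option is base R's `predict()`. This is a sketch; the three weights in `new_weights` are hypothetical values picked for illustration.

```r
m1 <- lm(height ~ weight, data = women)

# hypothetical weights (in pounds) at which to predict height
new_weights <- data.frame(weight = c(120, 140, 160))

# expected height with 95% confidence intervals at each weight
preds <- predict(m1, newdata = new_weights, interval = "confidence")
cbind(new_weights, preds)
```

The `fit` column holds the predicted heights, and `lwr` and `upr` hold the interval bounds.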

  1. Explain what is being calculated here
# this is the proportion of women who weigh more than 140 (i.e. the number of instances / the total number of cases)
sum(women$weight > 140) / length(women$weight)
[1] 0.4
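
The same proportion can be written more compactly, because R treats TRUE as 1 and FALSE as 0 when averaging a logical vector. A minimal sketch (the name `over_140` is illustrative):

```r
# comparing each weight with 140 gives a logical vector (TRUE / FALSE)
over_140 <- women$weight > 140

# TRUE counts as 1 and FALSE as 0, so the mean is the proportion of TRUEs
mean(over_140)
```

This returns the same value as dividing the sum by the length.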
  1. Calculate the proportion of women who are over the mean weight of women
# This is only one approach
# Find the mean weight of the women
mw <- mean( women$weight )

# calculate the proportion of women who are over that weight
sum(women$weight > mw) / length(women$weight)
[1] 0.4666667
  1. What are the advantages and disadvantages of creating more breaks in the Petal.Length indicator? Clarify your answer in a paragraph.
# default breaks
# advantages: the major pattern of the data is evident; clearly two clusters (at least); no temptation to over-interpret patterns in the graph

# disadvantages: looks as if there is a cluster at zero; only two modes in the dataset are discernible
hist( iris$Petal.Length )

# more breaks
# advantages: we can find more than one mode; the distribution is not clustered at zero
# disadvantages: with such a small sample, it is tempting to read too much into all the modes.

hist( iris$Petal.Length, 
      breaks = 100 )
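
To make the trade-off concrete, the bins can be inspected without drawing anything by passing `plot = FALSE`. This is a sketch; the names `h_default` and `h_fine` are illustrative. Note that `breaks = 100` is only a suggestion to `hist()`, so the number of bins actually used may differ from 100.

```r
# compute the bins without drawing, using the default breaks
h_default <- hist(iris$Petal.Length, plot = FALSE)

# same data with many more (suggested) breaks
h_fine <- hist(iris$Petal.Length, breaks = 100, plot = FALSE)

# number of bins in each version
length(h_default$counts)
length(h_fine$counts)

# how many of the fine bins contain no flowers at all
sum(h_fine$counts == 0)
```

With only 150 flowers spread over many narrow bins, each bin holds few observations, which is why the extra modes in the finer histogram should be read with caution.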

  1. Spot the error in this code
# here is one method. 
mh <- mean (women$height)
sum(women$weight > mh) / length(women$height)


# should be `sum(women$height > mh)`: the mean height must be compared with heights, not weights
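
For reference, a corrected version might look like the sketch below, in which the mean height is compared with heights (the name `prop_taller` is illustrative):

```r
# mean height of the women in the dataset
mh <- mean(women$height)

# proportion of women who are taller than the mean height
prop_taller <- sum(women$height > mh) / length(women$height)
prop_taller
```

The erroneous version still runs without complaint because both columns are numeric, which is what makes this kind of mistake easy to miss.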
  1. Reorder the columns of the women dataset so that weight comes before height. Then rename the columns “w” and “h”.
# here is one method
# Bind columns as data frame
dfa <- cbind.data.frame(women$weight, women$height)
# change names
names(dfa)[] <- c("w", "h")
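
Another possible method, shown here as a sketch, is to reorder by indexing the columns by name, which keeps the result a data frame without binding vectors. The object name `dfb` is illustrative.

```r
# select the columns in the desired order by name
dfb <- women[, c("weight", "height")]

# rename the columns
names(dfb) <- c("w", "h")

head(dfb)
```

Indexing by name avoids relying on column positions, so the code keeps working if more columns are added to the data later.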
  1. Read data into R using the following method:
# read data from file
library("readr")
testdata <- readr::read_csv(url("https://raw.githubusercontent.com/go-bayes/psych-447/main/data/testdata1.csv"))
str(testdata)
spec_tbl_df [100 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ id    : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
 $ weight: num [1:100] 80.5 87.8 95.7 74.6 68.3 ...
 $ height: num [1:100] 153 182 188 142 137 ...
 - attr(*, "spec")=
  .. cols(
  ..   id = col_double(),
  ..   weight = col_double(),
  ..   height = col_double()
  .. )

Save data into your data folder using the following method

library("here")
# save data using here
saveRDS(testdata, here::here("data", "td.RDS"))

Read data back into R

td <- readRDS(here::here("data", "td.RDS"))
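
Saving fails if the data folder does not exist. The sketch below shows the same save-and-read round trip with a check that creates the folder first. To stay self-contained it uses base R only, with `tempdir()` standing in for the project root that `here::here()` would normally supply; `project_root`, `data_dir`, `example_data` and the file name `example.RDS` are all hypothetical.

```r
# stand-in for the project root; in a real project here::here() supplies this
project_root <- tempdir()

# create the data folder if it does not exist yet
data_dir <- file.path(project_root, "data")
if (!dir.exists(data_dir)) dir.create(data_dir, recursive = TRUE)

# save a small example object, then read it back
example_data <- head(women)
path <- file.path(data_dir, "example.RDS")
saveRDS(example_data, path)
restored <- readRDS(path)

# the object read back should match the one saved
identical(example_data, restored)
```

Unlike a CSV file, an RDS file stores a single R object with its column types intact, so nothing needs to be re-specified when reading it back.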
  1. Using the td dataset, write a linear model for height ~ weight as above
td <- readRDS(here::here("data", "td.RDS"))
# note: a careful student will use an object name other than `m1`, which we used above
m2 <- lm(height ~ weight, data = td)

sjPlot::tab_model(m2)
                      height
Predictors            Estimates   CI              p
(Intercept)           12.74       5.51 – 19.98    0.001
weight                1.87        1.78 – 1.96     <0.001
Observations          100
R2 / R2 adjusted      0.948 / 0.947
  1. Create a coefficient plot
sjPlot::plot_model(m2)

  1. Create a prediction plot
# the predictor is selected with the `terms` argument; here the predictor is weight
plot(
  ggeffects::ggpredict(m2, terms = "weight"),
  add.data = TRUE
)

Extra credit: how would you interpret the intercept in this model?

# the intercept (12.74) is the expected height when weight is zero;
# it has no sensible interpretation because, biologically speaking, weight can't be zero
library( ggplot2 )
summary( m2 )

Call:
lm(formula = height ~ weight, data = td)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.4761  -6.2215  -0.1467   4.8392  23.9751 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.74337    3.64557   3.496 0.000712 ***
weight       1.87029    0.04432  42.201  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.683 on 98 degrees of freedom
Multiple R-squared:  0.9478,    Adjusted R-squared:  0.9473 
F-statistic:  1781 on 1 and 98 DF,  p-value: < 2.2e-16
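
One common way to obtain an interpretable intercept is to centre the predictor, so that zero corresponds to the average weight rather than to an impossible weight of zero. The sketch below uses the built-in `women` dataset as a stand-in for `td`, so that it runs without downloading anything; `women_c`, `weight_c` and `m_c` are illustrative names.

```r
# centre weight so that zero corresponds to the average weight
women_c <- women
women_c$weight_c <- women_c$weight - mean(women_c$weight)

# refit with the centred predictor
m_c <- lm(height ~ weight_c, data = women_c)

# the intercept is now the expected height at the mean weight;
# the slope is unchanged
coef(m_c)
```

After centring, the intercept equals the mean height in the sample, which is a quantity that can actually be observed.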


Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC-SA 4.0. Source code is available at https://go-bayes.github.io/psych-447/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Bulbulia (2021, March 17). Psych 447: Week 2 Workbook Solutions. Retrieved from https://vuw-psych-447.netlify.app/workbooks/W_2_s/

BibTeX citation

@misc{bulbulia2021week,
  author = {Bulbulia, Joseph},
  title = {Psych 447: Week 2 Workbook Solutions},
  url = {https://vuw-psych-447.netlify.app/workbooks/W_2_s/},
  year = {2021}
}