#libraries
if (!require(skimr)) install.packages('skimr')
if (!require(lubridate)) install.packages('lubridate')
if (!requireNamespace("devtools")) {
install.packages("devtools")
}
if (!require(easystats)) devtools::install_github("easystats/easystats")
if (!require(ggthemes)) install.packages('ggthemes')
if (!require(pmdplyr)) install.packages("pmdplyr")
if (!require(kableExtra)) install.packages("kableExtra")
# this should be part of easystats but in case not:
if (!require(report)) install.packages('report')
if (!require(brms)) install.packages('brms')
if (!require(lme4)) install.packages('lme4')
if (!require(table1)) install.packages('table1')
if (!require(modelsummary)) install.packages("modelsummary")
if (!require(naniar)) install.packages("naniar")
if (!require(ggraph)) install.packages("ggraph")
if (!require(gtsummary)) install.packages("gtsummary")
# load tidyverse
library("tidyverse")
# theme set
theme_set(theme_classic())
# uncomment below and run this code
# easystats::install_easystats_latest()
nz_0 <- readr::read_csv2(url("https://raw.githubusercontent.com/go-bayes/psych-447/main/data/nz/nz.csv"))
# take all characters and make them factors
# also get rid of duplicate rows
# Note the convention of renaming dataframe when creating a new one:
# ` nz <-nz_0 %>%... `
f<-c("None Of The Time","A Little Of The Time", "Some Of The Time", "Most Of The Time", "All Of The Time")
nz <-nz_0 %>%
dplyr::mutate_if(is.character, factor) %>%
select(-c(SWB.Kessler01,SWB.Kessler02,SWB.Kessler03,SWB.Kessler04,SWB.Kessler05, SWB.Kessler06))%>%
dplyr::mutate(Wave = as.factor(Wave))%>%
mutate(FeelHopeless = forcats::fct_relevel(FeelHopeless,f))%>%
mutate(FeelDepressed = forcats::fct_relevel(FeelDepressed,f))%>%
mutate(FeelRestless = forcats::fct_relevel(FeelRestless,f))%>%
mutate(EverythingIsEffort = forcats::fct_relevel(EverythingIsEffort,f))%>%
mutate(FeelWorthless = forcats::fct_relevel(FeelWorthless,f))%>%
mutate(FeelNervous = forcats::fct_relevel(FeelNervous,f))
# not used
# nz <- haven::zap_formats(nz)
# nz <- haven::zap_label(nz)
# nz <- haven::zap_widths(nz)
# nz <- haven::zap_labels(nz)
One of the advantages of R is that it allows us to create highly effective workflows. Today, we'll reinforce and extend the workflow skills that you've started to develop in previous weeks. Below we'll be working with the nz dataset, which is a reduced, truncated, and jittered version of waves 10 and 11 of the New Zealand Attitudes and Values Study. This dataset is for teaching only; if you'd like to learn more about the study to which it belongs, go here or here.
Suppose we want to select all variables that start with Believe. We can do this in a number of ways.
First there is explicit selection:
# explicit selection
nz %>%
select("Believe.God", "Believe.Spirit")%>%
glimpse()
Rows: 4,126
Columns: 2
$ Believe.God <fct> Not Believe God, Not Believe God, Believe God…
$ Believe.Spirit <fct> Not Believe Spirit, Not Believe Spirit, Belie…
We can select all columns whose names start with a certain string. For this, we use starts_with:
nz %>%
select(starts_with("Believe"))%>%
glimpse()
Rows: 4,126
Columns: 2
$ Believe.God <fct> Not Believe God, Not Believe God, Believe God…
$ Believe.Spirit <fct> Not Believe Spirit, Not Believe Spirit, Belie…
By the same token, we can select all variables that end with a certain string by using ends_with.
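The chunk that produced the output below isn't shown in the rendered post; a minimal sketch, assuming the target columns end in "Conditions", would be:
nz %>%
select(ends_with("Conditions")) %>%
glimpse()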
Rows: 4,126
Columns: 2
$ NZ.Social.Conditions <dbl> 7, 7, 2, 6, 5, 5, 2, 0, 9, 7, 3, 2, 2…
$ NZ.Business.Conditions <dbl> 7, 8, 2, 6, 5, 5, 6, 5, 9, 7, 9, 5, 5…
We can cast a broader net and select all variables that contain a certain string by using contains.
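Again, the original chunk isn't displayed; a sketch using contains would be:
nz %>%
select(contains("Believe")) %>%
glimpse()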
Rows: 4,126
Columns: 3
$ Religion.Believe.Cats <dbl> 4, 4, 1, 1, 1, 1, 1, NA, 3, 1, 4, 4, 4…
$ Believe.God <fct> Not Believe God, Not Believe God, Beli…
$ Believe.Spirit <fct> Not Believe Spirit, Not Believe Spirit…
As we can see, the net that we cast using contains was too broad. We don't want the Religion.Believe.Cats column.
In R, you can programme your way out of this corner.
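The original chunk isn't printed in the post. One possibility (a sketch, not necessarily the author's code) combines a positive and a negative selection helper:
nz %>%
select(contains("Believe"), -contains("Cats")) %>%
glimpse()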
Rows: 4,126
Columns: 2
$ Believe.God <fct> Not Believe God, Not Believe God, Believe God…
$ Believe.Spirit <fct> Not Believe Spirit, Not Believe Spirit, Belie…
However, that's inelegant; better to drop contains altogether and revert to another method.
Death, taxes, and factors are consequences of living. Let's look at the BigDoms variable in the nz dataset, which is a factor identifying large religious denominations.
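The table below was presumably produced with table() and the useNA argument (a guess based on the note that follows):
table(nz$BigDoms, useNA = "ifany")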
Buddhist Christian Muslim Not_Rel TheOthers <NA>
36 1204 9 2655 128 94
Note the use of ifany to print the NAs in this table. It's almost never sensible to ignore missing values!
Suppose we wanted to make “Not Rel” our base category for this factor. We could do so as follows:
## suppose we want "Not_Rel" as the base category, and rearrange the other levels
library(forcats) # this is part of the tidyverse package.
nz1<-nz %>%
dplyr::select(BigDoms, KESSLER6sum) %>%
dplyr::mutate(BigDoms =
forcats::fct_relevel(BigDoms, c("Not_Rel","Christian","Buddhist","Muslim","TheOthers")))
#inspect data
nz1%>%
group_by(BigDoms)%>%
count()
# A tibble: 6 x 2
# Groups: BigDoms [6]
BigDoms n
<fct> <int>
1 Not_Rel 2655
2 Christian 1204
3 Buddhist 36
4 Muslim 9
5 TheOthers 128
6 <NA> 94
The reordering makes for a more sensible model because the base category is now Not_Rel, or not-religious. Hence comparisons are to this category.
m0<- glm( KESSLER6sum ~ BigDoms, data = nz1 )
parameters::model_parameters(m0) %>%
print_html(caption = "Model of Distress by Denomination where the base category is `No Religion'")
Model of Distress by Denomination where the base category is `No Religion'

Parameter | Coefficient | SE | 95% CI | t(3991) | p
---|---|---|---|---|---
(Intercept) | 5.38 | 0.08 | (5.23, 5.54) | 69.49 | < .001
BigDoms (Christian) | -0.48 | 0.14 | (-0.76, -0.21) | -3.47 | < .001
BigDoms (Buddhist) | -0.97 | 0.67 | (-2.28, 0.34) | -1.45 | 0.147
BigDoms (Muslim) | 1.17 | 1.33 | (-1.43, 3.78) | 0.88 | 0.378
BigDoms (TheOthers) | 0.18 | 0.36 | (-0.53, 0.88) | 0.49 | 0.621
We can see the results better using a coefficient graph, which visualises the information presented in the table.
plot(parameters::model_parameters(m0) ) + labs(title = "Comparison of Religious groups to secular people",
subtitle = "Christians are a little more chilled out, \n Other denoms are less chilled out")
The base category is the comparison class. Should we infer that “The Others” denomination causes greater distress? We’ll return to this, and related questions, in the upcoming weeks. For now let’s just leave it at “probably not.”
It is almost never a good idea to transform continuous data into categorical data. However, occasionally, you will need to do so. For example, we might want to break the KESSLER6 distress indicator into its medically diagnostic categories for "mild distress," "moderate distress," and "severe distress." We may achieve this task using the cut function as follows:
nz <-nz %>%
dplyr::mutate(k6cats = cut(
KESSLER6sum,
breaks = c(-Inf, 5, 13, Inf), # create Kessler 6 diagnostic categories
labels = c("Low Distress", "Moderate Distress", "Serious Distress"),
right = TRUE
))
table(nz$k6cats, useNA = "ifany")
Low Distress Moderate Distress Serious Distress
2510 1400 177
<NA>
39
ifelse to create factors
I prefer to maintain control over how I am making the categories. For example, in the previous example, I didn't remember whether cut includes a value to the left or to the right. I had to look this up. However, I can use the ifelse function to explicitly create the relevant categories:
nz %>%
dplyr::mutate(k6cats1 = as.factor(ifelse(
KESSLER6sum <= 5,
"Low Distress",
ifelse(KESSLER6sum <= 13, "Moderate Distress", "Serious Distress")
))) %>%
group_by(k6cats1) %>%
count()
# A tibble: 4 x 2
# Groups: k6cats1 [4]
k6cats1 n
<fct> <int>
1 Low Distress 2510
2 Moderate Distress 1400
3 Serious Distress 177
4 <NA> 39
#check this is the same as the previous method
nz %>%
group_by(k6cats) %>%
count()
# A tibble: 4 x 2
# Groups: k6cats [4]
k6cats n
<fct> <int>
1 Low Distress 2510
2 Moderate Distress 1400
3 Serious Distress 177
4 <NA> 39
We can see that this method returns the same values as the cut method above.
Throughout this course, we’ll be standardising and centering indicators. Occasionally, we’ll need to perform log transformations. You’ll need to know how to do this.
Suppose we want to standardise the Relid indicator. This will transform the Relid indicator into standard deviation units. In later seminars, we'll explain why this transformation is useful. For now, this is how you do it:
nz1 <- nz %>%
select(Relid)%>%
mutate(religousid_s = scale(Relid, scale = TRUE, center = TRUE))
head(nz1)
# A tibble: 6 x 2
Relid religousid_s[,1]
<dbl> <dbl>
1 0 -0.632
2 0 -0.632
3 0 -0.632
4 0 -0.632
5 0 -0.632
6 0 -0.632
What happened? The variable name for our standardised variable looks weird: religousid_s[,1]. This isn't a worry. We can use the variable as we would any other and all is fine.1
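The table below looks like the output of an intercept-only regression on the standardised variable. A sketch of what might have produced it, assuming the sjPlot package for the HTML table (this is a guess, not the original chunk):
m_int <- lm(religousid_s ~ 1, data = nz1) # intercept-only model of the standardised indicator
sjPlot::tab_model(m_int)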
religousid_s

Predictors | Estimates | CI | p
---|---|---|---
(Intercept) | 0.00 | -0.03 – 0.03 | 1.000
Observations | 3963 | |
R2 / R2 adjusted | 0.000 / 0.000 | |
Transform your data as the last step in your pipe workflow.
This is because if you filter cases after standardising, you'll end up with a variable that isn't measured in standard deviation units of your analytic sample:
nza <- nz %>%
select(Relid, BigDoms)%>%
mutate(religousid_s = scale(Relid, scale = TRUE, center = TRUE))
nzb <- nz %>%
select(Relid, BigDoms)%>%
mutate(religousid_s = scale(Relid, scale = TRUE, center = TRUE)) %>%
filter(BigDoms !="Not_Rel")
# compare
summary(nza$religousid_s)
V1
Min. :-0.6315
1st Qu.:-0.6315
Median :-0.6315
Mean : 0.0000
3rd Qu.: 0.9223
Max. : 2.0877
NA's :163
# with
summary(nzb$religousid_s)
V1
Min. :-0.2431
1st Qu.: 0.9223
Median : 1.3107
Mean : 1.2819
3rd Qu.: 2.0877
Max. : 2.0877
NA's :69
When we scale first and filter afterwards, the mean of the "standardised" variable among the remaining cases is 1.28: everything has changed!
nz1 <- nz1 %>%
select(Relid)%>%
mutate(religousid_s = scale(Relid, scale = TRUE, center = TRUE))
head(nz1)
# A tibble: 6 x 2
Relid religousid_s[,1]
<dbl> <dbl>
1 0 -0.632
2 0 -0.632
3 0 -0.632
4 0 -0.632
5 0 -0.632
6 0 -0.632
or simply:
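(The shorthand chunk isn't shown in the source; my guess is that it relies on scale()'s defaults, which are center = TRUE and scale = TRUE.)
nz1 <- nz %>%
select(Relid) %>%
mutate(religousid_s = scale(Relid)) # scale() centers and scales by default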
To center a variable (without scaling it), we set scale = FALSE, center = TRUE:
nz1 <- nz %>%
mutate(religousid_c = scale(Relid, scale = FALSE, center = TRUE))
# inspect new indicator
nz1%>%
select(Relid,religousid_c)%>%
glimpse()
Rows: 4,126
Columns: 2
$ Relid <dbl> 0, 0, 0, 0, 0, 0, 7, 7, 2, 2, 0, 0, 0, 0, 0, 0,…
$ religousid_c <dbl[,1]> <matrix[23 x 1]>
We use the log transformation for extreme values. We can create a new indicator by combining mutate and log as follows:
nz1 <- nz %>%
mutate(charitydonate_log = log(CharityDonate + 1))
# inspect new indicator
nz1 %>%
select(CharityDonate,charitydonate_log)%>%
glimpse()
Rows: 4,126
Columns: 2
$ CharityDonate <dbl> 180, 80, 300, 100, 4200, 3500, 400, 350, 5…
$ charitydonate_log <dbl> 5.198497, 4.394449, 5.707110, 4.615121, 8.…
Note that we have to add +1 inside the log transformation, as you will recall that the log of zero is undefined: you cannot obtain zero by raising a (positive) base to any power.
We can also analyse dates. For example, for how many minutes were data collected?
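The summary below presumably comes from calling summary() on the date column (a guess at the original chunk):
summary(nz$date)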
date
Min. :2018-01-02
1st Qu.:2018-08-09
Median :2019-10-03
Mean :2019-05-18
3rd Qu.:2019-12-07
Max. :2020-10-06
int<-lubridate::interval(ymd("2018-01-02"), ymd("2020-10-06"))
#time in years
time_length(int, "year")
[1] 2.759563
#time in minutes
time_length(int, "minutes")
[1] 1451520
Fun! So much so that your homework will involve working with dates.
Here we’re going to graph the number of responses each day for the years of data collection.
library(lubridate)
library(ggplot2)
datrep <- nz %>%
count(day = floor_date(date, "day"))%>%
dplyr::mutate(Year = factor(ifelse(
day < "2018-01-01",
2017,
ifelse(day < "2019-01-01", 2018,
ifelse(day < "2020-01-01", 2019, 2020))
))) %>%
arrange(day)
# create the graph
ggplot(datrep, aes(day, n)) +
geom_col(aes(fill = Year)) +
scale_x_date(date_labels = "%b/%Y") +
xlab("Days") + ylab("Count of Responses") +
ggtitle("Our Dataset's Daily Counts") +
theme_classic() +
scale_fill_viridis_d()
Note that we can use the datrep dataframe that we created to explore aspects of data collection. For example, we can arrange the dataset by day in descending order of participants sampled:
datrep%>%
arrange(desc(n))
# A tibble: 607 x 3
day n Year
<date> <int> <fct>
1 2018-06-21 112 2018
2 2018-06-22 93 2018
3 2018-06-24 80 2018
4 2018-06-20 67 2018
5 2018-06-23 59 2018
6 2018-06-26 58 2018
7 2019-12-03 54 2019
8 2018-06-25 52 2018
9 2019-10-04 47 2019
10 2019-12-02 46 2019
# … with 597 more rows
Take note of that code; you might need it for your workbook.
What might we do with dates? Well, we might ask: were there any inherently stressful days?
To see this, we can take average stress levels by day, and then see where the high average stress days fall.
tn<-nz %>%
select(date,KESSLER6sum,Id) %>%
group_by(date)%>%
summarise(
av_distress = mean(KESSLER6sum, na.rm = TRUE),
n = n_distinct(Id)
) %>%
arrange(desc(av_distress))%>%
glimpse()
Rows: 607
Columns: 3
$ date <date> 2018-12-16, 2020-07-06, 2018-11-13, 2019-01-05,…
$ av_distress <dbl> 21.00000, 16.00000, 15.50000, 15.00000, 15.00000…
$ n <int> 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, …
Graphing the daily averages reveals the following:
tn%>%
ggplot(., aes(date, av_distress)) +
geom_col(aes(fill =(n))) + scale_x_date(date_labels = "%b/%Y") + theme_classic() + scale_fill_viridis_c()
Clearly the “stressful days” are an artifact of days with low numbers of participant respondents.
Let's see whether there are any stressful days of the week. We do this by creating a weekday variable using the wday function in the lubridate package. Let's graph our results using a pipe (%>%) workflow:
nz %>%
select(Id, date, KESSLER6sum) %>%
mutate(weekdays = wday(date, label = TRUE)) %>%
group_by(weekdays) %>%
summarise(
mn_k6 = mean(KESSLER6sum, na.rm = TRUE),
sd_k6 = sd(KESSLER6sum, na.rm = TRUE),
n_k6w = n()
) %>%
mutate(
se_k6 = sd_k6 / sqrt(n_k6w),
lw_ci = mn_k6 - qt(1 - (0.05 / 2), n_k6w - 1) * se_k6,
up_ci = mn_k6 + qt(1 - (0.05 / 2), n_k6w - 1) * se_k6
) %>%
ggplot(., aes(x = weekdays, y = mn_k6, colour = mn_k6)) +
geom_errorbar(aes(ymin = lw_ci, ymax = up_ci), width = .1) +
geom_point(size = 3) +
scale_y_continuous(limits = c(0,7)) +
  theme_classic() + scale_colour_viridis_c() # the points are mapped to colour (continuous), not fill
Despite the variability over the two years of data collection, the error bars of the graph overlap: we don't find differences in distress by day of the week.
“Ok Boomer,” you ask, “what if we were to calculate distress by generational cohorts?”
My reply: I'm not a boomer, I'm a GenX-er. And I'm keen to check it out:
nz %>%
select(GenCohort, KESSLER6sum) %>%
group_by(GenCohort) %>%
summarise(
mn_k6 = mean(KESSLER6sum, na.rm = TRUE),
sd_k6 = sd(KESSLER6sum, na.rm = TRUE),
n_k6w = n()
) %>%
mutate(
se_k6 = sd_k6 / sqrt(n_k6w),
lw_ci = mn_k6 - qt(1 - (0.05 / 2), n_k6w - 1) * se_k6,
up_ci = mn_k6 + qt(1 - (0.05 / 2), n_k6w - 1) * se_k6
) %>%
ggplot(., aes(x = GenCohort, y = mn_k6, colour = GenCohort)) +
geom_errorbar(aes(ymin = lw_ci, ymax = up_ci), width = .1) +
geom_point(size = 3) +
theme_classic() +
geom_hline(yintercept = 5,
colour = "red",
linetype = "dashed") +
scale_y_continuous(limits = c(0, 10)) +
theme(
legend.text = element_text(size = 6),
legend.title = element_text(size = 8),
axis.text.x = element_blank()
) +
xlab("Birth Generation Cohort") +
ylab("Kessler 6 Distress") +
labs(title = "Average Distress by Birth Cohort",
subtitle = "Red line indicates clinically moderate distress threshold") +
scale_colour_viridis_d()
Later, we’ll ask why you’re so stressed out.
Slice
dplyr's slice function can be handy. Say we only want the first four rows:
datrep%>%
arrange(desc(n)) %>%
slice(1:4)
# A tibble: 4 x 3
day n Year
<date> <int> <fct>
1 2018-06-21 112 2018
2 2018-06-22 93 2018
3 2018-06-24 80 2018
4 2018-06-20 67 2018
Say we only want the 1st row, the 3rd row, and the 20th row.
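A sketch of the chunk (not shown in the source) that would produce the output below:
datrep %>%
arrange(desc(n)) %>%
slice(c(1, 3, 20)) # keep rows 1, 3, and 20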
# A tibble: 3 x 3
day n Year
<date> <int> <fct>
1 2018-06-21 112 2018
2 2018-06-24 80 2018
3 2020-03-18 38 2020
Create a difference variable for change in Kessler 6
library("pmdplyr")
df <-nz %>%
dplyr::filter(!is.na(KESSLER6sum))%>%
mutate(wave = as.numeric(Wave))%>%
mutate(lag_k6 = tlag(KESSLER6sum,
.i = Id, # id variable
.t = wave # time series variable, needs to be numeric
))%>%
mutate(diff_k6 = lag_k6 - KESSLER6sum) %>%
select(Id,Wave,KESSLER6sum,diff_k6,Emp.JobSecure,Employed)%>%
arrange(desc(diff_k6))
What should we do with this new variable? Well, we might explore whether employment security relates to distress change:
df %>%
filter(Wave == 2019) %>%
mutate(employed_employsecurity = as.factor(ifelse(Employed ==1, Emp.JobSecure,0)))%>%
ggplot(data = ., aes(x = diff_k6, fill = employed_employsecurity) )+
geom_histogram() +
xlab("Difference in K6 eleveation (cases above 5)") +
ylab("Counts of cases") +
labs(subtitle ="No clear relationship between unemployment insecurity and distress change")+
scale_fill_viridis_d(name = "Employment Security 1-7") + # use the viridis palette but keep the legend title
theme_classic() +
theme(legend.position = "bottom")
And remarkably, we don't see much evidence of a relationship in this cross-sectional analysis.
# create data frame with a new variable; zero is for the unemployed
dfnew <- df %>%
filter(Wave == 2019) %>%
mutate(employed_employsecurity = as.numeric(ifelse(Employed == 1, Emp.JobSecure, 0)))%>%
filter(!is.na(employed_employsecurity))
head(dfnew)
# A tibble: 6 x 7
Id Wave KESSLER6sum diff_k6 Emp.JobSecure Employed
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 165 2019 0 20 NA 0
2 1936 2019 3 15 NA 0
3 1037 2019 10 14 4 1
4 722 2019 2 12 7 1
5 1568 2019 8 12 NA 0
6 1606 2019 7 12 7 1
# … with 1 more variable: employed_employsecurity <dbl>
# Graph
ggplot(dfnew, aes(y = diff_k6, employed_employsecurity)) +
geom_jitter(alpha = .2) +
geom_smooth(method = lm) +
xlab("employed_employsecurity") +
ylab("Kessler 6 distress jumps over 5") +
ggtitle("Jumps in distress change not related to employement insecurity") +
scale_fill_viridis_d() + theme_classic()
However, perhaps our indicator is misleading us. We can formally model the relationship between employment security and Kessler 6 distress across the two years:
# create dataframe with the variables we need
dfnew2 <- df %>%
mutate(employed_employsecurity = as.numeric(ifelse(Employed == 1, Emp.JobSecure, 0))) %>%
filter(!is.na(employed_employsecurity)) %>%
dplyr::mutate(employsecurity_s = scale(employed_employsecurity))
# multi-level model
m00a<-lme4::lmer(KESSLER6sum ~ employsecurity_s * Wave + (1|Id), data = dfnew2)
plot(ggeffects::ggpredict(m00a, terms=c("employsecurity_s", "Wave")),
add.data = TRUE, jitter = 0.2, dot.alpha =.05) + geom_hline(yintercept = 5,
colour = "red",
linetype = "dashed") +
labs(title = "There is a relationship between employment security and Kessler6 distress")
This suggests a stable negative relationship between employment security and distress (more security, less distress). So is there a causal relationship? Not necessarily. Again, we return to causal inference in the upcoming weeks. For now, we want to alert you to an important lesson:
Do not read too much into your descriptive analysis!
This is especially true when creating new variables. Just because you can make a variable doesn’t mean you should use it, or interpret it!
Put differently, our workflow will require much more than descriptive statistics.
skimr package
The skimr package can be helpful in detecting problems. (A drawback: note that it interprets all factors as numbers.)
For example (I won't run the following code; you will do so for your homework). However, I want to point out that skimr works with individual columns, and it accepts a tidy workflow.
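The grouped summary below appears to come from a piped skim of two numeric variables; a sketch of such a call (my guess at the column selection) is:
nz %>%
select(Wave, KESSLER6sum, HLTH.SleepHours) %>%
group_by(Wave) %>%
skimr::skim()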
Name | Piped data |
Number of rows | 4126 |
Number of columns | 3 |
_______________________ | |
Column type frequency: | |
numeric | 2 |
________________________ | |
Group variables | Wave |
Variable type: numeric
skim_variable | Wave | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|---|
KESSLER6sum | 2018 | 28 | 0.99 | 5.14 | 4.00 | 0 | 2 | 4 | 7 | 24 | ▇▆▂▁▁ |
KESSLER6sum | 2019 | 11 | 0.99 | 5.35 | 3.97 | 0 | 2 | 5 | 7 | 24 | ▇▆▂▁▁ |
HLTH.SleepHours | 2018 | 103 | 0.95 | 6.92 | 1.07 | 3 | 6 | 7 | 8 | 11 | ▁▆▇▅▁ |
HLTH.SleepHours | 2019 | 78 | 0.96 | 6.88 | 1.09 | 2 | 6 | 7 | 8 | 12 | ▁▃▇▁▁ |
Table1 & other canned table packages
In earlier seminars, we encountered the table1 package, which makes really great HTML tables:
library(table1)
table1::table1(~Age +
GenCohort +
Male +
Edu +
Pol.Orient +
Relid +
BigDoms | Wave, data = nz,
overall = FALSE)
 | 2018 (N=2063) | 2019 (N=2063)
---|---|---
Age | ||
Mean (SD) | 50.2 (13.4) | 50.9 (13.4) |
Median [Min, Max] | 52.0 [18.0, 92.0] | 53.0 [19.0, 92.0] |
GenCohort | ||
Gen Boombers: born >= 1946 & b.< 1961 | 676 (32.8%) | 676 (32.8%) |
Gen_Silent: born< 1946 | 51 (2.5%) | 51 (2.5%) |
GenX: born >=1961 & b.< 1980 | 915 (44.4%) | 915 (44.4%) |
GenY: born >=1980 & b.< 1996 | 368 (17.8%) | 368 (17.8%) |
GenZ: born >= 1996 | 53 (2.6%) | 53 (2.6%) |
Male | ||
Male | 755 (36.6%) | 752 (36.5%) |
Not_Male | 1305 (63.3%) | 1305 (63.3%) |
Missing | 3 (0.1%) | 6 (0.3%) |
Edu | ||
Mean (SD) | 5.54 (2.73) | 5.71 (2.66) |
Median [Min, Max] | 7.00 [0, 10.0] | 7.00 [0, 10.0] |
Missing | 77 (3.7%) | 50 (2.4%) |
Pol.Orient | ||
Mean (SD) | 3.58 (1.40) | 3.58 (1.37) |
Median [Min, Max] | 4.00 [1.00, 7.00] | 4.00 [1.00, 7.00] |
Missing | 119 (5.8%) | 69 (3.3%) |
Relid | ||
Mean (SD) | 1.69 (2.58) | 1.56 (2.57) |
Median [Min, Max] | 0 [0, 7.00] | 0 [0, 7.00] |
Missing | 51 (2.5%) | 112 (5.4%) |
BigDoms | ||
Buddhist | 17 (0.8%) | 19 (0.9%) |
Christian | 624 (30.2%) | 580 (28.1%) |
Muslim | 4 (0.2%) | 5 (0.2%) |
Not_Rel | 1301 (63.1%) | 1354 (65.6%) |
TheOthers | 75 (3.6%) | 53 (2.6%) |
Missing | 42 (2.0%) | 52 (2.5%) |
Unfortunately, the table1 package only prints HTML tables.
For publications, I might use the modelsummary package:
library("modelsummary")
nnz<-nz %>%
dplyr::select(Age,
Male,
BigDoms,
Edu,
GenCohort,
Relid,
Pol.Orient,
Wave)
modelsummary::datasummary_balance( ~ Wave, data=nnz, dinm=FALSE, output = 'table.tex')
I'll put the LaTeX output into my document because I generally prefer to write in LaTeX.
However, if you want to print the table inline, you can simply use:
library("modelsummary")
nnz<-nz %>%
dplyr::select(Age,
Male,
BigDoms,
Edu,
GenCohort,
Relid,
Pol.Orient,
Wave)
modelsummary::datasummary_balance( ~ Wave, data=nnz, dinm=FALSE)
The first pair of columns is Wave 2018; the second pair is Wave 2019.

 | | Mean | Std. Dev. | Mean | Std. Dev.
---|---|---|---|---|---
Age | 50.2 | 13.4 | 50.9 | 13.4 | |
Edu | 5.5 | 2.7 | 5.7 | 2.7 | |
Relid | 1.7 | 2.6 | 1.6 | 2.6 | |
Pol.Orient | 3.6 | 1.4 | 3.6 | 1.4 | |
N | % | N | % | ||
Male | Male | 755 | 36.6 | 752 | 36.5 |
Not_Male | 1305 | 63.3 | 1305 | 63.3 | |
BigDoms | Buddhist | 17 | 0.8 | 19 | 0.9 |
Christian | 624 | 30.2 | 580 | 28.1 | |
Muslim | 4 | 0.2 | 5 | 0.2 | |
Not_Rel | 1301 | 63.1 | 1354 | 65.6 | |
TheOthers | 75 | 3.6 | 53 | 2.6 | |
GenCohort | Gen Boombers: born >= 1946 & b.< 1961 | 676 | 32.8 | 676 | 32.8 |
Gen_Silent: born< 1946 | 51 | 2.5 | 51 | 2.5 | |
GenX: born >=1961 & b.< 1980 | 915 | 44.4 | 915 | 44.4 | |
GenY: born >=1980 & b.< 1996 | 368 | 17.8 | 368 | 17.8 | |
GenZ: born >= 1996 | 53 | 2.6 | 53 | 2.6 |
Above we saw how to create a clunky table using table(x). However, R has lots of functionality to enable better tables.
library(kableExtra)
nz %>%
select(k6cats, Wave) %>%
filter(!is.na(k6cats))%>%
group_by( Wave, k6cats) %>%
summarise(n = n())%>%
kbl(caption = "Distress by Year") %>%
kable_classic_2(c("striped", "hover"), full_width = TRUE)%>%
collapse_rows()
Wave | k6cats | n |
---|---|---|
2018 | Low Distress | 1272 |
2018 | Moderate Distress | 675 |
2018 | Serious Distress | 88 |
2019 | Low Distress | 1238 |
2019 | Moderate Distress | 725 |
2019 | Serious Distress | 89 |
Note that we can use the pivot_wider function to spread the dataframe to enable a table that is easier to interpret.
Credit where credit is due: I just learned about pivot_wider from Johannes and Thorven. I'm keen to get pivot_longer and pivot_wider into my vocabulary, and to do more things, like this:
nz %>%
select(k6cats, Wave) %>%
filter(!is.na(k6cats))%>%
group_by( Wave, k6cats) %>%
summarise(n = n())%>%
pivot_wider(names_from = Wave, values_from = n) %>%
kbl(caption = "Distress counts by year") %>%
kable_classic_2(c("striped", "hover"), full_width = TRUE)
k6cats | 2018 | 2019 |
---|---|---|
Low Distress | 1272 | 1238 |
Moderate Distress | 675 | 725 |
Serious Distress | 88 | 89 |
Nice!
For categorical data, in place of tables we can use bar graphs.
Here’s the table:
table(nz$BigDoms)
Buddhist Christian Muslim Not_Rel TheOthers
36 1204 9 2655 128
Here's the bar graph.
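The chunk for the plain (unordered) bar graph isn't shown; a minimal sketch would be:
ggplot(nz) +
geom_bar(mapping = aes(x = BigDoms))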
Note that we can re-order the factor levels to produce a nicer output, using fct_infreq
ggplot(nz) +
geom_bar(mapping = aes(x = fct_infreq(BigDoms)) )
What do you notice about the patterns of missingness in this graph?
Here, we find all the problem cases:
gg_miss_upset(nz)
What explains these features of missingness?
A box plot provides visual information for the following statistics: the median, the first and third quartiles (the box), the range of observations falling within 1.5 times the interquartile range of the box (the whiskers), and any outlying points beyond the whiskers.
There’s a simple explanation here
We can use base R to investigate differences in distress among big denominations.
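The base R chunk isn't shown; a minimal sketch using boxplot() would be:
boxplot(KESSLER6sum ~ BigDoms, data = nz)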
Here’s a ggplot boxplot:
ggplot(data = nz, aes(x = KESSLER6sum, y = BigDoms, fill = BigDoms)) +
geom_boxplot(notch=TRUE) + scale_fill_viridis_d() +
ggtitle("If the notches don't overlap, there's likely a difference") +
geom_jitter(alpha = .05)
Here's a ggplot2 boxplot with points overlaid and jittered. This allows us to see the differences in sample sizes:
ggplot(data = nz, aes(x = KESSLER6sum, y = BigDoms, fill = BigDoms)) +
geom_boxplot(notch=TRUE) + scale_fill_viridis_d() +
ggtitle("If the notches don't overlap, there's likely a difference") +
geom_jitter(alpha = .07)
We could look at differences by wave:
ggplot(data = nz, aes(x = KESSLER6sum, y = BigDoms, fill = BigDoms)) +
geom_boxplot(notch=TRUE) + scale_fill_viridis_d() +
geom_jitter(alpha = .07) +
facet_grid(Wave ~ .) +
ggtitle("If the notches don't overlap, there's likely a difference")
Johannes will describe a method for making a correlation plot. Here is another method.
bzsec<-nz%>%
select(
Your.Future.Security,
Standard.Living,
NZ.Economic.Situation,
NZ.Business.Conditions,
Emp.JobSecure,
CharityDonate,
Id
) %>%
mutate_all(., as.numeric) %>% #make numeric
mutate(Id = as.factor(Id), # make Id a factor for the multilevel correlation
CharityDonate = log(CharityDonate + 1)) # log the (skewed) donation values
# make a correlation plot using the "correlation" package from easystates
library(correlation)
p1<-bzsec %>%
correlation(partial = FALSE, multilevel = TRUE ) %>%
plot()
Print summary
bzsec %>%
correlation(partial = FALSE, multilevel = TRUE ) %>%
summary()
Parameter | CharityDonate | Emp.JobSecure | NZ.Business.Conditions | NZ.Economic.Situation | Standard.Living
-------------------------------------------------------------------------------------------------------------------------
Your.Future.Security | 0.05* | 0.24*** | 0.44*** | 0.31*** | 0.34***
Standard.Living | 0.09*** | 0.18*** | 0.22*** | 0.28*** |
NZ.Economic.Situation | 0.03 | 0.11*** | 0.43*** | |
NZ.Business.Conditions | 0.03 | 0.14*** | | |
Emp.JobSecure | 0.05* | | | |
Let's set multilevel to FALSE.
library(correlation)
p2<-bzsec %>%
select(-Id)%>% # get rid of grouping variable
correlation(partial = FALSE, multilevel = FALSE ) %>%
plot()
#print summary
bzsec %>%
select(-Id)%>% # get rid of grouping variable
correlation(partial = FALSE, multilevel = FALSE ) %>%
summary()
# Correlation Matrix (pearson-method)
Parameter | CharityDonate | Emp.JobSecure | NZ.Business.Conditions | NZ.Economic.Situation | Standard.Living
-------------------------------------------------------------------------------------------------------------------------
Your.Future.Security | 0.19*** | 0.35*** | 0.52*** | 0.41*** | 0.52***
Standard.Living | 0.22*** | 0.28*** | 0.31*** | 0.36*** |
NZ.Economic.Situation | 0.07*** | 0.15*** | 0.52*** | |
NZ.Business.Conditions | 0.08*** | 0.21*** | | |
Emp.JobSecure | 0.09*** | | | |
p-value adjustment method: Holm (1979)
library(patchwork)
# create a two panel plot
p1 / p2 +
plot_annotation(title = "Plot of multilevel (a) and single-level (b) correlation", tag_levels = 'a')
We can now see even greater correlations between the variables. This is because the model does not adjust for the repeated measures, which create dependencies in the data.
report package
The report package from the easystats group is a powerful tool for saving time. Before extolling its virtues, I'd like to point out two major limitations.
First, the package is in development. Currently, it has lots of bugs.
Second, the package uses terminology that won’t work for all contexts and purposes. For example, it uses the term “significant” to describe p values that are below the traditional p = .05 threshold.
If you learn nothing else from this course, you should learn never to use "significant" to describe a p value. You may, if you like, use "statistically significant"; however, it would be better altogether if you simply dropped p-values from data analysis. We'll show you how. With those provisos in mind, consider some useful functionality from the report package.
# create a demographic dataframe
nz_demagraphics <- nz %>%
select(Age, GenCohort, Male, Edu, Pol.Orient, Relid, BigDoms, Wave)
# now a nice way to save you time when reporting
paste(
report::report_participants(
nz_demagraphics,
group = "Wave",
age = "Age",
sex = "Male",
education = "Edu",
spell_n = TRUE),
"were recruited in the study by through enticement by lollipops. Those who did not volunteer were coerced."
)
[1] "For the 'Wave - 2018' group: Two Thousand, Sixty Three participants (Mean age = 50.2, SD = 13.4, range: [18, 92]; 0.0% females; Mean education = 5.5, SD = 2.7, range: [0, 10]) and for the 'Wave - 2019' group: Two Thousand, Sixty Three participants (Mean age = 50.9, SD = 13.4, range: [19, 92]; 0.0% females; Mean education = 5.7, SD = 2.7, range: [0, 10]) were recruited in the study by through enticement by lollipops. Those who did not volunteer were coerced."
The table function of report isn't great yet. However, it has some nice features. For example, you should always report your session information, and doing so in tabular form clarifies the elements.
Try running the following code on your own:
r <- report_table(sessionInfo())
r
Here is another method, which you can try on your own
Here’s a demographic table (try on your own)
report_table(nz_demagraphics)
Here’s a data summary
library("report")
nz %>%
group_by(Wave)%>%
select(
"Wave",
"Age",
"Male",
"Edu",
"Relid",
"Pol.Orient",
"KESSLER6sum",
"FeelHopeless",
"FeelDepressed",
"FeelRestless",
"EverythingIsEffort",
"FeelWorthless",
"FeelNervous"
)%>%
report() %>%
summary()
The data contains 4126 observations, grouped by Wave, of the following variables:
- 2018 (n = 2063):
- Age: Mean = 50.19, SD = 13.38, range: [18, 92]
- Male: 2 levels, namely Male (n = 755), Not_Male (n = 1305) and missing (n = 3)
- Edu: Mean = 5.54, SD = 2.73, range: [0, 10], 3.73% missing
- Relid: Mean = 1.69, SD = 2.58, range: [0, 7], 2.47% missing
- Pol.Orient: Mean = 3.58, SD = 1.40, range: [1, 7], 5.77% missing
- KESSLER6sum: Mean = 5.14, SD = 4.00, range: [0, 24], 1.36% missing
- FeelHopeless: 5 levels, namely None Of The Time (n = 1012), A Little Of The Time (n = 627), Some Of The Time (n = 325), Most Of The Time (n = 58), All Of The Time (n = 11) and missing (n = 30)
- FeelDepressed: 5 levels, namely None Of The Time (n = 1481), A Little Of The Time (n = 345), Some Of The Time (n = 156), Most Of The Time (n = 35), All Of The Time (n = 12) and missing (n = 34)
- FeelRestless: 5 levels, namely None Of The Time (n = 520), A Little Of The Time (n = 751), Some Of The Time (n = 585), Most Of The Time (n = 143), All Of The Time (n = 29) and missing (n = 35)
- EverythingIsEffort: 5 levels, namely None Of The Time (n = 523), A Little Of The Time (n = 827), Some Of The Time (n = 490), Most Of The Time (n = 159), All Of The Time (n = 32) and missing (n = 32)
- FeelWorthless: 5 levels, namely None Of The Time (n = 1469), A Little Of The Time (n = 348), Some Of The Time (n = 147), Most Of The Time (n = 48), All Of The Time (n = 19) and missing (n = 32)
- FeelNervous: 5 levels, namely None Of The Time (n = 537), A Little Of The Time (n = 804), Some Of The Time (n = 509), Most Of The Time (n = 148), All Of The Time (n = 28) and missing (n = 37)
- 2019 (n = 2063):
- Age: Mean = 50.94, SD = 13.38, range: [19, 92]
- Male: 2 levels, namely Male (n = 752), Not_Male (n = 1305) and missing (n = 6)
- Edu: Mean = 5.71, SD = 2.66, range: [0, 10], 2.42% missing
- Relid: Mean = 1.56, SD = 2.57, range: [0, 7], 5.43% missing
- Pol.Orient: Mean = 3.58, SD = 1.37, range: [1, 7], 3.34% missing
- KESSLER6sum: Mean = 5.35, SD = 3.97, range: [0, 24], 0.53% missing
- FeelHopeless: 5 levels, namely None Of The Time (n = 961), A Little Of The Time (n = 660), Some Of The Time (n = 348), Most Of The Time (n = 67), All Of The Time (n = 8) and missing (n = 19)
- FeelDepressed: 5 levels, namely None Of The Time (n = 1399), A Little Of The Time (n = 403), Some Of The Time (n = 195), Most Of The Time (n = 47), All Of The Time (n = 7) and missing (n = 12)
- FeelRestless: 5 levels, namely None Of The Time (n = 501), A Little Of The Time (n = 794), Some Of The Time (n = 577), Most Of The Time (n = 158), All Of The Time (n = 17) and missing (n = 16)
- EverythingIsEffort: 5 levels, namely None Of The Time (n = 470), A Little Of The Time (n = 798), Some Of The Time (n = 572), Most Of The Time (n = 179), All Of The Time (n = 29) and missing (n = 15)
- FeelWorthless: 5 levels, namely None Of The Time (n = 1434), A Little Of The Time (n = 359), Some Of The Time (n = 193), Most Of The Time (n = 51), All Of The Time (n = 12) and missing (n = 14)
- FeelNervous: 5 levels, namely None Of The Time (n = 540), A Little Of The Time (n = 824), Some Of The Time (n = 539), Most Of The Time (n = 122), All Of The Time (n = 25) and missing (n = 13)
Notes:
More about the report package: here
This package is brought to you by easystats
When reporting your study, it is extremely important to include information about your measures. For example:
We measure psychological distress using the Kessler-6 scale (R. C. Kessler et al. 2002), which exhibits strong diagnostic concordance for moderate and severe psychological distress in large, cross-cultural samples (Prochaska et al. 2012). Participants rated, during the past 30 days, how often did… (a) "… you feel hopeless"; (b) "… you feel so depressed that nothing could cheer you up"; (c) "… you feel restless or fidgety"; (d) "… you feel that everything was an effort"; (e) "… you feel worthless"; (f) "… you feel nervous?" Ordinal response alternatives for the Kessler-6 are: "None of the time"; "A little of the time"; "Some of the time"; "Most of the time"; "All of the time."
We report sample descriptive statistics for indicators of personal Kessler-6 distress below in Table 1.
Table 1
library(gtsummary)
tb1 <-nz %>%
dplyr::select(
KESSLER6sum,
FeelHopeless,
FeelDepressed,
FeelRestless,
EverythingIsEffort,
FeelWorthless,
FeelNervous,
Wave,
) %>%
gtsummary::tbl_summary(
by = Wave,
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"
),
digits = all_continuous() ~ 2,
missing_text = "(Missing)"
)%>%
bold_labels()
tb1
Characteristic | 2018, N = 2,063¹ | 2019, N = 2,063¹ |
---|---|---|
KESSLER6sum | 5.14 (4.00) | 5.35 (3.97) |
(Missing) | 28 | 11 |
FeelHopeless | ||
None Of The Time | 1,012 / 2,033 (50%) | 961 / 2,044 (47%) |
A Little Of The Time | 627 / 2,033 (31%) | 660 / 2,044 (32%) |
Some Of The Time | 325 / 2,033 (16%) | 348 / 2,044 (17%) |
Most Of The Time | 58 / 2,033 (2.9%) | 67 / 2,044 (3.3%) |
All Of The Time | 11 / 2,033 (0.5%) | 8 / 2,044 (0.4%) |
(Missing) | 30 | 19 |
FeelDepressed | ||
None Of The Time | 1,481 / 2,029 (73%) | 1,399 / 2,051 (68%) |
A Little Of The Time | 345 / 2,029 (17%) | 403 / 2,051 (20%) |
Some Of The Time | 156 / 2,029 (7.7%) | 195 / 2,051 (9.5%) |
Most Of The Time | 35 / 2,029 (1.7%) | 47 / 2,051 (2.3%) |
All Of The Time | 12 / 2,029 (0.6%) | 7 / 2,051 (0.3%) |
(Missing) | 34 | 12 |
FeelRestless | ||
None Of The Time | 520 / 2,028 (26%) | 501 / 2,047 (24%) |
A Little Of The Time | 751 / 2,028 (37%) | 794 / 2,047 (39%) |
Some Of The Time | 585 / 2,028 (29%) | 577 / 2,047 (28%) |
Most Of The Time | 143 / 2,028 (7.1%) | 158 / 2,047 (7.7%) |
All Of The Time | 29 / 2,028 (1.4%) | 17 / 2,047 (0.8%) |
(Missing) | 35 | 16 |
EverythingIsEffort | ||
None Of The Time | 523 / 2,031 (26%) | 470 / 2,048 (23%) |
A Little Of The Time | 827 / 2,031 (41%) | 798 / 2,048 (39%) |
Some Of The Time | 490 / 2,031 (24%) | 572 / 2,048 (28%) |
Most Of The Time | 159 / 2,031 (7.8%) | 179 / 2,048 (8.7%) |
All Of The Time | 32 / 2,031 (1.6%) | 29 / 2,048 (1.4%) |
(Missing) | 32 | 15 |
FeelWorthless | ||
None Of The Time | 1,469 / 2,031 (72%) | 1,434 / 2,049 (70%) |
A Little Of The Time | 348 / 2,031 (17%) | 359 / 2,049 (18%) |
Some Of The Time | 147 / 2,031 (7.2%) | 193 / 2,049 (9.4%) |
Most Of The Time | 48 / 2,031 (2.4%) | 51 / 2,049 (2.5%) |
All Of The Time | 19 / 2,031 (0.9%) | 12 / 2,049 (0.6%) |
(Missing) | 32 | 14 |
FeelNervous | ||
None Of The Time | 537 / 2,026 (27%) | 540 / 2,050 (26%) |
A Little Of The Time | 804 / 2,026 (40%) | 824 / 2,050 (40%) |
Some Of The Time | 509 / 2,026 (25%) | 539 / 2,050 (26%) |
Most Of The Time | 148 / 2,026 (7.3%) | 122 / 2,050 (6.0%) |
All Of The Time | 28 / 2,026 (1.4%) | 25 / 2,050 (1.2%) |
(Missing) | 37 | 13 |
¹ Mean (SD); n / N (%)
Note that you can use the gtsummary package to create in-line referencing. For example: Average Kessler-6 distress in 2018 was 5.14 (4.00) and in 2019 was 5.35 (3.97).
The following is a brief guide to describing your method. We’ll be returning to report writing in future weeks. For now, I just want to put this on the table for you. The advice is just a guide.
Heading | Include
---|---
Participants |
Materials |
Procedure |
Below are the sampling procedures from the New Zealand Attitudes and Values Study, from which the nz teaching dataset was drawn.
The Time 10 (2018) NZAVS contained responses from 47,951 participants (18,010 retained from one or more previous waves). The sample retained 2,964 participants from the Time 1 (2009) sample (a retention rate of 45.5%). The sample retained 14,049 participants from Time 9 (2017; a retention rate of 82.3% from the previous year). Participants who provided an email address were first emailed and invited to complete an online version if they preferred. Participants who did not complete the online version (or did not provide an email) were then posted a copy of the questionnaire, with a second postal follow-up two months later. We staggered the time of contact, so that participants who had completed the previous wave were contacted approximately one year after they last completed the questionnaire. We offered a prize draw for participation (five draws each for $1000 grocery vouchers, $5000 total prize pool). All participants were posted a Season's Greetings card from the NZAVS research team and informed that they had been automatically entered into a bonus seasonal grocery voucher prize draw. Participants were also emailed an eight-page newsletter about the study.
To boost sample size and increase sample diversity for subsequent waves, a booster sample was conducted by selecting people from the New Zealand electoral roll. As with previous booster samples, sampling was conducted without replacement (i.e., people included in previous sample frames were identified and removed from the 2018 roll). The sample frame consisted of 325,000 people aged from 18-65 randomly selected from the 2018 New Zealand Electoral Roll, who were currently residing in New Zealand (one can be registered to vote in New Zealand but live overseas). The electoral roll contained ~3,250,000 registered voters. The New Zealand Electoral Roll contains participants' date of birth (within a one-year window), and we limited our frame to people who were 65 or younger, due to our aim of retaining participants longitudinally. We concurrently advertised the survey on Facebook via a $5000 paid promotion of a link to a YouTube video describing the NZAVS and the large booster sample we were conducting. The advertisement targeted men and women aged 18-65+ who lived in New Zealand and ran for 14 days. This paid promotion reached 147,296 people, with 4,721 link clicks (i.e., clicking to watch the video), according to Facebook. The goal of the paid promotion was twofold: (a) to increase name recognition of the NZAVS during the period in which questionnaires were being posted, and (b) to help improve retention by potentially reaching previous participants who happened to see the advertisement. A total of 29,293 participants who were contained in our sample frame completed the questionnaire (response rate = 9.2% when adjusting for the 98.2% accuracy of the 2018 electoral roll). A further 648 participants completed the questionnaire, but were unable to be matched to our sample frame (for example, due to a lack of contact information) or were unsolicited opt-ins. Informal analysis indicates that unsolicited opt-ins were often the partners of existing participants.
The Time 11 wave was conducted during the COVID-19 pandemic. Procedures thus differed in that there was an increased focus on online delivery using email reminders and extensive Facebook advertising, no Christmas card, and incomplete phoning of non-respondents.
The Time 11 (2019) NZAVS contained responses from 42,684 participants (36,522 retained from one or more previous waves). The sample retained 2,506 participants from the Time 1 (2009) sample (a retention rate of 38.4%). The sample retained 34,782 participants from Time 10 (2018; a retention rate of 72.5% from the previous year). Participants who provided an email address were first emailed and invited to complete an online version if they preferred. Participants who did not complete the online version (or did not provide an email) were then posted a copy of the questionnaire, with a second postal follow-up two months later. We staggered the time of contact, so that participants who had completed the previous wave were contacted approximately one year after they last completed the questionnaire. A second reminder email was sent approximately four months following the first email attempt. We offered a prize draw for participation (five draws each for $1000 grocery vouchers, $5000 total prize pool). Participants were also emailed an eight-page newsletter about the study. As in past years, three attempts were made to phone non-respondents using each available cell and landline number. However, due to the university closure during COVID-19 lockdowns, phoning attempts were made for only 54% of the phoning pool (11,687 from a total of 21,636 non-respondents who provided at least one phone number).
Two additional forms of recruitment were also introduced during Time 11. The first was a large information box in the questionnaire (taking a full page on the paper version), which asked people: 'Do you have a partner who would also like to join the NZAVS?' with additional details for how partners might join the study (see questionnaire for the full text). The second was a Facebook advertisement. The advertisement targeted men and women aged 18-65+ who lived in New Zealand and ran from 4th April 2020 – 4th July 2020 (overlapping with New Zealand's first lockdown period and recovery), and again from 18th August 2020 – 4th September (during the second Auckland lockdown). Given the unprecedented nature of the COVID-19 lockdowns, we thought it important to maximise sampling during these periods. The goal of the Facebook advertisement was threefold: (a) to increase name recognition of the NZAVS and remind people to complete the paper/online version already posted/emailed to them, (b) to help improve retention by potentially reaching previous lost participants who happened to see the advertisement, and (c) to recruit new participants (and also the partners of existing participants) while people were at home with some possibly having more free time during lockdown. This last goal was indirect and not explicitly stated in the advertisement.
The Facebook advertisement read as follows: "Participate in the New Zealand Attitudes and Values Study. Complete the 2020 Questionnaire online" with the body of text: "If you are part of the NZAVS, but have not heard from us in the last year, then please consider completing the 2020 questionnaire online. The study is more important than ever as we aim to understand the impact of COVID-19 on mental health, wellbeing and resilience in our communities. We wish you all the best at this time and hope you keep well and stay safe." This paid promotion reached 883,969 people, with 37,850 link clicks (i.e., clicking the link for the Qualtrics survey) according to Facebook. A total of 6106 people went on to complete the questionnaire and provide full contact details, and were thus included in the dataset (4734 were new participants opting in to the study, and 1372 were previously 'lost' participants).
papaja package
APA style advice here
Notice, the intercept here is zero. This is because we centered the new indicator at zero, and we wrote a model that estimates the population average for this outcome (an intercept-only model). Don't worry if you don't know what an intercept is; we'll get to regression in a few weeks.