GAPS R Workshop 2025

class: inverse,middle,center
<style type="text/css">
.purpleb {
  font-weight: bold;
  color: #4F2683;
  font-size: 1.25em;
}

.small {
  font-size:.75rem;
}
.tiny {
  font-size:.25rem;
}
.shift { 
  position:relative; 
  top: -40px;
  }

.plot-callout {
  height: 225px;
  width: 450px;
  bottom: 5%;
  right: 5%;
  position: absolute;
  padding: 0px;
  z-index: 100;
}
.plot-callout img {
  width: 100%;
  border: 4px solid #23373B;
}
</style>

# GAPS R Workshop 2025

### William Poirier

2025-08-27

Please go to

### williampo1.github.io/lab/

And download the material for this year's workshop

<i>Thank you to Western Research, the Society of Graduate Students, and the School of Graduate & Postdoctoral Studies for their support!</i>

---
## What will we be doing today?

.pull-left[
**Main Objective**: Getting familiar with R & RStudio.

**Schedule**:
- 10:00-12:00: First Steps
- 12:00-13:00: Lunch
- 13:00-15:00: Working with R

**ASK QUESTIONS!**

Katy and Noah will be roaming the room to troubleshoot.
]
.pull-right[
<img src="gifs/giphy.gif" width="65%" style="display: block; margin: auto;" />
]

---

class: inverse,middle,center

## Session 1: First Steps

---

## Session 1: First Steps

1. What is R?

2. Managing anxieties

3. Software installation

4. RStudio interface

5. Replication: <i>Why do majoritarian systems benefit the right?</i> (Liñeira and Riera, 2024)

---

## What is R?

.pull-left[
<img src="images/Rlogo.png" width="50%" style="display: block; margin: auto;" />

- A programming language
]
.pull-right[
<img src="images/RStudio_logo_flat.svg" width="100%" style="display: block; margin: auto;" />

- A place to write stuff
]

---

## What is R?

.pull-left[
- French; English; Spanish; Mandarin; Japanese; Arabic.

- Are "arguably" programming languages
]
.pull-right[
<img src="images/Microsoft_Office_Word.svg" width="50%" style="display: block; margin: auto;" />

- A place to write stuff
]

---

## What is R?

**A programming language made by statisticians for statisticians.**

It will help you to:

- domesticate (or clean) raw data;
- perform statistical analyses;
- graph your results;
- scrape the web;
- become cooler than folks who use Stata.

Why R?

- Free;
- Active research community developing packages;
- Handles any data format;
- Great point of entry for "real" programming languages.

---

## Managing anxieties

.pull-left[
- Learning how to code is learning how to talk to your computer.

- R is stupid: it does exactly what you tell it to do.

- You're going to make mistakes and that's great!

- Don't focus on the syntax, focus on the principles.

- Time and effort are the only things you need.

- You'll go from hating it to loving it in no time.

- Google is your best friend!
]
.pull-right[
<img src="gifs/eye.gif" width="100%" style="display: block; margin: auto;" />
]

I'll say this again: **Don't try to learn the syntax by heart, focus on the principles!**

You'll learn the syntax as you go.

---

## Software installation — R

.pull-left[
Go to CRAN: .purpleb[https://cloud.r-project.org]

- Comprehensive R Archive Network;
- Used to distribute both R and R packages.
]
.pull-right[
<img src="images/Rlogo.png" width="50%" style="display: block; margin: auto;" />
]

---

## Software installation — R

.pull-left[
.purpleb[Mac]
<img src="images/R_down_mac.png" width="100%" style="display: block; margin: auto;" />
<img src="images/R_down_mac2.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
.purpleb[Windows]
<img src="images/R_down_pc.png" width="100%" style="display: block; margin: auto;" />
<img src="images/R_down_pc2.png" width="100%" style="display: block; margin: auto;" />
<img src="images/R_down_pc3.png" width="100%" style="display: block; margin: auto;" />
]

---

## Software installation — RStudio

.pull-left[
Go to Posit: .purpleb[https://posit.co/download/rstudio-desktop/]
]
.pull-right[
<img src="images/rstudio_down_mac.png" width="100%" style="display: block; margin: auto;" />
]

---

## RStudio interface

.purpleb[OPEN RSTUDIO AND FOLLOW ME!!!]

The next few slides are here as a reference.

---

## RStudio interface — Overview

---

## RStudio interface — Editor

---

## RStudio interface — Hello World

---

## RStudio interface — Hello World

- To run a line of code:
  1. Put the cursor anywhere on the line and press `cmd + enter`/`ctrl + enter`.
  2. Select one or more lines and press `cmd + enter`/`ctrl + enter`.
  3. Select one or more lines and click the `Run` button at the top of the editor pane.

---
## RStudio interface — Housekeeping

---
## RStudio interface — Housekeeping

---
## RStudio interface — Housekeeping

---

## Basics — Directories

.pull-left[
- **Directory** = How your computer organizes files and folders.
  - From now on, you need to be a neat freak when it comes to this!
- **Working directory** = Where R is pointing to.
  - Most beginner errors come from a wrong working directory.
- Shortcut, use when folder/file selected:
  - Mac: `option + command + c`
  - PC: `shift + right click > Copy as Path`

]
.pull-right[
<img src="gifs/filer.gif" width="50%" style="display: block; margin: auto;" />
]

``` r
# Option 1: where you want
setwd("/Users/williampoirier/Dropbox/Website/files/uwo/R_Workshop_2025/rcode") # Mac
setwd("C:/Users/williampoirier/Dropbox/Website/files/uwo/R_Workshop_2025/rcode") # PC: use / (or \\), a single \ is an error

# Option 2: where the R file is saved
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
```
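
Not sure where R is pointing? A minimal sketch of how you could check, assuming nothing about your own folders: `tempdir()` is only a stand-in for a project folder, and `no_such_file.csv` is a made-up file name.

``` r
old_wd <- getwd()                # remember where R is pointing right now
setwd(tempdir())                 # tempdir() stands in for your project folder
getwd()                          # confirm the move worked
list.files()                     # what can R see from here?
file.exists("no_such_file.csv")  # FALSE: R cannot find a file that is not there
setwd(old_wd)                    # go back to where we started
```

If `file.exists()` returns `FALSE` for a file you know you have, the working directory is the first thing to check.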

---

## Basics — Packages

.pull-left[
- R ships with a suite of basic functions, i.e. base R.
- Packages contain extra functions that other users have created.
- 2 things to note:
  - You only need to install them once.
  - But, you need to load them each time you open a new session.
- The Tidyverse is a suite of multiple packages.
]
.pull-right[
<img src="images/tidyverse.png" width="30%" style="display: block; margin: auto;" />
]

``` r
# Install from CRAN
install.packages("tidyverse")
# Load in your session
library(tidyverse)
```
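
A small sketch of the install-once, load-every-session idea. The helper `is_installed()` is a made-up name for illustration, and the install line is commented out so nothing gets downloaded here.

``` r
# Hypothetical helper: TRUE if a package is already on this computer
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

is_installed("stats")   # TRUE, stats ships with R

# Install only when missing (uncomment to actually run it)
# if (!is_installed("tidyverse")) install.packages("tidyverse")

# You can also borrow a single function without loading the whole package
stats::median(c(1, 5, 9))   # 5
```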

---

## Replication Exercise

.pull-left[
<img src="images/repli1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="images/repli2.png" width="100%" style="display: block; margin: auto;" />
]

.purpleb[OPEN RSTUDIO AND FOLLOW ME!!!]

---

## Basics — Assignment

``` r
# The right way
banana <- 3

# The discouraged way (it works, but <- is the convention)
banana = 3

# Global assigner
banana <<- 3

# FOR ENGLISH KEYBOARDS
# PC: alt + - 
# MAC: option + -

banana
```

```
## [1] 3
```

---

## Basics — Assignment
.pull-left[

``` r
# The right way
banana <- 3

# The discouraged way (it works, but <- is the convention)
banana = 3

# Global assigner
banana <<- 3

# FOR ENGLISH KEYBOARDS
# PC: alt + - 
# MAC: option + -

banana
```

```
## [1] 3
```
]
.pull-right[
<img src="images/hw_eye.png" width="100%" style="display: block; margin: auto;" />
]
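
Curious about what makes `<<-` different? A rough sketch, using made-up names (`counter`, `bump_local`, `bump_global`): inside a function, `<-` creates a local copy, while `<<-` changes the object that lives outside the function.

``` r
counter <- 0

bump_local <- function() {
  counter <- counter + 1    # creates a new, local counter
  counter
}

bump_global <- function() {
  counter <<- counter + 1   # modifies the counter defined outside
  counter
}

bump_local()    # 1
counter         # still 0

bump_global()   # 1
counter         # now 1
```

As a beginner you will almost always want `<-`.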

---

## Basics — Data types

.pull-left[

``` r
# Integer
Apple <- 13L
class(Apple)
```

```
## [1] "integer"
```

``` r
# Numeric
Banana <- 13
class(Banana)
```

```
## [1] "numeric"
```

``` r
# Character
Cherry <- "13"
class(Cherry)
```

```
## [1] "character"
```
]
.pull-right[

``` r
# Logical
Durian <- TRUE
class(Durian)
```

```
## [1] "logical"
```

``` r
# What happens when you add Apple and Banana? 
# What about Apple and Durian?
# What about Apple and Cherry?
```
]

---

## Basics — Data types

.pull-left[

``` r
# Integer
Apple <- 13L
class(Apple)
```

```
## [1] "integer"
```

``` r
# Numeric
Banana <- 13
class(Banana)
```

```
## [1] "numeric"
```

``` r
# Character
Cherry <- "13"
class(Cherry)
```

```
## [1] "character"
```
]
.pull-right[

``` r
# Logical
Durian <- TRUE
class(Durian)
```

```
## [1] "logical"
```

``` r
# What happens when you add Apple and Banana? 
# What about Apple and Durian?
# What about Apple and Cherry?

Apple + Banana
```

```
## [1] 26
```

``` r
Apple + Durian
```

```
## [1] 14
```
]

---

## Basics — Data structures

.panelset[
  .panel[.panel-name[Vectors 1/3]

``` r
# One data type allowed. R's basic data structure.

# A vector
stuff <- "Kumquat"
stuff
```

```
## [1] "Kumquat"
```

``` r
# Also a vector
stuff <- c("Knickknacks","Kerfuffle","Kumquat")
stuff
```

```
## [1] "Knickknacks" "Kerfuffle"   "Kumquat"
```

``` r
# Also a vector
(otherStuff <- c(T,F,T,T,T,F)) # Parentheses around an assignment print the new object.
```

```
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE
```
  ]
  .panel[.panel-name[2/3]

``` r
#### Why c() ?

stuff <- c("Knickknacks","Kerfuffle","Kumquat")
(scoreOfWordsThatStartsWithK_1 <- 8:10) 
```

```
## [1]  8  9 10
```

``` r
# OR 
(scoreOfWordsThatStartsWithK_2 <- c(8,9,10))
```

```
## [1]  8  9 10
```

``` r
#### What if I want the score from 0 to 100 instead of 0 to 10?
(scoreOfWordsThatStartsWithK_3 <- 10*scoreOfWordsThatStartsWithK_2)
```

```
## [1]  80  90 100
```
  ]
.panel[.panel-name[3/3]

``` r
#### What if I want to relate the names to the score?
names(scoreOfWordsThatStartsWithK_1) <- stuff

# Like adding a second dimension to the data!
scoreOfWordsThatStartsWithK_1
```

```
## Knickknacks   Kerfuffle     Kumquat 
##           8           9          10
```
  ]
  .panel[.panel-name[Matrices 1/2]

``` r
# Again, only one data type. 2 dimensions of it this time.

(myMatrix <- matrix(1:9,nrow=3,ncol=3))
```

```
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
```

``` r
# Accepts all operations that matrices accept in math
# Like transpose for example
t(myMatrix)
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
```
  ]
  .panel[.panel-name[2/2]

``` r
# Works with characters as well!
letterMatrix <- matrix(letters,ncol=2)
t(letterMatrix)
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "a"  "b"  "c"  "d"  "e"  "f"  "g"  "h"  "i"  "j"   "k"   "l"   "m"  
## [2,] "n"  "o"  "p"  "q"  "r"  "s"  "t"  "u"  "v"  "w"   "x"   "y"   "z"
```

``` r
#### What if I want multiple data types? 
```
  ]
  .panel[.panel-name[Data Frames 1/2]

``` r
# One data type per column. Essentially a collection of vectors, much like an Excel sheet.

(wordData <- data.frame(stuff,scoreOfWordsThatStartsWithK_2))
```

```
##         stuff scoreOfWordsThatStartsWithK_2
## 1 Knickknacks                             8
## 2   Kerfuffle                             9
## 3     Kumquat                            10
```
  ]
  .panel[.panel-name[2/2]

``` r
#### How do I change the column names?
colnames(wordData) <- c("word","score")
# OR
(wordData <- data.frame(word=stuff,
                        score=scoreOfWordsThatStartsWithK_2))
```

```
##          word score
## 1 Knickknacks     8
## 2   Kerfuffle     9
## 3     Kumquat    10
```

``` r
#### What happens if I do this?
# class(wordData)
```
  ]
  .panel[.panel-name[Lists]

``` r
# Anything you want. Can mix data types and data structures.

(myList <- list(stuff,t(letterMatrix),wordData))
```

```
## [[1]]
## [1] "Knickknacks" "Kerfuffle"   "Kumquat"    
## 
## [[2]]
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "a"  "b"  "c"  "d"  "e"  "f"  "g"  "h"  "i"  "j"   "k"   "l"   "m"  
## [2,] "n"  "o"  "p"  "q"  "r"  "s"  "t"  "u"  "v"  "w"   "x"   "y"   "z"  
## 
## [[3]]
##          word score
## 1 Knickknacks     8
## 2   Kerfuffle     9
## 3     Kumquat    10
```
  ]
]
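
What does "one data type allowed" mean in practice? A quick sketch with made-up objects (`mixed`, `kept`): a vector silently converts everything to a single type, while a list keeps each element as it is.

``` r
# A vector forces everything into one type (here: character)
mixed <- c(8, "Kumquat", TRUE)
mixed                 # "8" "Kumquat" "TRUE"
class(mixed)          # "character"

# A list keeps each element's own type
kept <- list(8, "Kumquat", TRUE)
sapply(kept, class)   # "numeric" "character" "logical"
```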

---

## Basics — Functions

``` r
#### What is a function?

myFunction <- function(stuff_in){
  # Some operation
  return(stuff_out)
}

my_mean <- function(vector){
  tmp <- sum(vector)
  n <- length(vector)
  out <- round(tmp/n,2)
  return(out)
}

normalVec <- rnorm(1000,mean=0,sd=1)
mean(normalVec)
```

```
## [1] -0.03094287
```

``` r
my_mean(normalVec)
```

```
## [1] -0.03
```
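
Functions can also take optional arguments with default values. A sketch that extends the idea of `my_mean`; the name `my_mean2` and its arguments `digits` and `na_rm` are made up for illustration.

``` r
my_mean2 <- function(x, digits = 2, na_rm = FALSE) {
  if (na_rm) x <- x[!is.na(x)]       # optionally drop missing values
  round(sum(x) / length(x), digits)  # the last value is returned automatically
}

my_mean2(c(1, 2, 4))                  # 2.33
my_mean2(c(1, 2, 4), digits = 0)      # 2
my_mean2(c(1, NA, 4), na_rm = TRUE)   # 2.5
```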

---

class: inverse,middle,center
name: lit

# LUNCH!

---

## Session 2: Working with R

1. Brief review of first session;

2. Importing data;

3. Looking at our data;

4. Indexing;

5. Operators;

6. Diving into the replication file.

---
## Review

``` r
library(tidyverse)
library(rio)
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
dat <- import("../data/MajBenRight_clean_2010plus.dta")

class(dat$polity)
```

```
## [1] "character"
```

``` r
class(dat)
```

```
## [1] "data.frame"
```

---
## Importing data

- Some data formats can readily be opened by R:
  - .RData `load()`
  - .csv `read.csv()` or `read_csv()` from the tidyverse
- Others need specialized packages. There are a ton of them, so just use `rio`!
  - .csv
  - .psv
  - .tsv
  - .sas7bdat/.xpt (SAS)
  - .sav/.zsav/.por (SPSS)
  - .dta (Stata)
  - .xls/.xlsx (Excel)
  - .RData/.rda/.rds/.qs (R)
  - And a ton more!
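
To practise importing without the workshop data, here is a minimal sketch that writes a tiny .csv to a temporary folder and reads it back with base R. The objects `toy`, `path`, and `toy_back` and the file name `toy_words.csv` are made up for illustration.

``` r
# A tiny data frame to export
toy <- data.frame(word  = c("Knickknacks", "Kerfuffle", "Kumquat"),
                  score = c(8, 9, 10))

# Write it to a temporary folder, then read it back
path <- file.path(tempdir(), "toy_words.csv")
write.csv(toy, path, row.names = FALSE)
toy_back <- read.csv(path)

str(toy_back)   # 3 rows, 2 columns

# With rio installed, the equivalent import would be:
# toy_back <- rio::import(path)
```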

---
## Looking at our data 1/2

.pull-left[

``` r
# Check number of rows and columns
dim(dat)
```

```
## [1] 37307   650
```

``` r
# Check number of rows
nrow(dat)
```

```
## [1] 37307
```

``` r
# Check number of columns
ncol(dat)
```

```
## [1] 650
```

``` r
# For vector length
length(scoreOfWordsThatStartsWithK_1)
```

```
## [1] 3
```
]
.pull-right[

``` r
# Check first or last few rows of data
head(dat)
tail(dat,n=2)
```

``` r
# Check unique values of a vector
head(unique(dat$polity))
```

```
## [1] "New Zealand" "Greece"      "Germany"     "Switzerland" "Canada"     
## [6] "Finland"
```
]

---
## Looking at our data 2/2

``` r
# Check unique values of a vector and how often they appear
table(dat$leftright)
```

```
## 
##    0    1    2    3    4    5    6    7    8    9   10 
##  945  672 1615 2768 2968 8586 3092 3522 2631  728 1121
```

``` r
# Can create crosstabs from it!
table(dat$female,dat$leftright)
```

```
##    
##        0    1    2    3    4    5    6    7    8    9   10
##   0  445  311  766 1343 1468 3979 1702 1995 1443  401  554
##   1  496  358  844 1418 1494 4564 1387 1516 1181  325  562
```
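
Two things worth knowing about `table()`, sketched on a made-up vector (`leftright_toy`): it drops missing values unless you ask for them, and `prop.table()` turns counts into proportions.

``` r
leftright_toy <- c(5, 5, 3, 7, NA, 5, 3)

table(leftright_toy)                   # the NA is silently dropped
table(leftright_toy, useNA = "ifany")  # now the NA gets its own column
prop.table(table(leftright_toy))       # shares instead of counts
```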

---
## Indexing — Vectors

``` r
# Let's start with a vector.
# I create one by sampling from the years of dat.
# Notice that I used $ to select the column year, I'll come back to this.
(years <- dat$year[sample(1:nrow(dat),10)])
```

```
##  [1] 2011 2011 2015 2011 2013 2015 2013 2011 2011 2012
```

``` r
# Remember, a vector has one dimension
# So if I want to know what the second value is, I only need one position.
years[2]
```

```
## [1] 2011
```

``` r
# But I can also select multiple elements from the vector
years[c(1,5,10)]
```

```
## [1] 2011 2013 2012
```
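
A few more ways to index a vector, sketched on a made-up vector (`years_toy`) so the output does not depend on random sampling.

``` r
years_toy <- c(2011, 2011, 2015, 2011, 2013)

years_toy[2:4]                # positions 2 to 4: 2011 2015 2011
years_toy[-1]                 # everything except the first: 2011 2015 2011 2013
years_toy[years_toy > 2011]   # only values that pass a test: 2015 2013
```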

---
## Indexing — Data Frames/Matrices 1/2

``` r
# Matrices and data frames have 2 dimensions, so you need two things to find
# what you are looking for: a row and a column number!
# In that order [r,c]
# Let's create a toy example from dat
(sub <- dat[sample(1:nrow(dat),3),] |> select(polity,proportional,year))
```

```
##            polity proportional year
## 35413    Portugal            1 2015
## 22227 Switzerland            1 2011
## 8458      Austria            1 2013
```

``` r
# If I only want to know about the first column
sub[,1]
```

```
## [1] "Portugal"    "Switzerland" "Austria"
```

``` r
# If I only want to know about the third row
sub[3,]
```

```
##       polity proportional year
## 8458 Austria            1 2013
```

``` r
# Or what is in the second row of the first column
sub[2,1]
```

```
## [1] "Switzerland"
```

---
## Indexing — Data Frames/Matrices 2/2

``` r
# Or what is in the second row of the first column
sub[2,1]
```

```
## [1] "Switzerland"
```

``` r
# If I know the names of the columns however
names(sub)
```

```
## [1] "polity"       "proportional" "year"
```

``` r
# Then I can do this
sub$polity[2]
```

```
## [1] "Switzerland"
```
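
Column names also work inside the square brackets. A sketch on a hand-typed copy of the three rows printed above; the name `sub_toy` is made up so it does not overwrite `sub`.

``` r
sub_toy <- data.frame(polity       = c("Portugal", "Switzerland", "Austria"),
                      proportional = c(1, 1, 1),
                      year         = c(2015, 2011, 2013))

sub_toy[, "polity"]                # a column by name
sub_toy[["year"]]                  # same idea, double brackets
sub_toy[2, c("polity", "year")]    # second row, two named columns
sub_toy[sub_toy$year > 2011, ]     # rows that pass a test: Portugal and Austria
```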

---
## Operators — Math
.pull-left[

``` r
2+3
```

```
## [1] 5
```

``` r
2-3
```

```
## [1] -1
```

``` r
2*3
```

```
## [1] 6
```

``` r
2/3
```

```
## [1] 0.6666667
```
]
.pull-right[

``` r
# Modulo
2%%3
```

```
## [1] 2
```

``` r
4%%2
```

```
## [1] 0
```

``` r
# Can do this with objects as well
sub$year[1]-sub$year[2]
```

```
## [1] 4
```
]
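
A few more operators, plus the fact that math works on whole vectors at once. A quick sketch; `years_toy` is a made-up vector.

``` r
2^3        # exponent: 8
7 %/% 2    # integer division: 3
7 %% 2     # remainder (modulo): 1

years_toy <- c(2015, 2011, 2013)
years_toy - 2010   # applied to every element: 5 1 3
```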

---
## Operators — Logic 1/2

.pull-left[

``` r
sub$year
```

```
## [1] 2015 2011 2013
```

``` r
# A is equal to B
sub$year[1] == sub$year[2]
```

```
## [1] FALSE
```

``` r
# A is not equal to B
sub$year[1] != sub$year[2]
```

```
## [1] TRUE
```
]
.pull-right[

``` r
# A is higher than B
sub$year[1] > sub$year[2]
```

```
## [1] TRUE
```

``` r
# A is lower than B
sub$year[1] < sub$year[2]
```

```
## [1] FALSE
```

``` r
# A is higher or equal to B
sub$year[1] >= sub$year[2]
```

```
## [1] TRUE
```

``` r
# A is lower or equal to B
sub$year[1] <= sub$year[2]
```

```
## [1] FALSE
```
]

---
## Operators — Logic 2/2

``` r
sub$year
```

```
## [1] 2015 2011 2013
```

``` r
# OR
sub$year[1] == sub$year[2] | sub$year[1] > sub$year[3]
```

```
## [1] TRUE
```

``` r
# AND
sub$year[1] == sub$year[2] & sub$year[1] > sub$year[3]
```

```
## [1] FALSE
```
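
Logical operators also work on whole vectors, which is what makes subsetting possible. A sketch with a made-up vector (`years_toy`).

``` r
years_toy <- c(2015, 2011, 2013)

years_toy > 2012                # one answer per element: TRUE FALSE TRUE
!(years_toy > 2012)             # NOT flips them: FALSE TRUE FALSE
years_toy %in% c(2011, 2013)    # is each value in this set? FALSE TRUE TRUE
sum(years_toy > 2012)           # TRUE counts as 1, so this counts matches: 2
```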

---
## Exercise

Subset the data (`dat`) such that:
  1. Only columns 390 (`polity`), 391 (`year`), and 633 (`proportional`) appear;
  2. Only rows where `year` is 2015 and `proportional` is 0.

How many unique values are there in `polity`?

---
## Exercise — A solution

``` r
tmp <- dat[,c(390,391,633)]
subset <- tmp[tmp$year==2015 & tmp$proportional==0,]
unique(subset$polity)
```

```
## [1] "Canada"        "Great Britain"
```

``` r
# In tidyverse world
dat |>
  select(polity,year,proportional) |>
  filter(year==2015 & proportional == 0) |>
  distinct(polity)
```

```
##          polity
## 1        Canada
## 2 Great Britain
```
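
To get the count itself, wrap `unique()` in `length()`. A sketch on a small hand-made data frame (`toy`, `keep`, and `picked` are made-up names, and the rows are invented for illustration, not taken from `dat`).

``` r
toy <- data.frame(polity       = c("Canada", "Canada", "Great Britain",
                                   "Portugal", "Canada"),
                  year         = c(2015, 2015, 2015, 2015, 2011),
                  proportional = c(0, 0, 0, 1, 0))

# One TRUE/FALSE per row: does it meet both conditions?
keep   <- toy$year == 2015 & toy$proportional == 0
picked <- toy[keep, ]

unique(picked$polity)           # "Canada" "Great Britain"
length(unique(picked$polity))   # 2
```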

---
class: inverse,middle,center
name: lit

## Diving into the replication file