class: inverse,middle,center

<style type="text/css">
.purpleb {
  font-weight: bold;
  color: #4F2683;
  font-size: 1.25em;
}
.small { font-size:.75rem; }
.tiny { font-size:.25rem; }
.shift {
  position:relative;
  top: -40px;
}
.plot-callout {
  height: 225px;
  width: 450px;
  bottom: 5%;
  right: 5%;
  position: absolute;
  padding: 0px;
  z-index: 100;
}
.plot-callout img {
  width: 100%;
  border: 4px solid #23373B;
}
</style>

# GAPS R Workshop 2025

### William Poirier

2025-08-27

Please go to

### williampo1.github.io/lab/

And download the material for this year's workshop.

<i>Thank you to Western Research, the Society of Graduate Students, and the School of Graduate & Postdoctoral Studies for their support!</i>

<img src="images/social-science/PNG/SSC_Horiz_Rev.png" width="30%" style="display: block; margin: auto;" />

---

## What will we be doing today?

.pull-left[

**Main Objective**: Getting familiar with R & RStudio.

**Schedule**:

- 10:00-12:00: First Steps
- 12:00-13:00: Lunch
- 13:00-15:00: Working with R

**ASK QUESTIONS!** Katy and Noah will be roaming the room to troubleshoot.

]

.pull-right[
<img src="gifs/giphy.gif" width="65%" style="display: block; margin: auto;" />
]

---
class: inverse,middle,center

## Session 1: First Steps

---

## Session 1: First Steps

1. What is R?
2. Managing anxieties
3. Software installation
4. RStudio interface
5. Replication: <i>Why do majoritarian systems benefit the right?</i> (Liñeira and Riera, 2024)

---

## What is R?

.pull-left[
<img src="images/Rlogo.png" width="50%" style="display: block; margin: auto;" />

- A programming language
]

.pull-right[
<img src="images/RStudio_logo_flat.svg" width="100%" style="display: block; margin: auto;" />

- A place to write stuff
]

---

## What is R?

.pull-left[
- French; English; Spanish; Mandarin; Japanese; Arabic.
- Are "arguably" programming languages
]

.pull-right[
<img src="images/Microsoft_Office_Word.svg" width="50%" style="display: block; margin: auto;" />

- A place to write stuff
]

---

## What is R?

**A programming language made by statisticians for statisticians.**

It will help you to:

- domesticate (or clean) raw data;
- perform statistical analyses;
- graph your results;
- scrape the web;
- become cooler than folks who use Stata.

--

Why R?

- Free;
- Active research community developing packages;
- Handles almost any data format;
- Great point of entry for "real" programming languages.

---

## Managing anxieties

.pull-left[
- Learning how to code is learning how to talk to your computer.
- R is stupid: it does exactly what you tell it to do.
- You're going to make mistakes and that's great!
- Don't focus on the syntax, focus on the principles.
- Time and effort are the only things you need.
- You'll go from hating it to loving it in no time.
- Google is your best friend!
]

.pull-right[
<img src="gifs/eye.gif" width="100%" style="display: block; margin: auto;" />
]

--

I'll say this again: **Don't try to learn the syntax by heart, focus on the principles!** You'll learn the syntax as you go.

---

## Software installation — R

.pull-left[
Go to CRAN: .purpleb[https://cloud.r-project.org]

- Comprehensive R Archive Network;
- Used to distribute both R and R packages.
]

.pull-right[
<img src="images/Rlogo.png" width="50%" style="display: block; margin: auto;" />
]

---

## Software installation — R

.pull-left[
.purpleb[Mac]

<img src="images/R_down_mac.png" width="100%" style="display: block; margin: auto;" />
<img src="images/R_down_mac2.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
.purpleb[Windows]

<img src="images/R_down_pc.png" width="100%" style="display: block; margin: auto;" />
<img src="images/R_down_pc2.png" width="100%" style="display: block; margin: auto;" />
<img src="images/R_down_pc3.png" width="100%" style="display: block; margin: auto;" />
]

---

## Software installation — RStudio

.pull-left[
Go to Posit: .purpleb[https://posit.co/download/rstudio-desktop/]
]

.pull-right[
<img src="images/rstudio_down_mac.png" width="100%" style="display: block; margin: auto;" />
]

---

## RStudio interface

.purpleb[OPEN RSTUDIO AND FOLLOW ME!!!]

The next few slides are here as a reference.

---

## RStudio interface — Overview

<img src="images/RStudio_overview.png" width="80%" style="display: block; margin: auto;" />

---

## RStudio interface — Editor

<img src="images/RStudio_editor.png" width="80%" style="display: block; margin: auto;" />

---

## RStudio interface — Hello World

<img src="images/HelloWorld.png" width="80%" style="display: block; margin: auto;" />

---

## RStudio interface — Hello World

- To run code, either:
  1. Put the cursor anywhere on a line and press `cmd + enter`/`ctrl + enter`.
  2. Select one or multiple lines and press `cmd + enter`/`ctrl + enter`.
  3. Select one or multiple lines and click on the `Run` button at the top of the editor pane.
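
A couple of throwaway lines to try these three options on. The string and the object name `greeting` are only illustrations, not part of the workshop files.

``` r
# Option 1: put the cursor on this line and press cmd/ctrl + enter.
# The result shows up in the Console.
print("Hello World!")

# Options 2 and 3: select both lines below, then press
# cmd/ctrl + enter or click Run; they run one after the other.
greeting <- "Hello"
paste(greeting, "World!")
```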
---

## RStudio interface — Housekeeping

<img src="images/housekeeping1.png" width="80%" style="display: block; margin: auto;" />

---

## RStudio interface — Housekeeping

<img src="images/housekeeping3.png" width="80%" style="display: block; margin: auto;" />

---

## RStudio interface — Housekeeping

<img src="images/housekeeping4.png" width="80%" style="display: block; margin: auto;" />

---

## Basics — Directories

.pull-left[
- **Directory** = How your computer organizes files and folders.
  - From now on, you need to be a neat freak when it comes to this!
- **Working directory** = Where R is pointing to.
  - Most beginner errors come from a wrong working directory.
- Shortcut to copy the path of a selected folder/file:
  - Mac: `option + command + c`
  - PC: `shift + right click > Copy as Path`
]

.pull-right[
<img src="gifs/filer.gif" width="50%" style="display: block; margin: auto;" />
]

``` r
# Option 1: where you want
setwd("/Users/williampoirier/Dropbox/Website/files/uwo/R_Workshop_2025/rcode") # Mac
# PC: use / (or \\) in the path; a single \ is an escape character in R strings
setwd("C:/Users/williampoirier/Dropbox/Website/files/uwo/R_Workshop_2025/rcode")

# Option 2: where the R file is saved (only works in RStudio, on a saved script)
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
```

---

## Basics — Packages

.pull-left[
- R ships with a suite of basic functions, i.e. base R.
- Packages contain extra functions that other users have created.
- 2 things to note:
  - You only need to install them once.
  - But you need to load them each time you open a new session.
- The Tidyverse is a suite of multiple packages.
]

.pull-right[
<img src="images/tidyverse.png" width="30%" style="display: block; margin: auto;" />
]

``` r
# Install from CRAN (once)
install.packages("tidyverse")

# Load in your session (every new session)
library(tidyverse)
```

---

## Replication Exercise

.pull-left[
<img src="images/repli1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="images/repli2.png" width="100%" style="display: block; margin: auto;" />
]

.purpleb[OPEN RSTUDIO AND FOLLOW ME!!!]
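---

## Checking the working directory

Before importing anything, a quick base-R check can save a lot of head-scratching. This is a sketch: the relative path assumes the material keeps the layout used later in the workshop (a `data` folder next to the folder holding your script); adjust it to wherever you saved the files.

``` r
# Where is R currently pointing?
getwd()

# Which files and folders are in there?
list.files()

# file.path() joins folder names with "/", which R understands on Mac and PC
dataPath <- file.path("..", "data", "MajBenRight_clean_2010plus.dta")
dataPath

# TRUE if the data file can be found from the current working directory
file.exists(dataPath)
```

If `file.exists()` returns `FALSE`, check your working directory first.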
---

## Basics — Assignment

``` r
# The right way
banana <- 3

# The wrong way (works, but goes against R style conventions)
banana = 3

# Global assigner (rarely needed: assigns outside the current function)
banana <<- 3

# FOR ENGLISH KEYBOARDS
# PC: alt + -
# MAC: option + -
banana
```

```
## [1] 3
```

---

## Basics — Assignment

.pull-left[
``` r
# The right way
banana <- 3

# The wrong way (works, but goes against R style conventions)
banana = 3

# Global assigner (rarely needed: assigns outside the current function)
banana <<- 3

# FOR ENGLISH KEYBOARDS
# PC: alt + -
# MAC: option + -
banana
```

```
## [1] 3
```
]

.pull-right[
<img src="images/hw_eye.png" width="100%" style="display: block; margin: auto;" />
]

---

## Basics — Data types

.pull-left[
``` r
# Integer
Apple <- 13L
class(Apple)
```

```
## [1] "integer"
```

``` r
# Numeric
Banana <- 13
class(Banana)
```

```
## [1] "numeric"
```

``` r
# Character
Cherry <- "13"
class(Cherry)
```

```
## [1] "character"
```
]

.pull-right[
``` r
# Logical
Durian <- TRUE
class(Durian)
```

```
## [1] "logical"
```

``` r
# What happens when you add Apple and Banana?
# What about Apple and Durian?
# What about Apple and Cherry?
```
]

---

## Basics — Data types

.pull-left[
``` r
# Integer
Apple <- 13L
class(Apple)
```

```
## [1] "integer"
```

``` r
# Numeric
Banana <- 13
class(Banana)
```

```
## [1] "numeric"
```

``` r
# Character
Cherry <- "13"
class(Cherry)
```

```
## [1] "character"
```
]

.pull-right[
``` r
# Logical
Durian <- TRUE
class(Durian)
```

```
## [1] "logical"
```

``` r
# What happens when you add Apple and Banana?
# What about Apple and Durian?
# What about Apple and Cherry?
Apple + Banana
```

```
## [1] 26
```

``` r
Apple + Durian
```

```
## [1] 14
```
]

---

## Basics — Data structures

.panelset[
.panel[.panel-name[Vectors 1/3]

``` r
# One data type allowed. R's basic data structure.
# A vector
stuff <- "Kumquat"
stuff
```

```
## [1] "Kumquat"
```

``` r
# Also a vector
stuff <- c("Knickknacks","Kerfuffle","Kumquat")
stuff
```

```
## [1] "Knickknacks" "Kerfuffle"   "Kumquat"
```

``` r
# Also a vector
(otherStuff <- c(T,F,T,T,T,F)) # Parentheses around an assignment print the new object.
```

```
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE
```
]

.panel[.panel-name[2/3]

``` r
#### Why c()? It combines values into a single vector.
stuff <- c("Knickknacks","Kerfuffle","Kumquat")
(scoreOfWordsThatStartsWithK_1 <- 8:10)
```

```
## [1]  8  9 10
```

``` r
# OR
(scoreOfWordsThatStartsWithK_2 <- c(8,9,10))
```

```
## [1]  8  9 10
```

``` r
#### What if I want the score from 0 to 100 instead of 0 to 10?
(scoreOfWordsThatStartsWithK_3 <- 10*scoreOfWordsThatStartsWithK_2)
```

```
## [1]  80  90 100
```
]

.panel[.panel-name[3/3]

``` r
#### What if I want to relate the names to the score?
names(scoreOfWordsThatStartsWithK_1) <- stuff # Each score now carries a label!
scoreOfWordsThatStartsWithK_1
```

```
## Knickknacks   Kerfuffle     Kumquat
##           8           9          10
```
]

.panel[.panel-name[Matrices 1/2]

``` r
# Again, only one data type. 2 dimensions of it this time.
(myMatrix <- matrix(1:9,nrow=3,ncol=3))
```

```
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
```

``` r
# Accepts the usual matrix operations from math,
# like the transpose, for example
t(myMatrix)
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
```
]

.panel[.panel-name[2/2]

``` r
# Works with characters as well!
letterMatrix <- matrix(letters,ncol=2)
t(letterMatrix)
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "a"  "b"  "c"  "d"  "e"  "f"  "g"  "h"  "i"  "j"   "k"   "l"   "m"
## [2,] "n"  "o"  "p"  "q"  "r"  "s"  "t"  "u"  "v"  "w"   "x"   "y"   "z"
```

``` r
#### What if I want multiple data types?
```
]

.panel[.panel-name[Data Frames 1/2]

``` r
# One data type per column: essentially a collection of equal-length vectors, like an Excel sheet.
(wordData <- data.frame(stuff,scoreOfWordsThatStartsWithK_2))
```

```
##         stuff scoreOfWordsThatStartsWithK_2
## 1 Knickknacks                             8
## 2   Kerfuffle                             9
## 3     Kumquat                            10
```
]

.panel[.panel-name[2/2]

``` r
#### How do I change the column names?
colnames(wordData) <- c("word","score")
# OR
(wordData <- data.frame(word=stuff, score=scoreOfWordsThatStartsWithK_2))
```

```
##          word score
## 1 Knickknacks     8
## 2   Kerfuffle     9
## 3     Kumquat    10
```

``` r
#### What happens if I do this?
# class(wordData)
```
]

.panel[.panel-name[Lists]

``` r
# Anything you want. Can mix object types and data structures.
(myList <- list(stuff,t(letterMatrix),wordData))
```

```
## [[1]]
## [1] "Knickknacks" "Kerfuffle"   "Kumquat"
##
## [[2]]
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "a"  "b"  "c"  "d"  "e"  "f"  "g"  "h"  "i"  "j"   "k"   "l"   "m"
## [2,] "n"  "o"  "p"  "q"  "r"  "s"  "t"  "u"  "v"  "w"   "x"   "y"   "z"
##
## [[3]]
##          word score
## 1 Knickknacks     8
## 2   Kerfuffle     9
## 3     Kumquat    10
```
]
]

---

## Basics — Functions

``` r
#### What is a function?
myFunction <- function(stuff_in){
  stuff_out <- stuff_in # Some operation on stuff_in goes here
  return(stuff_out)
}

# Our own version of mean(), rounded to 2 decimals
my_mean <- function(vector){
  tmp <- sum(vector)
  n <- length(vector)
  out <- round(tmp/n,2)
  return(out)
}

normalVec <- rnorm(1000,mean=0,sd=1)
mean(normalVec)
```

```
## [1] -0.03094287
```

``` r
my_mean(normalVec)
```

```
## [1] -0.03
```

---
class: inverse,middle,center
name: lit

# LUNCH!

---

## Session 2: Working with R

1. Brief review of first session;
2. Importing data;
3. Looking at our data;
4. Indexing;
5. Operators;
6. Diving into the replication file.

---

## Review

``` r
library(tidyverse)
library(rio)
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
dat <- import("../data/MajBenRight_clean_2010plus.dta")
class(dat$polity)
```

```
## [1] "character"
```

``` r
class(dat)
```

```
## [1] "data.frame"
```

---

## Importing data

- Some data formats can readily be opened by base R:
  - .RData: `load()`
  - .csv: `read.csv()`, or `read_csv()` from the tidyverse
- Others need specialized packages. There are a ton of them, so just use `rio`!
  - .csv
  - .psv
  - .tsv
  - .sas7bdat/.xpt (SAS)
  - .sav/.zsav/.por (SPSS)
  - .dta (Stata)
  - .xls/.xlsx (Excel)
  - .RData/.rda/.rds/.qs (R)
  - And a ton more!
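
To see what the two base-R readers do, here is a small round trip that runs without the workshop data: it writes a made-up data frame (`toy`, hypothetical) to temporary files and reads it back.

``` r
# A made-up data frame, just for illustration
toy <- data.frame(word = c("Knickknacks", "Kerfuffle", "Kumquat"),
                  score = c(8, 9, 10))

# .csv: write it to a temporary file, then read it back
csvPath <- tempfile(fileext = ".csv")
write.csv(toy, csvPath, row.names = FALSE)
fromCsv <- read.csv(csvPath)
fromCsv

# .RData: save() stores objects under their names...
rdataPath <- tempfile(fileext = ".RData")
save(toy, file = rdataPath)
rm(toy)          # ...so after removing toy from the session,
load(rdataPath)  # load() brings it back as toy
exists("toy")
```

With `rio` loaded, `import(csvPath)` should read the same file: `rio` chooses the reader from the file extension.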
---

## Looking at our data 1/2

.pull-left[
``` r
# Check number of rows and columns
dim(dat)
```

```
## [1] 37307   650
```

``` r
# Check number of rows
nrow(dat)
```

```
## [1] 37307
```

``` r
# Check number of columns
ncol(dat)
```

```
## [1] 650
```

``` r
# For a vector, use length()
length(scoreOfWordsThatStartsWithK_1)
```

```
## [1] 3
```
]

.pull-right[
``` r
# Check first or last few rows of data
head(dat)
tail(dat,n=2)
```

``` r
# Check unique values of a vector
head(unique(dat$polity))
```

```
## [1] "New Zealand" "Greece"      "Germany"     "Switzerland" "Canada"
## [6] "Finland"
```
]

---

## Looking at our data 2/2

``` r
# Check unique values of a vector and how often they appear
table(dat$leftright)
```

```
##
##    0    1    2    3    4    5    6    7    8    9   10
##  945  672 1615 2768 2968 8586 3092 3522 2631  728 1121
```

``` r
# Pass two vectors to get a crosstab!
table(dat$female,dat$leftright)
```

```
##
##        0    1    2    3    4    5    6    7    8    9   10
##   0  445  311  766 1343 1468 3979 1702 1995 1443  401  554
##   1  496  358  844 1418 1494 4564 1387 1516 1181  325  562
```

---

## Indexing — Vectors

``` r
# Let's start with a vector.
# I create one by sampling 10 values from the year column of dat.
# Notice that I used $ to select the column year; I'll come back to this.
(years <- dat$year[sample(1:nrow(dat),10)])
```

```
## [1] 2011 2011 2015 2011 2013 2015 2013 2011 2011 2012
```

``` r
# Remember, a vector has one dimension.
# So if I want to know what the second value is, I only need one position.
years[2]
```

```
## [1] 2011
```

``` r
# But I can also select multiple elements from the vector
years[c(1,5,10)]
```

```
## [1] 2011 2013 2012
```

---

## Indexing — Data Frames/Matrices 1/2

``` r
# Matrices and data frames have 2 dimensions, so you need two things to find
# what you are looking for: a row and a column number!
# In that order: [r,c]
# Let's create a toy example from dat
(sub <- dat[sample(1:nrow(dat),3),] |> select(polity,proportional,year))
```

```
##            polity proportional year
## 35413    Portugal            1 2015
## 22227 Switzerland            1 2011
## 8458      Austria            1 2013
```

``` r
# If I only want to know about the first column
sub[,1]
```

```
## [1] "Portugal"    "Switzerland" "Austria"
```

``` r
# If I only want to know about the third row
sub[3,]
```

```
##       polity proportional year
## 8458 Austria            1 2013
```

``` r
# Or what is in the second row of the first column
sub[2,1]
```

```
## [1] "Switzerland"
```

---

## Indexing — Data Frames/Matrices 2/2

``` r
# Or what is in the second row of the first column
sub[2,1]
```

```
## [1] "Switzerland"
```

``` r
# If I know the names of the columns, however
names(sub)
```

```
## [1] "polity"       "proportional" "year"
```

``` r
# Then I can do this
sub$polity[2]
```

```
## [1] "Switzerland"
```

---

## Operators — Math

.pull-left[
``` r
2+3
```

```
## [1] 5
```

``` r
2-3
```

```
## [1] -1
```

``` r
2*3
```

```
## [1] 6
```

``` r
2/3
```

```
## [1] 0.6666667
```
]

.pull-right[
``` r
# Modulo (remainder of the division)
2%%3
```

```
## [1] 2
```

``` r
4%%2
```

```
## [1] 0
```

``` r
# Can do this with objects as well
sub$year[1]-sub$year[2]
```

```
## [1] 4
```
]

---

## Operators — Logic 1/2

.pull-left[
``` r
sub$year
```

```
## [1] 2015 2011 2013
```

``` r
# A is equal to B
sub$year[1] == sub$year[2]
```

```
## [1] FALSE
```

``` r
# A is not equal to B
sub$year[1] != sub$year[2]
```

```
## [1] TRUE
```
]

.pull-right[
``` r
# A is greater than B
sub$year[1] > sub$year[2]
```

```
## [1] TRUE
```

``` r
# A is less than B
sub$year[1] < sub$year[2]
```

```
## [1] FALSE
```

``` r
# A is greater than or equal to B
sub$year[1] >= sub$year[2]
```

```
## [1] TRUE
```

``` r
# A is less than or equal to B
sub$year[1] <= sub$year[2]
```

```
## [1] FALSE
```
]

---

## Operators — Logic 2/2

``` r
sub$year
```

```
## [1] 2015 2011 2013
```

``` r
# OR: TRUE if at least one side is TRUE
sub$year[1] == sub$year[2] | sub$year[1] > sub$year[3]
```

```
## [1] TRUE
```

``` r
# AND: TRUE only if both sides are TRUE
sub$year[1] == sub$year[2] & sub$year[1] > sub$year[3]
```

```
## [1] FALSE
```

---

## Exercise

Subset the data (`dat`) such that:

1. Only columns 390 (`polity`), 391 (`year`), and 633 (`proportional`) appear;
2. Only rows from `year` 2015 where `proportional` = 0 remain.

How many unique values are there in `polity`?

---

## Exercise — A solution

``` r
tmp <- dat[,c(390,391,633)]
subset <- tmp[tmp$year==2015 & tmp$proportional==0,]
unique(subset$polity)
```

```
## [1] "Canada"        "Great Britain"
```

``` r
# In the tidyverse world
dat |>
  select(polity,year,proportional) |>
  filter(year==2015 & proportional == 0) |>
  distinct(polity)
```

```
##          polity
## 1        Canada
## 2 Great Britain
```

---
class: inverse,middle,center
name: lit

## Diving into the replication file