1.1 Examples

  1. What are the mean and standard deviation of the price of diamonds with a clarity of VVS2 and a carat between 2.0 and 2.1?
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
diamonds |>
  filter(clarity == "VVS2" & carat >= 2.0 & carat <= 2.1) |>
  summarise(mean = mean(price),
            sd = sd(price))
# OR

mean(diamonds$price[diamonds$clarity == "VVS2" & diamonds$carat >= 2.0 & diamonds$carat <= 2.1])
sd(diamonds$price[diamonds$clarity == "VVS2" & diamonds$carat >= 2.0 & diamonds$carat <= 2.1])
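
When filtering on a range, both bounds have to hold at the same time, so they are combined with `&`; with `|`, every value satisfies at least one bound and nothing is filtered out. A minimal base R sketch on a small made-up vector (the values are hypothetical and not taken from `diamonds`):

```r
# Hypothetical carat values, for illustration only
carat <- c(0.5, 2.0, 2.05, 2.1, 3.0)

# Both bounds must hold: keeps only the values inside [2.0, 2.1]
in_range <- carat >= 2.0 & carat <= 2.1
carat[in_range]

# With `|`, each value is either >= 2.0 or <= 2.1, so every element is TRUE
either <- carat >= 2.0 | carat <= 2.1
all(either)
```

In a dplyr pipeline the same inclusive range check can also be written as `between(carat, 2.0, 2.1)`.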

3. Try it yourself

Using the carData::Salaries data set, which records the 2008-2009 nine-month academic salaries of Assistant Professors, Associate Professors and Professors at a college in the U.S., create a graph that meets the following requirements:

  1. Present the mean salary by years since PhD, broken down by sex and by discipline.
  2. Make sure that the axes, facets and legend are properly named.
  3. Rescale the y axis to show salaries in increments of 25000 and the x axis to show years in increments of 5.
  4. Place the legend at the top of the graph.
range(carData::Salaries$yrs.since.phd)
## [1]  1 56
range(carData::Salaries$salary)
## [1]  57800 231545
carData::Salaries |> 
  # Give the discipline codes readable facet labels
  mutate(discipline = factor(discipline,
                             levels = c("A", "B"),
                             labels = c("Theoretical", "Applied"))) |> 
  # Mean salary for each combination of years since PhD, sex and discipline
  group_by(yrs.since.phd, sex, discipline) |> 
  summarise(meanSal = mean(salary)) |> 
  ggplot(aes(x = yrs.since.phd, y = meanSal, color = sex)) +
  geom_line() +
  # Axis breaks chosen from the ranges computed above
  scale_x_continuous("\nYears since PhD", breaks = seq(0, 56, 5)) +
  scale_y_continuous("Salary (USD)\n", breaks = seq(0, 250000, 25000)) +
  scale_color_brewer("Sex", palette = "Set1") +
  facet_wrap(~discipline) +
  theme_minimal() +
  theme(legend.position = "top")
## `summarise()` has grouped output by 'yrs.since.phd', 'sex'. You can override
## using the `.groups` argument.
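
The message above is informational: after `summarise()`, the result is still grouped by all but the last grouping variable. One way to make the choice explicit, and to silence the message, is the `.groups` argument. A minimal sketch on a made-up tibble (the `toy` data and its column names are hypothetical, not taken from `carData::Salaries`):

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(
  yrs    = c(1, 1, 2, 2),
  sex    = c("Female", "Male", "Female", "Male"),
  salary = c(60000, 62000, 70000, 71000)
)

# .groups = "drop" returns an ungrouped tibble and suppresses the message
toy_means <- toy |>
  group_by(yrs, sex) |>
  summarise(meanSal = mean(salary), .groups = "drop")

# Check whether any grouping is left on the result
is_grouped_df(toy_means)
```

Dropping the groups is usually the safer default when the summary is passed straight to `ggplot()`, since no later step depends on the leftover grouping.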