R programming basics

class: title-slide, inverse, bottom
background-image: url(img/gradient-background.png)
background-size: cover

# R programming basics
### CFE R Training - Module 3

<br/>

María Paula Caldas and Jolien Noels

---
class: middle

# Useful links

[Slides](https://oecd-cfe-eds.github.io/cfe-r-training/03_programming.html), if you want to navigate on your own

[RStudio Project](https://rstudio.cloud/project/2940340), to try out the exercises

[Teams Space](https://teams.microsoft.com/l/team/19%3aewi8FvNssJHrCsxFDSJbA7IL4q4kGH0E8IRMfMp8zPA1%40thread.tacv2/conversations?groupId=c957fd70-0f85-4bcc-b3a4-e453919316de&tenantId=ac41c7d4-1f61-460d-b0f4-fc925a2b471c), for discussions

[Github repository](https://github.com/mpaulacaldas/cfe-r-training), for later reference

---
class: middle

# Housekeeping matters

🙋‍♀️&nbsp; During the session, ask questions in the chat 
or raise your hand

📷&nbsp; Sessions are recorded. Remember that you can turn off your camera if 
you prefer

💬&nbsp; After the session, post follow-up questions, 
comments or reactions in the [Teams space](https://teams.microsoft.com/l/team/19%3aewi8FvNssJHrCsxFDSJbA7IL4q4kGH0E8IRMfMp8zPA1%40thread.tacv2/conversations?groupId=c957fd70-0f85-4bcc-b3a4-e453919316de&tenantId=ac41c7d4-1f61-460d-b0f4-fc925a2b471c)

📝&nbsp; If you are going through these slides on your own, type `p` 
to see the presenter notes

???

The presenter notes are also where you might find OECD- or CFE-specific 
information.

---
class: middle

# Learning objectives for today

1. Understand the difference between vectors, lists and data frames and how to subset them

1. Know how to write simple loops and functions

1. Understand how to use functions and lists as an alternative to loops

---
class:inverse, bottom, left

# Vectors

---
# Vectors

.pull-left[

__Vectors__ are the most common and basic data structure in R.

There are two types:

- Atomic vectors
- Lists

There is a related object, `NULL`, which represents the _absence_ of a vector.

]

.pull-right[

<img src="https://github.com/hadley/r4ds/blob/master/diagrams/data-structures-overview.png?raw=true" width="100%" />
]

.footnote[
Ref: [Chapter 20, R4DS](https://r4ds.had.co.nz/vectors.html#important-types-of-atomic-vector)
]

---
# Atomic vectors

.pull-left[

Atomic vectors are __homogeneous__, i.e. all their elements need to be of the same _type_.

```r
lgl <- c(TRUE, FALSE, TRUE, TRUE)

int <- c(1L, 2L, 3L, 4L)

dbl <- c(5.5, 33.6, 8, 12.5)

chr <- c("coffee", "café")
```

We can create vectors using `c()` and assign them to an object using `<-`

]
.pull-right[
<img src="https://github.com/hadley/r4ds/blob/master/diagrams/data-structures-overview.png?raw=true" width="100%" />
]

.footnote[
Ref: [Chapter 20, R4DS](https://r4ds.had.co.nz/vectors.html#important-types-of-atomic-vector)
]

---
# Vector types

It can be difficult to tell the _type_ of a vector solely by the way it is printed in the console.

.pull-left[

When we print them in R:

```r
lgl
#> [1]  TRUE FALSE  TRUE  TRUE

int
#> [1] 1 2 3 4

dbl
#> [1]  5.5 33.6  8.0 12.5

chr
#> [1] "coffee" "café"
```

]
.pull-right[

When we explore their type:

```r
typeof(lgl)
#> [1] "logical"

typeof(int)
#> [1] "integer"

typeof(dbl)
#> [1] "double"

typeof(chr)
#> [1] "character"
```

]

---
# Logical operations

.pull-left[

_From the first session:_

| Condition | Reads |
|----------|-------|
| `x > y`  | `x` is greater than `y` |
| `x >= y` | `x` is greater than or equal to `y` |
| `x == 3` | `x` is equal to 3 |
| `x %in% c(2, 8)` | `x` is either 2 or 8 |
| `x > y & x < z` | `x` is greater than `y` AND smaller than `z` |
| <code>x > y &#124; x < z</code> | `x` is greater than `y` OR smaller than `z` |

]
.pull-right[

Logical operations return logical vectors

```r
telework_days <- c(2, 4:5)
office_days   <- c("monday", "wednesday")

4 %in% telework_days
#> [1] TRUE
3:5 %in% telework_days
#> [1] FALSE  TRUE  TRUE

is.character(office_days)
#> [1] TRUE
```

]

---
# Coercion

There are two ways to convert, or __coerce__, vectors from one type to another:

.pull-left[
### Explicit

```r
lgl
#> [1]  TRUE FALSE  TRUE  TRUE
as.numeric(lgl)
#> [1] 1 0 1 1

int
#> [1] 1 2 3 4
as.character(int)
#> [1] "1" "2" "3" "4"
```

]
.pull-right[
### Implicit

```r
lgl
#> [1]  TRUE FALSE  TRUE  TRUE
lgl * 5
#> [1] 5 0 5 5

dbl
#> [1]  5.5 33.6  8.0 12.5
chr
#> [1] "coffee" "café"
c(chr, dbl)
#> [1] "coffee" "café"   "5.5"    "33.6"   "8"      "12.5"
```

]

---
# Coercion

This concept is important to understand _warnings_:

```r
ages_chr <- c("29", "88", "46", ">100")
ages_num <- as.numeric(ages_chr)
#> Warning: NAs introduced by coercion

ages_num
#> [1] 29 88 46 NA
```

⚠️&nbsp; __Avoid ignoring warnings__. Warnings, as opposed to errors, don't stop code execution, so mistakes can propagate.
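
One way to act on this warning is to check which values were turned into `NA`. A minimal sketch, reusing the vector from the example above (the name `failed` is only illustrative):

```r
ages_chr <- c("29", "88", "46", ">100")

# suppressWarnings() is used here only because we inspect the result right after
ages_num <- suppressWarnings(as.numeric(ages_chr))

# Values that were not missing before the conversion but are missing after it
failed <- ages_chr[is.na(ages_num) & !is.na(ages_chr)]
failed
#> [1] ">100"
```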

---
# Missing values

Each type of atomic vector has its own type of missing value

.pull-left[

```r
NA
#> [1] NA
NA_integer_
#> [1] NA
NA_real_
#> [1] NA
NA_character_
#> [1] NA
```

]

.pull-right[

```r
typeof(NA)
#> [1] "logical"
typeof(NA_integer_)
#> [1] "integer"
typeof(NA_real_)
#> [1] "double"
typeof(NA_character_)
#> [1] "character"
```

]

<br/>
Because of coercion this often doesn't matter.

.pull-left[

```r
c("tea", NA, "té")
#> [1] "tea" NA    "té"
```

]
.pull-right[

```r
typeof(c("tea", NA, "té"))
#> [1] "character"
```

]

---
# Missing values are _contagious_

.pull-left[

Some functions have an option to remove them, but you need to be explicit about it.

```r
mean(c(10, 20, NA))
#> [1] NA
mean(c(10, 20, NA), na.rm = TRUE)
#> [1] 15
```

]

.pull-right[

Functions that remove them by default tend to give a warning:

```r
library(ggplot2)
ggplot(airquality, aes(Ozone, Temp)) +
  geom_point()
#> Warning: Removed 37 rows containing missing values (geom_point).
```

]

---
# Missing values are _contagious_

⚠️&nbsp; You can't use `==` to identify the missing values in a vector. Use `is.na()`

.pull-left[

```r
x <- c(10, 20)

x == 10
#> [1]  TRUE FALSE

x == NA
#> [1] NA NA

is.na(x)
#> [1] FALSE FALSE
```

]
.pull-right[

```r
y <- c(10, NA)

y == 10
#> [1] TRUE   NA

y == NA
#> [1] NA NA

is.na(y)
#> [1] FALSE  TRUE
```

]
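
Because `is.na()` returns a logical vector, it can be used to count or drop missing values. A short sketch of two common uses, with the same `y` as above:

```r
y <- c(10, NA)

# Count the missing values (each TRUE is coerced to 1)
sum(is.na(y))
#> [1] 1

# Keep only the non-missing values
y[!is.na(y)]
#> [1] 10
```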

---
# Recycling

Vectors of shorter length are __recycled__ to match the length of the longer vector

This makes fairly common operations quicker to type

```r
rates <- c(0.93, 0.85, 0.43)

rates * 100
#> [1] 93 85 43
rates * c(100, 100, 100)
#> [1] 93 85 43
```

---
# Recycling

Recycling can happen for vectors of any length, not just those of length 1

```r
c(10, 100, 1000, 10000) * c(1, 3)
#> [1]    10   300  1000 30000
c(10, 100, 1000, 10000) * c(1, 3, 1, 3)
#> [1]    10   300  1000 30000
```

And it also explains a fairly common warning:

```r
1:5 + 1:3
#> Warning in 1:5 + 1:3: longer object length is not a multiple of shorter object length
#> [1] 2 4 6 5 7
```
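
The result above may be easier to read once the recycling is written out by hand. A small sketch, using base R's `rep_len()` to repeat the shorter vector up to length 5 (the name `recycled` is only illustrative):

```r
# 1:3 recycled to length 5 is 1 2 3 1 2 (the last cycle is incomplete)
recycled <- rep_len(1:3, length.out = 5)
recycled
#> [1] 1 2 3 1 2

# Same values as 1:5 + 1:3, but without the warning
1:5 + recycled
#> [1] 2 4 6 5 7
```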

---

# Subsetting

### By position

```r
office <- c("alexandre", "nikos", "maria paula", "tahsin")

office[2]
#> [1] "nikos"

office[c(1, 4)]
#> [1] "alexandre" "tahsin"

office[-1]
#> [1] "nikos"       "maria paula" "tahsin"
```

---

# Subsetting

### With a logical vector

```r
office
#> [1] "alexandre"   "nikos"       "maria paula" "tahsin"

present_on_monday <- c(TRUE, FALSE, TRUE, TRUE)
office[present_on_monday]
#> [1] "alexandre"   "maria paula" "tahsin"
```

⚠️&nbsp; This is one place where we need to be aware of R's recycling rules

```r
office[c(TRUE, FALSE)]
#> [1] "alexandre"   "maria paula"
```
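
In practice, the logical vector is often produced by a condition rather than typed by hand. A minimal sketch with the same `office` vector (the condition and the name `long_name` are just illustrations):

```r
office <- c("alexandre", "nikos", "maria paula", "tahsin")

# nchar() counts characters; the comparison returns one TRUE/FALSE per element
long_name <- nchar(office) > 6
long_name
#> [1]  TRUE FALSE  TRUE FALSE

office[long_name]
#> [1] "alexandre"   "maria paula"
```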

---

# Subsetting

### By name

Vectors can be _named_ and we can use those names to subset them

```r
names(office) <- c("BANQUET", "PATIAS", "CALDAS", "MEDHI")
office
#>       BANQUET        PATIAS        CALDAS         MEDHI 
#>   "alexandre"       "nikos" "maria paula"      "tahsin"

office[c("CALDAS", "PATIAS")]
#>        CALDAS        PATIAS 
#> "maria paula"       "nikos"
```

---
class: exercise

# 📝&nbsp; Vectors

Head to the [RStudio Cloud Project](https://rstudio.cloud/project/2940340) and follow the instructions in the `vectors.R` script.

---
class:inverse, bottom, left

# Conditions

---
# If conditions

.pull-left[
### Syntax

```
if (<CONDITION>) {
  <CODE_TO_EXECUTE_IF_CONDITION_IS_TRUE>
}
```
]
.pull-right[
### Examples

```r
if (TRUE) {
  "I will print!"
}
#> [1] "I will print!"

if (FALSE) {
  "Nothing will happen!"
}
```
]

---
# Conditions: warnings

⚠️&nbsp; `<CONDITION>` should evaluate to a logical vector of length 1. In the R version used for these slides, a longer vector gives a warning and only the first element is used. Since R 4.2.0, it is an error.

```r
if (c(TRUE, FALSE)) {
  "I will print!"
}
#> Warning in if (c(TRUE, FALSE)) {: the condition has length > 1 and only the first
#> element will be used
#> [1] "I will print!"

if (c(FALSE, TRUE)) {
  "Nothing will happen!"
}
#> Warning in if (c(FALSE, TRUE)) {: the condition has length > 1 and only the first
#> element will be used
```
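
When a comparison produces several values, `any()` or `all()` can reduce them to the single `TRUE` or `FALSE` that `if` expects. A minimal sketch (the vector `scores` is made up for illustration):

```r
scores <- c(12, 7, 15)

# scores > 10 has length 3 (TRUE FALSE TRUE), so we collapse it first
any(scores > 10)
#> [1] TRUE
all(scores > 10)
#> [1] FALSE

if (any(scores > 10)) {
  "At least one score is above 10"
}
#> [1] "At least one score is above 10"
```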

---
# Conditions: errors

⚠️&nbsp; `<CONDITION>` cannot be a missing value. `if` fails with an error when the condition is `NA`.

```r
if (NA) {
  "I will fail!"
}
#> Error in if (NA) {: missing value where TRUE/FALSE needed

if (c(NA, TRUE)) {
  "I will fail too!"
}
#> Warning in if (c(NA, TRUE)) {: the condition has length > 1 and only the first element
#> will be used
#> Error in if (c(NA, TRUE)) {: missing value where TRUE/FALSE needed
```
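
One way to guard against this error is to wrap the condition in `isTRUE()`, which returns `FALSE` for anything that is not a single `TRUE`, including `NA`. A minimal sketch (the variable `age` is hypothetical):

```r
age <- NA

# age >= 18 is NA, which would make a bare if () fail
age >= 18
#> [1] NA

# isTRUE(NA) is FALSE, so the body is skipped instead of raising an error
if (isTRUE(age >= 18)) {
  "Adult"
}
```

Whether treating `NA` as `FALSE` is appropriate depends on the analysis. Checking `is.na()` explicitly is another option.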

---
# If-else conditions

.pull-left[
### Syntax

```
if (<CONDITION>) {
  <CODE_TO_EXECUTE_IF_CONDITION_IS_TRUE>
} else {
  <CODE_TO_EXECUTE_OTHERWISE>
}
```
]
.pull-right[
### Examples

```r
language <- "spanish"

if (language == "spanish") {
  "¡Hola!"
} else {
  "Hi!"
}
#> [1] "¡Hola!"
```

]

---
# Multiple conditions

.pull-left[
### Syntax

```
if (<CONDITION1>) {
  <CODE_TO_EXECUTE_IF_CONDITION1_IS_TRUE>
} else if (<CONDITION2>) {
  <CODE_TO_EXECUTE_IF_CONDITION2_IS_TRUE>
} else {
  <CODE_TO_EXECUTE_OTHERWISE>
}
```
]
.pull-right[
### Examples

```r
language <- "french"

if (language == "spanish") {
  "¡Hola!"
} else if (language == "french") {
  "Salut!"
} else {
  "Hi!"
}
#> [1] "Salut!"
```

]

---
# Vectorised alternatives

The vectorised alternatives are more useful when we work with data frames.

```r
byear <- c(1970, 2005, 1992, 1962)
```

.pull-left[

### `ifelse()`

```r
ifelse(2021 - byear < 18, "young", "old")
#> [1] "old"   "young" "old"   "old"
```

]

.pull-right[

### `dplyr::case_when()`

```r
dplyr::case_when(
  byear <= 1964        ~ "boomer",
  byear %in% 1965:1980 ~ "gen x",
  byear %in% 1981:1996 ~ "millennial",
  TRUE                 ~ "gen z"
)
#> [1] "gen x"      "gen z"      "millennial" "boomer"
```

]
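
As a sketch of the data frame use case, the same `ifelse()` call can create a new column. The data frame `staff` and its column names are made up for illustration:

```r
staff <- data.frame(
  name  = c("a", "b", "c", "d"),
  byear = c(1970, 2005, 1992, 1962)
)

# One value per row, computed in a single vectorised call
staff$age_group <- ifelse(2021 - staff$byear < 18, "young", "old")
staff$age_group
#> [1] "old"   "young" "old"   "old"
```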

---
class: exercise

# 📝&nbsp; Conditions

Head to the [RStudio Cloud Project](https://rstudio.cloud/project/2940340) and follow the instructions in the `conditions.R` script.

---
class:inverse, bottom, left

# Loops

---
# For loops

.pull-left[
### Steps

1. Create an empty vector to store results

2. Specify the sequence to iterate over

3. Define what you want to do and where you want to store the result

4. (Optional) print the output

]
.pull-right[
### Structure

```r
output <- vector("character", 7)
for (m in seq_along(output)) {
  output[m] <- format(
    Sys.Date() + m, 
    "%e %b, %Y"
    )
}
output
#> [1] " 1 Oct, 2021" " 2 Oct, 2021" " 3 Oct, 2021" " 4 Oct, 2021" " 5 Oct, 2021"
#> [6] " 6 Oct, 2021" " 7 Oct, 2021"
```
]

---
# For loops
### Less-than-ideal patterns

- _Growing the output vector with each iteration_ is computationally inefficient. If you know the size that the output vector should be, use `vector()`.

- _Using colon notation and `length()` to define the sequence_ can lead to unexpected behaviour. What if `days` was a zero-length vector? What would `1:length(days)` return? What would `seq_along(days)` return?

```r
days   <- c("tomorrow", "day after tomorrow")
output <- NULL
for (m in 1:length(days)) {
  output <- c(output, format(Sys.Date() + m, "%e %b, %Y"))
}
output
#> [1] " 1 Oct, 2021" " 2 Oct, 2021"
```
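
A version of the same loop that avoids both patterns could look like the sketch below: the output is pre-allocated with `vector()` and the sequence comes from `seq_along()`. The fixed `start` date is an assumption, used in place of `Sys.Date()` so that the result is reproducible.

```r
days  <- c("tomorrow", "day after tomorrow")
start <- as.Date("2021-09-30")  # fixed date instead of Sys.Date()

output <- vector("character", length(days))
for (m in seq_along(days)) {
  output[m] <- format(start + m, "%Y-%m-%d")
}
output
#> [1] "2021-10-01" "2021-10-02"

# With a zero-length input, 1:length() still yields two iterations,
# while seq_along() yields none
1:length(character(0))
#> [1] 1 0
seq_along(character(0))
#> integer(0)
```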

---
class: exercise

# 📝&nbsp; For loops

Head to the [RStudio Cloud Project](https://rstudio.cloud/project/2940340) and follow the instructions in the `loops.R` script.

---
class: inverse, bottom, left
background-image: url(https://images.unsplash.com/photo-1421986872218-300a0fea5895?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=634&q=80)
background-size: cover

# Break

---
class:inverse, bottom, left

# Lists

---
# Lists

.pull-left[
Unlike atomic vectors, lists can be __heterogeneous__.

They can be made up of vectors of many different types, including other lists.

```r
a <- list(
  a = 1:3, 
  b = "a string", 
  c = pi, 
  d = list(-1, -5)
  )
```

]
.pull-right[
<img src="https://github.com/hadley/r4ds/blob/master/diagrams/data-structures-overview.png?raw=true" width="100%" />
]

.footnote[
Ref: [Chapter 20, R4DS](https://r4ds.had.co.nz/vectors.html#important-types-of-atomic-vector)
]

???

Since they can contain other lists, lists are sometimes known as recursive vectors.

---
# Inspecting lists

Lists can be very big in size, so it's not always a good idea to print them to the console.

To examine their structure, one useful way is to use `str()`

```r
str(a)
#> List of 4
#>  $ a: int [1:3] 1 2 3
#>  $ b: chr "a string"
#>  $ c: num 3.14
#>  $ d:List of 2
#>   ..$ : num -1
#>   ..$ : num -5
```

In the RStudio IDE, you can also use `View(a)`
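
Two other quick checks that avoid printing the whole object are `length()` and `names()`. A minimal sketch with the same list `a`:

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

# Number of top-level components
length(a)
#> [1] 4

# Names of the top-level components
names(a)
#> [1] "a" "b" "c" "d"
```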

---
# Subsetting lists

.pull-left[

Extract the component, returning a list:

```r
a["b"]
#> $b
#> [1] "a string"
typeof(a["b"])
#> [1] "list"
```
]
.pull-right[

Extract the component, removing a level of hierarchy:

```r
a[["b"]]
#> [1] "a string"
typeof(a[["b"]])
#> [1] "character"

a$b
#> [1] "a string"
typeof(a$b)
#> [1] "character"
```

]
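
These operators can also select several components or be chained to reach nested ones. A minimal sketch with the list `a` defined earlier:

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

# [ with several names keeps the list structure
str(a[c("a", "b")])
#> List of 2
#>  $ a: int [1:3] 1 2 3
#>  $ b: chr "a string"

# $ followed by [[ drills down into the nested list
a$d[[2]]
#> [1] -5
```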

---
class: exercise

# 📝&nbsp; Subsetting lists

.pull-left[
1. Go to the [RStudio Project](https://rstudio.cloud/project/2940340).

1. Type out the different subsets presented in the figure on the right. What are the vector types of the outputs you get?

1. Take the bottom row of the figure. How would you re-write those subsetting operations using the `$`?

]
.pull-right[
<img src="https://github.com/hadley/r4ds/blob/master/diagrams/lists-subsetting.png?raw=true" width="100%" />
]

.footnote[
Ref: [Chapter 20, R4DS](https://r4ds.had.co.nz/vectors.html#important-types-of-atomic-vector)
]

---
class:inverse, bottom, left

# Functions

---
# Functions

.pull-left[
### Elements of a function

1. __Name__: Name of the function. It's a good idea to make it a verb.
1. __Arguments__: These can be empty or have default values.
1. __Body__: With the code that you want to execute according to the values taken by the arguments. At the end, the function should _return_ a value.

]
.pull-right[

### Example

```r
greet <- function(person, language = "ENG") {
  greeting <- "Hi"
  if (language == "ESP") {
    greeting <- "Hola"
  }
  paste0(greeting, ", ", person, "!")
}

greet(person = "Jolien")
#> [1] "Hi, Jolien!"
greet("Jolien")
#> [1] "Hi, Jolien!"

greet("Jolien", "ESP")
#> [1] "Hola, Jolien!"
```
]
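
The same three elements appear in any function. As a further sketch, the hypothetical `to_percent()` below has one required argument and one argument with a default value, and returns the value of its last expression:

```r
to_percent <- function(x, digits = 0) {
  # Stop early with an error if the input is not numeric
  stopifnot(is.numeric(x))
  paste0(round(x * 100, digits), "%")
}

to_percent(c(0.93, 0.85, 0.43))
#> [1] "93%" "85%" "43%"
to_percent(0.4567, digits = 1)
#> [1] "45.7%"
```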

---
class: exercise

# 📝&nbsp; Functions

Head to the [RStudio Cloud Project](https://rstudio.cloud/project/2940340) and follow the instructions in the `functions.R` script.

---
class:inverse, bottom, left
background-image: url(https://raw.githubusercontent.com/tidyverse/purrr/master/man/figures/logo.png)
background-position: 1050px 20px
background-size: 100px

# Functional programming

---
# Iteration over vectors with purrr

The purrr package has a family of __map__ functions that allow us to iterate over vectors.

.pull-left[

<br/>

```r
map(
 .x, # for every element of .x
 .f # do .f
)
```
]
.pull-right[
<img src="https://github.com/hadley/adv-r/blob/master/diagrams/functionals/map.png?raw=true" width="80%" />
]

.footnote[
Ref: [Advanced R, Chapter 9](https://adv-r.hadley.nz/functionals.html)
]

---

# Iteration over vectors with purrr

The purrr package has a family of __map__ functions that allow us to iterate over vectors.

.pull-left[

```r
library(purrr)

triple <- function(x) x * 3
map(1:3, triple)
#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 6
#> 
#> [[3]]
#> [1] 9
```

]
.pull-right[
<img src="https://github.com/hadley/adv-r/blob/master/diagrams/functionals/map.png?raw=true" width="80%" />
]

---
# Control the type of output vector

.pull-left[
`map()` always returns a list.

```r
map(1:3, triple)
#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 6
#> 
#> [[3]]
#> [1] 9
```

]
.pull-right[
We can change the type of the output vector with the `map_*()` variants.

```r
map_dbl(1:3, triple)
#> [1] 3 6 9

map_chr(1:3, triple)
#> [1] "3.000000" "6.000000" "9.000000"
```

]
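
Other variants follow the same naming scheme. A minimal sketch with `map_int()` and `map_lgl()`, applied to a made-up list `x`:

```r
library(purrr)

x <- list(a = 1:3, b = "a string", c = c(2.5, 7))

# One integer per element
map_int(x, length)
#> a b c 
#> 3 1 2

# One logical value per element
map_lgl(x, is.numeric)
#>     a     b     c 
#>  TRUE FALSE  TRUE
```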

---
# Pass arguments to `.f`

We can pass arguments to `.f` via `...`

.pull-left[

<br/>

```r
map(
 .x, # for every element of .x
 .f, # do .f
 ... # extra arguments to .f
)
```
]
.pull-right[
<img src="https://github.com/hadley/adv-r/blob/master/diagrams/functionals/map-arg.png?raw=true" width="80%" />
]

.footnote[
Ref: [Advanced R, Chapter 9](https://adv-r.hadley.nz/functionals.html)
]

---
# Pass arguments to `.f`

We can pass arguments to `.f` via `...`

.pull-left[

```r
seniority <- list(
  eds = c(2, 10, 5, 3),
  rdg = c(3, 16, NA)
)

map_dbl(seniority, mean)
#> eds rdg 
#>   5  NA
map_dbl(seniority, mean, na.rm = TRUE)
#> eds rdg 
#> 5.0 9.5
```
]
.pull-right[
<img src="https://github.com/hadley/adv-r/blob/master/diagrams/functionals/map-arg.png?raw=true" width="80%" />
]

.footnote[
Ref: [Advanced R, Chapter 9](https://adv-r.hadley.nz/functionals.html)
]

---

# Other ways to define `.f`

We can call `map()` with existing or user-defined functions:

```r
triple <- function(x) x * 3
map_dbl(1:3, triple)
#> [1] 3 6 9
```

.pull-left[

### Anonymous functions

```r
map_dbl(1:3, function(x) x * 3)
#> [1] 3 6 9
```

]

.pull-right[

### Tilde notation

```r
map_dbl(1:3, ~ .x * 3)
#> [1] 3 6 9
```

]
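
Since R 4.1.0, base R also offers a shorthand for anonymous functions, `\(x)`, which can be used in the same place. A minimal sketch:

```r
library(purrr)

# \(x) x * 3 is shorthand for function(x) x * 3 (requires R >= 4.1.0)
map_dbl(1:3, \(x) x * 3)
#> [1] 3 6 9
```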

---
class: exercise

# 📝&nbsp; Iteration with purrr

Head to the [RStudio Cloud Project](https://rstudio.cloud/project/2940340) and follow the instructions in the `purrr.R` script.

---
class: exercise

# 👩‍💻&nbsp; Demo: Automated plots

The code used in this demonstration is in the [RStudio Cloud project](https://rstudio.cloud/project/2940340).

---
class: inverse, bottom, left
background-image: url(img/gradient-background.png)
background-size: cover

# Annex

---
# To learn more

.pull-left[

[**R for Data Science**](https://r4ds.had.co.nz/), [Section III - Program](https://r4ds.had.co.nz/program-intro.html)

[**Advanced R**](https://adv-r.hadley.nz/index.html), [Chapter 9 - Functionals](https://adv-r.hadley.nz/functionals.html)

]

.pull-right[
<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" width="60%" />
]

---
# To learn more

.pull-left[
<div class="shareagain" style="min-width:300px;margin:1em auto;">
<iframe src="https://robust-tools.djnavarro.net/artistry/#1" width="1600" height="900" style="border:2px solid currentColor;" loading="lazy" allowfullscreen></iframe>
<script>fitvids('.shareagain', {players: 'iframe'});</script>
</div>

]
.pull-right[

Danielle Navarro's [Youtube playlist](https://www.youtube.com/watch?v=aozL5TKQgfY&list=PLRPB0ZzEYegMStVRojPITLUZ6A8YGUHi-) of her __aRt programming__ class.

Jenny Bryan's [purrr workshop](https://speakerdeck.com/jennybc/purrr-workshop).

]

---
class:inverse, bottom, left

# Data frames and tibbles

---
# Data frames and tibbles

### Printing

_Tibbles_ have a more informative printing method: they show the dimensions of the table and the type of each column.

.pull-left[

```r
df <- airquality[1:3, ]

df
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
```
]
.pull-right[

```r
library(tibble)
tb <- as_tibble(df)
tb
#> # A tibble: 3 x 6
#>   Ozone Solar.R  Wind  Temp Month   Day
#>   <int>   <int> <dbl> <int> <int> <int>
#> 1    41     190   7.4    67     5     1
#> 2    36     118   8      72     5     2
#> 3    12     149  12.6    74     5     3
```

]

???

In R, tables with data are usually represented using __data frames__. The tidyverse introduces a similar structure, called __tibbles__, with slightly different behaviours.

---
# Data frames and tibbles

### Column subsetting

`$` and `[[` have the same behaviour

.pull-left[

```r
df$Ozone
#> [1] 41 36 12
df[["Ozone"]]
#> [1] 41 36 12
```
]
.pull-right[

```r
tb$Ozone
#> [1] 41 36 12
tb[["Ozone"]]
#> [1] 41 36 12
```

]

---
# Data frames and tibbles

### Column subsetting

With data frames, `[` returns a vector when a single column is selected. With tibbles, it always returns a tibble.

.pull-left[

```r
df[, c("Ozone", "Wind")]
#>   Ozone Wind
#> 1    41  7.4
#> 2    36  8.0
#> 3    12 12.6
df[, "Ozone"]
#> [1] 41 36 12
```
]
.pull-right[

```r
tb[, c("Ozone", "Wind")]
#> # A tibble: 3 x 2
#>   Ozone  Wind
#>   <int> <dbl>
#> 1    41   7.4
#> 2    36   8  
#> 3    12  12.6
tb[, "Ozone"]
#> # A tibble: 3 x 1
#>   Ozone
#>   <int>
#> 1    41
#> 2    36
#> 3    12
```

]
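
For data frames, the `drop` argument of `[` controls this behaviour. A minimal sketch using the same `df`:

```r
df <- airquality[1:3, ]

# drop = FALSE keeps the result as a data frame, even with a single column
df[, "Ozone", drop = FALSE]
#>   Ozone
#> 1    41
#> 2    36
#> 3    12

is.data.frame(df[, "Ozone", drop = FALSE])
#> [1] TRUE
```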