R
Author: Shitao5

Published: 2023-08-30

Modified: 2023-09-14

Learning Progress: 40.8%.

Learning Source

# 1 Functions

R uses lexical scoping: it looks up the values of names based on how a function is defined, not how it is called. “Lexical” here is not the English adjective that means relating to words or a vocabulary. It’s a technical CS term that tells us that the scoping rules use a parse-time, rather than a run-time structure.

R’s lexical scoping follows four primary rules:

• Name masking
• Functions versus variables
• A fresh start
• Dynamic lookup
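
Two of these rules are easy to see in a short sketch. The function names (`g09`, `g10`, `g11`) and the variable `counter` are illustrative only, and `exists()` is called with `inherits = FALSE` so the result does not depend on what happens to be in the global environment.

```r
# Functions versus variables: when a name is used in a function call,
# R ignores non-function objects while looking up that name.
g09 <- function(x) x + 100
g10 <- function() {
  g09 <- 10   # a local, non-function binding with the same name
  g09(g09)    # the call finds the function; the argument finds the number
}
g10()
#> [1] 110

# A fresh start: every call gets a new execution environment,
# so nothing is remembered from the previous call.
g11 <- function() {
  if (!exists("counter", inherits = FALSE)) {
    counter <- 1
  } else {
    counter <- counter + 1
  }
  counter
}
g11()
#> [1] 1
g11()
#> [1] 1
```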

Lexical scoping determines where, but not when to look for values. R looks for values when the function is run, not when the function is created. Together, these two properties tell us that the output of a function can differ depending on the objects outside the function’s environment.
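
A minimal sketch of dynamic lookup, using a hypothetical function `g12()` that depends on a global variable `x`: the same function returns different results because `x` is looked up each time the function runs.

```r
g12 <- function() x + 1

x <- 15
g12()
#> [1] 16

# Changing the global binding changes the result of the next call
x <- 20
g12()
#> [1] 21
```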

Lazy evaluation is powered by a data structure called a promise, or (less commonly) a thunk. It’s one of the features that makes R such an interesting programming language.

You cannot manipulate promises with R code. Promises are like a quantum state: any attempt to inspect them with R code will force an immediate evaluation, making the promise disappear. Later, you’ll learn about quosures, which convert promises into an R object where you can easily inspect the expression and the environment.
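
The effect of promises can be observed indirectly. The sketch below uses illustrative names (`h03`, `h04`, `double_it`): an argument that is never used is never evaluated, and an argument that is used twice is still evaluated only once.

```r
# The argument is never touched, so the error is never triggered
h03 <- function(x) {
  10
}
h03(stop("This is an error!"))
#> [1] 10

# The promise is evaluated on first access and its value is cached
double_it <- function(x) {
  message("Calculating...")
  x * 2
}
h04 <- function(x) {
  c(x, x)
}
h04(double_it(20))
#> Calculating...
#> [1] 40 40
```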

An error indicates that something has gone wrong, and forces the user to deal with the problem. Some languages (like C, Go, and Rust) rely on special return values to indicate problems, but in R you should always throw an error.
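
As a small sketch of this advice, the hypothetical helper `safe_sqrt()` below throws an error with `stop()` instead of returning a special value such as `NA` or `-1`. The second call is wrapped in `tryCatch()` only so that the snippet runs to completion.

```r
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    # call. = FALSE drops the call from the message, which is usually tidier
    stop("`x` must be numeric, not ", class(x)[1], call. = FALSE)
  }
  sqrt(x)
}

safe_sqrt(4)
#> [1] 2

# Capture the error message instead of halting the script
tryCatch(safe_sqrt("a"), error = function(cnd) conditionMessage(cnd))
#> [1] "`x` must be numeric, not character"
```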

# 2 Environments

The job of an environment is to associate, or bind, a set of names to a set of values. You can think of an environment as a bag of names, with no implied order (i.e. it doesn’t make sense to ask which is the first element in an environment).

```r
library(rlang)
```

```r
e1 <- env(
  a = FALSE,
  b = "a",
  c = 2.3,
  d = 1:3
)

e1
```

```r
env_print(e1)
#> Parent: <environment: global>
#> Bindings:
#> • a: <lgl>
#> • b: <chr>
#> • c: <dbl>
#> • d: <int>
```

```r
env_names(e1)
#> [1] "a" "b" "c" "d"
```

To compare environments, you need to use `identical()` and not `==`. This is because `==` is a vectorised operator, and environments are not vectors.

```r
identical(global_env(), current_env())
#> [1] TRUE

global_env() == current_env()
#> Error in global_env() == current_env(): comparison (==) is possible only for atomic and list types
```

```r
# Parents
e2a <- env(d = 4, e = 5)
e2b <- env(e2a, a = 1, b = 2, c = 3)

e2a
#> <environment: 0x000001c5507ba6e8>
e2b
#> <environment: 0x000001c550802e00>
```

```r
# Find the parent of an environment with env_parent()
env_parent(e2b)
#> <environment: 0x000001c5507ba6e8>
env_parent(e2a)
#> <environment: R_GlobalEnv>
```

Only one environment doesn’t have a parent: the empty environment.

The immediate parent of the global environment is the last package you attached, the parent of that package is the second to last package you attached, …
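
The chain of ancestors can be inspected with base R alone. This is a sketch rather than a full tour: the middle of the search path depends on which packages are attached in your session, so only the two ends are shown.

```r
# The search path starts at the global environment and ends at base
path <- search()
path[1]
#> [1] ".GlobalEnv"
path[length(path)]
#> [1] "package:base"

# The parent of the global environment is the second entry on the search path
identical(parent.env(globalenv()), as.environment(2))
#> [1] TRUE

# The base environment's parent is the empty environment ...
identical(parent.env(baseenv()), emptyenv())
#> [1] TRUE

# ... and asking for the parent of the empty environment is an error
tryCatch(parent.env(emptyenv()), error = function(cnd) conditionMessage(cnd))
```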

# 3 Conditions

Every condition has default behaviour: errors stop execution and return to the top level, warnings are captured and displayed in aggregate, and messages are immediately displayed. Condition handlers allow us to temporarily override or supplement the default behaviour.

`tryCatch()` registers exiting handlers, and is typically used to handle error conditions. It allows you to override the default error behaviour. For example, the following code will return `NA` instead of throwing an error:

```r
f3 <- function(x) {
  tryCatch(
    error = function(cnd) NA,
    log(x)
  )
}

f3(3)
#> [1] 1.098612
f3("x")
#> [1] NA
```

The handlers set up by `tryCatch()` are called exiting handlers because after the condition is signalled, control passes to the handler and never returns to the original code, effectively meaning that the code exits.
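
A short sketch makes the "exiting" behaviour visible. Here a message handler takes over as soon as the message is signalled, so the rest of the code block is never reached (the variable name `result` is illustrative).

```r
result <- tryCatch(
  message = function(cnd) "There",
  {
    message("Here")
    # Control never comes back here once the handler has run
    stop("This code is never run!")
  }
)
result
#> [1] "There"
```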

# Functional Programming

```r
library(purrr)
```

# 4 Functionals

A functional is a function that takes a function as an input and returns a vector as output. Here’s a simple functional: it calls the function provided as input with 1000 random uniform numbers.

```r
randomise <- function(f) f(runif(1e3))
randomise(mean)
#> [1] 0.4902761
randomise(mean)
#> [1] 0.490547
randomise(sum)
#> [1] 492.6093
```

The map functions also have shortcuts for extracting elements from a vector, powered by `purrr::pluck()`. You can use a character vector to select elements by name, an integer vector to select by position, or a list to select by both name and position. These are very useful for working with deeply nested lists, which often arise when working with JSON.

```r
x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11))
)

# Select by name
map_dbl(x, "x")
#> [1] 1 4 8

# Or by position
map_dbl(x, 1)
#> [1] -1 -2 -3

# Or both
map_dbl(x, list("y", 1))
#> [1] 2 5 9

# You'll get an error if a component doesn't exist
map_chr(x, "z")
#> Error in `map_chr()`:
#> ℹ In index: 3.
#> Caused by error:
#> ! Result must be length 1, not 0.

# Unless you supply a .default value
map_chr(x, "z", .default = NA)
#> [1] "a" "b" NA
```

Note there’s a subtle difference between placing extra arguments inside an anonymous function compared with passing them to `map()`. Putting them in an anonymous function means that they will be evaluated every time `f()` is executed, not just once when you call `map()`. This is easiest to see if we make the additional argument random:

```r
plus <- function(x, y) round(x + y, 2)

x <- rep(0, 4)
map_dbl(x, plus, runif(1))
#> [1] 0.49 0.49 0.49 0.49
map_dbl(x, ~ plus(.x, runif(1)))
#> [1] 0.17 0.87 0.18 0.37
```

```r
# Purrr style
by_cyl <- split(mtcars, mtcars$cyl)

by_cyl %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(coef) %>%
  map_dbl(2)
#>         4         6         8
#> -5.647025 -2.780106 -2.192438
```

There are three basic ways to loop over a vector with a for loop:

• Loop over the elements: `for (x in xs)`

• Loop over the numeric indices: `for (i in seq_along(xs))`

• Loop over the names: `for (nm in names(xs))`

The first form is analogous to the `map()` family. The second and third forms are equivalent to the `imap()` family which allows you to iterate over the values and the indices of a vector in parallel.

`imap()` is like `map2()` in the sense that your `.f` gets called with two arguments, but here both are derived from the vector. `imap(x, f)` is equivalent to `map2(x, names(x), f)` if `x` has names, and `map2(x, seq_along(x), f)` if it does not.
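
The equivalence can be checked with a small sketch. The vector `scores` and the labelling formula are made up for illustration; `.x` is the value and `.y` is the name or position.

```r
library(purrr)

# Named input: .y receives the names
scores <- c(alice = 90, bob = 85)
labelled <- imap_chr(scores, ~ paste0(.y, ": ", .x))
unname(labelled)
#> [1] "alice: 90" "bob: 85"

# The same result written with map2() over the values and their names
identical(labelled, map2_chr(scores, names(scores), ~ paste0(.y, ": ", .x)))
#> [1] TRUE

# Unnamed input: .y receives the positions instead
imap_chr(c(90, 85), ~ paste0(.y, ": ", .x))
#> [1] "1: 90" "2: 85"
```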

`reduce()` is a useful way to generalise a function that works with two inputs (a binary function) to work with any number of inputs.

```r
set.seed(1231)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
str(l)
#> List of 4
#>  $ : int [1:15] 10 10 4 5 8 10 10 9 10 3 ...
#>  $ : int [1:15] 4 9 5 4 1 5 8 9 9 10 ...
#>  $ : int [1:15] 7 7 1 3 1 3 5 5 7 2 ...
#>  $ : int [1:15] 7 9 4 8 7 10 4 5 6 10 ...

# Find the values that occur in every element
reduce(l, intersect)
#> [1] 10  4  5  9  7

# Find all the values that occur in any element
reduce(l, union)
#>  [1] 10  4  5  8  9  3  6  7  1  2
```

```r
# accumulate() returns the intermediate results
accumulate(l, intersect)
#> [[1]]
#>  [1] 10 10  4  5  8 10 10  9 10  3  3  4  6  7 10
#>
#> [[2]]
#> [1] 10  4  5  8  9  6  7
#>
#> [[3]]
#> [1] 10  4  5  9  7
#>
#> [[4]]
#> [1] 10  4  5  9  7
```

If you’re using `reduce()` in a function, you should always supply `.init`. Think carefully about what your function should return when you pass a vector of length 0 or 1, and make sure to test your implementation.
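
The edge cases are easy to try out. In this sketch the error from the empty input is caught with `tryCatch()` so the snippet keeps running; the replacement string is just an illustrative marker, not purrr's own error message.

```r
library(purrr)

# Empty input without .init: reduce() has nothing to return, so it errors
tryCatch(
  reduce(integer(), `+`),
  error = function(cnd) "error: empty input and no .init"
)
#> [1] "error: empty input and no .init"

# Supplying .init gives a well-defined answer for empty input
reduce(integer(), `+`, .init = 0)
#> [1] 0

# A length-one input is returned as is, without ever calling the function,
# so an invalid input slips through unchecked
reduce("a", `+`)
#> [1] "a"
```

With `.init = 0`, the last call would instead fail, because `0 + "a"` is not a valid addition. That failure is usually what you want, since it surfaces the bad input.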

A predicate functional applies a predicate to each element of a vector. purrr provides seven useful functions which come in three groups:

• `some(.x, .p)` returns `TRUE` if any element matches;

• `every(.x, .p)` returns `TRUE` if all elements match;

• `none(.x, .p)` returns `TRUE` if no element matches.

These are similar to `any(map_lgl(.x, .p))`, `all(map_lgl(.x, .p))` and `all(map_lgl(.x, negate(.p)))` but they terminate early: `some()` returns `TRUE` when it sees the first `TRUE`, and `every()` and `none()` return `FALSE` when they see the first `FALSE` or `TRUE` respectively.

• `detect(.x, .p)` returns the value of the first match; `detect_index(.x, .p)` returns the location of the first match.

• `keep(.x, .p)` keeps all matching elements; `discard(.x, .p)` drops all matching elements.
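
A brief sketch of the first group, using a made-up list `vals`. The counter `calls` and the predicate `is_big()` are hypothetical helpers added only to show that `some()` stops early.

```r
library(purrr)

vals <- list(1, "a", 3)

some(vals, is.character)
#> [1] TRUE
every(vals, is.numeric)
#> [1] FALSE
none(vals, is.logical)
#> [1] TRUE

# Early termination: count how often the predicate is actually called
calls <- 0
is_big <- function(v) {
  calls <<- calls + 1
  v > 1
}
some(1:5, is_big)
#> [1] TRUE

# The predicate was not run on every element
calls < 5
#> [1] TRUE
```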

```r
df = data.frame(x = 1:3, y = letters[1:3])
detect(df, is.factor)
#> NULL
detect_index(df, is.factor)
#> [1] 0

str(keep(df, is.factor))
#> 'data.frame':    3 obs. of  0 variables
str(discard(df, is.factor))
#> 'data.frame':    3 obs. of  2 variables:
#>  $ x: int  1 2 3
#>  $ y: chr  "a" "b" "c"
```

`map()` and `modify()` come in variants that also take predicate functions, transforming only the elements of `.x` where `.p` is `TRUE`.

```r
df = data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

str(map_if(df, is.numeric, mean))
#> List of 3
#>  $ num1: num 10
#>  $ num2: num 6
#>  $ chr1: chr [1:3] "a" "b" "c"
str(modify_if(df, is.numeric, mean))
#> 'data.frame':    3 obs. of  3 variables:
#>  $ num1: num  10 10 10
#>  $ num2: num  6 6 6
#>  $ chr1: chr  "a" "b" "c"
str(map(keep(df, is.numeric), mean))
#> List of 2
#>  $ num1: num 10
#>  $ num2: num 6
```

# 5 Function Factories

A function factory is a function that makes functions. Here’s a very simple example: we use a function factory (`power1()`) to make two child functions (`square()` and `cube()`):

```r
power1 = function(exp) {
  function(x) {
    x ^ exp
  }
}

square = power1(2)
cube = power1(3)

square(3)
#> [1] 9
cube(3)
#> [1] 27
```

```r
library(rlang)
library(ggplot2)
library(scales)
```

```r
square
#> function(x) {
#>     x ^ exp
#>   }
#> <environment: 0x000001c550b7d570>
cube
#> function(x) {
#>     x ^ exp
#>   }
#> <bytecode: 0x000001c550cd8960>
#> <environment: 0x000001c550bcddd0>
```

It’s obvious where `x` comes from, but how does R find the value associated with `exp`? Simply printing the manufactured functions is not revealing because the bodies are identical; the contents of the enclosing environment are the important factors. We can get a little more insight by using `rlang::env_print()`. That shows us that we have two different environments (each of which was originally an execution environment of `power1()`). The environments have the same parent, which is the enclosing environment of `power1()`, the global environment.

```r
env_print(square)
#> <environment: 0x000001c550b7d570>
#> Parent: <environment: global>
#> Bindings:
#> • exp: <dbl>
env_print(cube)
#> <environment: 0x000001c550bcddd0>
#> Parent: <environment: global>
#> Bindings:
#> • exp: <dbl>
```

```r
fn_env(square)$exp
#> [1] 2
fn_env(cube)$exp
#> [1] 3
```