Here are some approaches.
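The snippets in this answer rely on two data frames from the question, x and century_bands, which are not reproduced here. A minimal reconstruction is sketched below. The values are inferred from the outputs and the band limits that appear later in this answer, so the exact column types in the question may differ.

```r
# Assumed reconstruction of the question's inputs; values are inferred from
# the outputs and the hard-coded band limits shown in this answer.
x <- data.frame(year = c(1730, 1860, 1941, 2011))

century_bands <- data.frame(
  century_name = c("18th", "19th", "20th", "21st"),
  min_year = c(1700, 1800, 1900, 2000),
  max_year = c(1799, 1899, 1999, 2099),
  stringsAsFactors = FALSE
)
```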
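1) sqldf In SQL one can join on complex conditions. The between syntax matches if year is greater than or equal to the lower limit and less than or equal to the upper limit. If a particular year has no match, the left join returns NA for it (although that does not occur in the question's example).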
library(sqldf)
sqldf("select year, century_name from x
left join century_bands on year between min_year and max_year")
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
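2) findInterval This approach uses only base R. For each component of its first argument, findInterval returns the number of values in its second argument that are less than or equal to it. The second argument is assumed to be sorted in ascending order. The number returned by findInterval can be used to index into century_name. findInterval tends to be quite efficient.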
transform(x, year_name =
with(century_bands, century_name[findInterval(year, min_year)]))
## year year_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
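To see what findInterval returns on its own, here is a small sketch on plain vectors. The lower limits are assumed to be 1700, 1800, 1900 and 2000, as suggested by the band definitions elsewhere in this answer.

```r
min_year <- c(1700, 1800, 1900, 2000)  # assumed lower limits of the bands
year <- c(1730, 1860, 1941, 2011)

# Count of lower limits that are <= each year, which is also the band index
idx <- findInterval(year, min_year)
idx
## [1] 1 2 3 4

# A year exactly on a lower limit belongs to the band that starts there
findInterval(1800, min_year)
## [1] 2

# A year below the first lower limit gives 0, and an index of 0 selects nothing
findInterval(1650, min_year)
## [1] 0
```

The last case is why years outside all bands need separate handling when the result is used for indexing.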
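Although this does not occur in the question, if a year could lie outside all bands, we could handle it without changing the code by adding extra rows to century_bands whose century_name is NA. Alternatively, we could wrap findInterval like this:
FindInterval <- function(x, vec, upper) {
  # NA for values below the first lower limit or above the overall upper limit
  ifelse(x < vec[1] | x > upper, NA, findInterval(x, vec))
}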
transform(x, year_name =
with(century_bands, century_name[FindInterval(year, min_year, max(max_year))]))
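The first option mentioned above, adding NA rows to the band table, might look like the following sketch. The data frame x2, the name padded, and the band values are assumptions made for illustration; the lookup itself is the same findInterval expression as before.

```r
# Assumed band table, inferred from the limits used elsewhere in this answer
century_bands <- data.frame(
  century_name = c("18th", "19th", "20th", "21st"),
  min_year = c(1700, 1800, 1900, 2000),
  max_year = c(1799, 1899, 1999, 2099),
  stringsAsFactors = FALSE
)

# Hypothetical years, two of which fall outside every band
x2 <- data.frame(year = c(1650, 1730, 2011, 2150))

# Pad the table with an NA band below the first band and one above the last
padded <- rbind(
  data.frame(century_name = NA_character_, min_year = -Inf,
             max_year = min(century_bands$min_year) - 1,
             stringsAsFactors = FALSE),
  century_bands,
  data.frame(century_name = NA_character_,
             min_year = max(century_bands$max_year) + 1, max_year = Inf,
             stringsAsFactors = FALSE)
)

# Unchanged lookup: out-of-range years now index one of the NA rows
res <- transform(x2, year_name =
  with(padded, century_name[findInterval(year, min_year)]))
as.character(res$year_name)  # NA, "18th", "21st", NA
```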
If dplyr is being used anyway, we could replace transform with mutate; otherwise, using transform avoids that dependency.
3) sapply Another base solution is
Match <- function(x) with(century_bands, century_name[x >= min_year & x <= max_year])
transform(x, century_name = sapply(year, Match))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
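This should suffice if every year lies within a band, which is the case for the example in the question. If that is not guaranteed, extend Match like this:
Match <- function(x) {
  Name <- with(century_bands, century_name[x >= min_year & x <= max_year])
  # Fall back to NA when no band contains x
  if (length(Name)) Name else NA
}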
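A short sketch of why the guard matters, using assumed band values and hypothetical years. Match0 is an illustrative name for the unguarded one-liner, and vapply is used here instead of sapply only to make the result type explicit.

```r
# Assumed band table, inferred from the limits used elsewhere in this answer
century_bands <- data.frame(
  century_name = c("18th", "19th", "20th", "21st"),
  min_year = c(1700, 1800, 1900, 2000),
  max_year = c(1799, 1899, 1999, 2099),
  stringsAsFactors = FALSE
)

# Unguarded version: returns a zero-length vector when no band matches
Match0 <- function(x) with(century_bands, century_name[x >= min_year & x <= max_year])

# Guarded version: returns NA when no band matches
Match <- function(x) {
  Name <- with(century_bands, century_name[x >= min_year & x <= max_year])
  if (length(Name)) Name else NA_character_
}

years <- c(1650, 1730, 2150)  # hypothetical; two are outside every band

is.list(sapply(years, Match0))  # TRUE: results differ in length, so no simplification
out <- vapply(years, Match, character(1))
out  # NA, "18th", NA
```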
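4) cut This base solution is similar to findInterval, but it returns NA if year is not in any band.
# right = FALSE closes each interval on the left, so a year equal to a
# lower limit (e.g. 1800) falls in the band that starts there
transform(x, year_name = with(century_bands, century_name[
  cut(year, c(min_year, max(max_year)), labels = FALSE, right = FALSE,
      include.lowest = TRUE)
]))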
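How cut treats a year that sits exactly on a break depends on its right argument. The sketch below uses plain vectors with assumed breaks (the four lower limits plus the last upper limit) and hypothetical years to show the bin indices.

```r
breaks <- c(1700, 1800, 1900, 2000, 2099)  # assumed: lower limits plus last upper limit
years <- c(1650, 1700, 1800, 2099, 2150)   # hypothetical test years

# Left-closed intervals [1700,1800), [1800,1900), ..., [2000,2099]
bin <- cut(years, breaks, labels = FALSE, right = FALSE, include.lowest = TRUE)
bin
## [1] NA  1  2  4 NA

# With the default right = TRUE the intervals are right-closed,
# so 1800 is assigned to the first bin instead of the second
cut(1800, breaks, labels = FALSE, include.lowest = TRUE)
## [1] 1
```

Years outside the overall range come back as NA, which then index to NA in century_name.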
5) car::recode This function recodes values as follows.
library(car)
recodes <-
"1700:1799='18th'; 1800:1899='19th'; 1900:1999='20th'; 2000:2099='21st'; else=NA"
transform(x, year_name = recode(year, recodes))
## year year_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
To avoid hard-coding the recodes string, it can be derived from century_bands like this:
recodes <- with(century_bands,
paste(sprintf("%d:%d='%s'", min_year, max_year, century_name), collapse = ";")
)
recodes <- paste0(recodes, "; else=NA")
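As a quick check, and assuming the band table sketched here (inferred from the hard-coded string above), the derived string can be printed to confirm it carries the same specification:

```r
# Assumed band table, inferred from the hard-coded recodes string
century_bands <- data.frame(
  century_name = c("18th", "19th", "20th", "21st"),
  min_year = c(1700, 1800, 1900, 2000),
  max_year = c(1799, 1899, 1999, 2099),
  stringsAsFactors = FALSE
)

# One "lo:hi='name'" term per band, joined with ";"
recodes <- with(century_bands,
  paste(sprintf("%d:%d='%s'", min_year, max_year, century_name), collapse = ";")
)
recodes <- paste0(recodes, "; else=NA")
cat(recodes, "\n")
## 1700:1799='18th';1800:1899='19th';1900:1999='20th';2000:2099='21st'; else=NA
```

The only difference from the hand-written string is the missing space after each semicolon between terms.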
6) expand bands We can expand the bands into individual years, in which case we can simply match. A year that does not match any band results in an NA century_name.
century_bands2 <- with(century_bands,
stack(setNames(Map(seq, min_year, max_year), century_name)))
transform(x, century_name = with(century_bands2, ind[match(year, values)]))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
6a) A tidyverse variation of this is mostly along these lines:
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)
century_bands2 <- century_bands %>%
{ set_names(map2(.$min_year, .$max_year, seq), .$century_name) %>%
as_tibble %>%
pivot_longer(everything(), names_to = "century_name", values_to = "year")
}
x %>% left_join(century_bands2, by = "year")
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
7) case_when We could hard-code the band definitions into a case_when:
library(dplyr)
x %>% mutate(century_name = case_when(
year %in% 1700:1799 ~ "18th",
year %in% 1800:1899 ~ "19th",
year %in% 1900:1999 ~ "20th",
year %in% 2000:2099 ~ "21st",
TRUE ~ NA_character_))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
7a) Another way to express this with case_when is:
x %>% mutate(century_name = case_when(
year < 1700 ~ NA_character_,
year < 1800 ~ "18th",
year < 1900 ~ "19th",
year < 2000 ~ "20th",
year < 2100 ~ "21st",
TRUE ~ NA_character_))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st