如果您想使用该boot
软件包的多功能性,我发现这篇博文很有用(下面的代码是从那里获得灵感的)
library(dplyr)
library(tidyr)
library(purrr)
library(boot)
set.seed(321)
mtcars %>%
group_by(vs) %>%
nest() %>%
mutate(boot_res = map(data,
~ boot(data = .$mpg,
statistic = function(x, i) mean(x[i]),
R = 1000)),
boot_res_ci = map(boot_res, boot.ci, type = "perc"),
mean = map(boot_res_ci, ~ .$t0),
lower_ci = map(boot_res_ci, ~ .$percent[[4]]),
upper_ci = map(boot_res_ci, ~ .$percent[[5]]),
n = map(data, nrow)) %>%
select(-data, -boot_res, -boot_res_ci) %>%
unnest(cols = c(n, mean, lower_ci, upper_ci)) %>%
ungroup()
#> # A tibble: 2 x 5
#> vs mean lower_ci upper_ci n
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0 16.6 15.0 18.3 18
#> 2 1 24.6 22.1 27.3 14
由reprex 包(v0.3.0)于 2020-01-22 创建
代码的一些解释:
使用 嵌套时nest()
,将创建一个列表列(默认称为data
),其中包含 2 个数据框,是整个mtcars
分组的 2 个子集vs
(包含 2 个唯一值,0 和 1)。然后,使用mutate()
and map()
,我们通过将包中boot_res
的函数应用于列表 column 来创建列表列。然后通过将函数应用于列表列等来创建列表列。使用,我们删除不再需要的列表列,通过取消嵌套和取消分组最终结果来休息。boot()
boot
data
boot_res_ci
boot.ci()
boot_res
select()
不幸的是,该代码不容易导航,但它可以用于另一个示例。
使用broom::tidy()
刚刚意识到该包broom
具有处理boot()
输出的方法的实现,如此处所指出的。这使得代码不那么冗长,输出更加完整,包括统计数据的偏差和标准误差(这里的平均值):
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(boot)
set.seed(321)
mtcars %>%
group_by(vs) %>%
nest() %>%
mutate(boot_res = map(data,
~ boot(data = .$mpg,
statistic = function(x, i) mean(x[i]),
R = 1000)),
boot_tidy = map(boot_res, tidy, conf.int = TRUE, conf.method = "perc"),
n = map(data, nrow)) %>%
select(-data, -boot_res) %>%
unnest(cols = -vs) %>%
ungroup()
#> # A tibble: 2 x 7
#> vs statistic bias std.error conf.low conf.high n
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0 16.6 -0.0115 0.843 15.0 18.3 18
#> 2 1 24.6 -0.0382 1.36 22.1 27.3 14
由reprex 包(v0.3.0)于 2020-01-22 创建
data.table
简洁的语法
但是请注意,我通过使用data.table
包而不是dplyr
:
library(data.table)
library(magrittr)
library(boot)
library(broom)
mtcars <- mtcars %>% copy %>% setDT
set.seed(321)
mtcars[, c(n = .N,
boot(data = mpg,
statistic = function(x, i) mean(x[i]),
R = 1000) %>%
tidy(conf.int = TRUE, conf.method = "perc")),
by = vs]
#> vs n statistic bias std.error conf.low conf.high
#> 1: 0 18 16.61667 -0.01149444 0.8425817 15.03917 18.26653
#> 2: 1 14 24.55714 -0.03822857 1.3633112 22.06429 27.32839
由reprex 包(v0.3.0)于 2020 年 1 月 23 日创建
使用 data.table 一次多个变量
library(data.table)
library(magrittr)
library(boot)
library(broom)
mtcars <- mtcars %>% copy %>% setDT
# Specify here the variables for which you want CIs
variables <- c("mpg", "disp")
# Function to get the CI stats, will be applied to each column of a subset of
# data (.SD)
get_ci <- function(varb, ...){
boot(data = varb,
statistic = function(x, i) mean(x[i]),
R = 1000) %>%
tidy(conf.int = TRUE, ...)
}
set.seed(321)
mtcars[, c(n = .N,
lapply(.SD, get_ci) %>%
rbindlist(idcol = "varb")),
by = vs, .SDcols = variables]
#> vs n varb statistic bias std.error conf.low conf.high
#> 1: 0 18 mpg 16.61667 -0.01149444 0.8425817 15.03917 18.26653
#> 2: 0 18 disp 307.15000 -1.49692222 23.1501247 261.18766 353.04416
#> 3: 1 14 mpg 24.55714 -0.03215714 1.3800432 21.86628 27.50551
#> 4: 1 14 disp 132.45714 0.32994286 14.9070552 104.45798 163.57344
由reprex 包(v0.3.0)于 2020 年 1 月 23 日创建