r - datasummary: combining factor and numeric variables in one table

Question

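I am trying to use modelsummary to create a table that contains both factor and numeric variables. My approach is to convert the factor variables to numeric, so that each factor variable takes up only one row and all variables appear in the same column. I would then manually compute the number of units in each level of each former-factor/now-numeric variable and assign it as text to each variable in my dataset. I am trying to do this with the function called N_alt in the example below: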

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}


# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt, data = tmp)

This gives me:

My N_alt function does not work properly. class is correct, but region is not. I do not get any warning message.

I have also tried:

N_alt = function(x) {
  if (x[1] %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x[1] %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

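But I get the same output. I have written similar functions using these vectors and they worked fine, but this one does not work for some reason.

In addition, I have tried: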

N_alt <- c('[32 (100)]','[14 (43.8); 18 (56.3)]','[14 (43.8); 6 (18.8); 12 (37.5)]','[32 (100)]')

and

N_alt <- c(rep('[32 (100)]',32),rep('[14 (43.8); 18 (56.3)]',32),rep('[14 (43.8); 6 (18.8); 12 (37.5)]',32),rep('[32 (100)]',32))

But I get:

Error in datasummary(mpg + (`class [0,1]` = class) + (`region [A,B,C]` = region) +  : 
  Argument 'N_alt' is not length 32

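Does anyone know what I am missing here?

Edit:

It would also seem possible to run a function like Mean_alt below, so that some numeric variables have no decimal places (simply converting them with as.integer did not work for me) and the former-factor/now-numeric variables show no mean in the table (two different operations), as follows: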

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

Mean_alt = function(x) {
  if (x %in% c(tmp$mpg)) {
    as.character(floor(mean(x)), length=5)
  } else if (x %in% c(tmp$class, tmp$region)) {
    paste0("")
  } else {
    mean(x)
  }
}

# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt + Heading("Mean") * Mean_alt, data = tmp)

Output:

score 1 · Accepted Answer

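You are running up against three limitations.

The first limitation is in base R:

As stated in the R manual, the condition in an if/else statement must evaluate to a single TRUE or FALSE. Internally, datasummary applies N_alt to each variable, one after the other. Each time, N_alt receives a new vector of length 32. Frankly, I don't think checking the value of the first element of that vector makes much sense; I don't see how it could get us where we want to go.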
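
To make the first limitation concrete, here is a small base R sketch (not part of the original answer). `x %in% values` returns one logical per element of `x`, so it cannot serve directly as an `if` condition: since R 4.2.0 a condition of length greater than one is an error, and earlier versions used only the first element and issued a warning. The helper name `describe_vector` is hypothetical, and collapsing with `all()` is just one possible choice.

```r
# A length-32 vector, like the ones datasummary passes to N_alt.
x <- mtcars$mpg

# %in% is vectorised: it yields one TRUE/FALSE per element, not a single value.
cond <- x %in% mtcars$mpg
length(cond)

# Hypothetical helper: collapse the test to one value with all() before if().
describe_vector <- function(x, values) {
  if (all(x %in% values)) "all elements found" else "some elements missing"
}

res_same  <- describe_vector(mtcars$mpg, mtcars$mpg)
res_other <- describe_vector(mtcars$hp, mtcars$mpg)
```

Even with a valid scalar condition, the function still only compares values, so it cannot tell two variables apart when their values overlap.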

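The other two limitations are related to the basic design of the tables package, on which modelsummary::datasummary is based:

Factors will always produce one row per factor level.
I don't think there is a good way to tell datasummary that a function should behave differently when applied to different numeric variables. This is because each function only sees the raw numeric vector, not any other meta-information.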
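
The last point can be illustrated with the data from the question, using base R only. After `tmp$class <- 0` and `tmp$region <- 0`, both columns are identical vectors of zeros, so a function that receives only the values has nothing to tell them apart by, and the first matching branch wins. The `which_branch` helper is hypothetical and only mimics the branching in `N_alt`; the `vapply()` call is a rough stand-in for applying a summary function one variable at a time.

```r
# Rebuild the question's data, including the final overwrite with 0.
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0
tmp$region <- 0

# The two columns now carry exactly the same values.
same_values <- identical(tmp$class, tmp$region)

# Hypothetical stand-in for N_alt: it sees the values, never the column name.
which_branch <- function(x) {
  if (all(x %in% tmp$class)) {
    "class branch"
  } else if (all(x %in% tmp$region)) {
    "region branch"
  } else {
    "default branch"
  }
}

# Apply the function column by column.
branches <- vapply(tmp, which_branch, character(1))
```

Under these assumptions `region` falls into the class branch, which is consistent with the question's report that class looks right while region does not.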

I think the easiest workaround is to create two tables, one for your factors and one for your numeric variables. The tables can then easily be combined:

library(modelsummary)

N_factor <- function(x) {
  count <- table(x)
  pct <- prop.table(count)
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

N_numeric <- function(x) {
  sprintf("%s (100)", length(x))
}

tab_fac <- datasummary(cyl + gear ~ Heading("N") * N_factor, 
                       output = "data.frame",
                       data = mtcars)

datasummary(mpg + hp ~ Heading("N") * N_numeric, 
            add_rows = tab_fac,
            data = mtcars)

	N
mpg	32 (100)
hp	32 (100)
cyl	[11 (0.3); 7 (0.2); 14 (0.4)]
gear	[15 (0.5); 12 (0.4); 5 (0.2)]
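
Note that `prop.table()` returns proportions, which is why the bracketed rows show values such as 0.3 rather than percentages. If percentages are wanted, to match the `(100)` in the numeric rows, a variant along these lines might work. `N_factor_pct` is a hypothetical name, not part of modelsummary, and the sketch is checked here with base R only rather than inside `datasummary`.

```r
# Hypothetical variant of N_factor that reports percentages instead of proportions.
N_factor_pct <- function(x) {
  count <- table(x)
  pct <- 100 * prop.table(count)  # scale proportions to percentages
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

# Try it on one of the columns used in the answer.
cyl_summary <- N_factor_pct(mtcars$cyl)
```

Because it takes a vector and returns a single string, just like `N_factor`, it should be usable in the same place in the formula.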
