r - fixest vs lm - different results? (difference-in-differences)

Question

I am trying to run a "classic" difference-in-differences with multiple time periods. The model I want to estimate is:

y = a + b1*x1 + b2*treat + b3*period + b4*(treat*period) + u    (eq. 1)

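So basically I am testing different setups, using different packages, just to make sure I am specifying my model correctly. I want to use the fixest package, so I tried to compare its estimates with those from the standard lm() function. However, the results differ, in both the coefficients and the standard errors.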

My questions are:

Are the lm_mod, lm_mod2, or feols_mod regressions correctly specified (as in eq. 1)?

If not, I would appreciate it if someone could show me how to get the same results from lm() and feols()!

# libraries
library(fixest)
library(modelsummary)
library(tidyverse)

# load data
data(base_did)

# make df for lm_mod with 5 as the reference-period
base_ref_5 <- base_did %>% 
  mutate(period = as.factor(period)) %>% 
  mutate(period = relevel(period, ref = 5))

# Note that I use base_ref_5 for the lm models and base_did for feols_mod.
lm_mod <- lm(y ~ x1 + treat*period, base_ref_5)
lm_mod2 <- lm(y ~ x1 + treat + period + treat*period, base_ref_5)
feols_mod <- feols(y ~ x1 + i(period, treat, ref = 5), base_did)

# compare models
models <- list("lm" = lm_mod, 
               "lm2" = lm_mod2,
               "feols" = feols_mod)

msummary(models, stars = T)

**EDIT:** 
The reason I created base_ref_5 was so that both regressions would have period 5 as the reference period, in case that was unclear.

**EDIT 2**: 
Added a third model (lm_mod2), which is much closer, but there is still a difference.

score 2 · Accepted Answer

There are two issues here.

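1. In the lm() model, the period variable is interacted but treated as a continuous numeric variable. In contrast, calling i(period, treat) treats period as a factor (this is explained clearly in the documentation).
2. The i() function only includes the interactions, not the constitutive terms.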
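
Both points can be checked by counting coefficients. This is a minimal sketch, assuming the fixest package is installed and that period is left as the integer column shipped in base_did; the object names m_numeric, m_factor, and m_i are hypothetical and chosen only for illustration.

```r
library(fixest)

data(base_did)

# period is numeric here, so treat * period yields a single slope
# and a single interaction term
m_numeric <- lm(y ~ x1 + treat * period, base_did)

# wrapping period in factor() gives one dummy and one interaction
# per non-reference period
m_factor <- lm(y ~ x1 + treat * factor(period), base_did)

# i() adds only the period-by-treat interactions (period 5 omitted),
# with no main effect for treat or for period
m_i <- feols(y ~ x1 + i(period, treat, ref = 5), base_did)

length(coef(m_numeric))  # intercept, x1, treat, period, treat:period
length(coef(m_factor))
names(coef(m_i))
```

The differing coefficient counts show that the three calls do not estimate the same model, so their estimates should not be expected to agree.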

Here are two models to illustrate the parallels:

library(fixest)

data(base_did)

lm_mod <- lm(y ~ x1 + factor(period) * factor(treat), base_did)

feols_mod <- feols(y ~ x1 + factor(period) + i(period, treat), base_did)

coef(lm_mod)["x1"]

#>        x1 
#> 0.9799697

coef(feols_mod)["x1"]
#>        x1 
#> 0.9799697
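
As a further sketch along the same lines, the constituent terms can be added by hand so that the interaction coefficients themselves line up, with period 5 as the reference in both models. This assumes the fixest package is installed and that i() terms carry `::` in their coefficient names, as they do in recent fixest versions; base_f, period_f, lm_ref5, and feols_ref5 are hypothetical names introduced for illustration.

```r
library(fixest)

data(base_did)

# Factor copy of period with level "5" as the reference for lm()
base_f <- transform(base_did, period_f = relevel(factor(period), ref = "5"))

lm_ref5 <- lm(y ~ x1 + treat * period_f, base_f)

# Add the terms i() leaves out: the treat main effect and the period dummies
feols_ref5 <- feols(y ~ x1 + treat + factor(period) + i(period, treat, ref = 5),
                    base_did)

# Pull out the nine period-by-treat interactions from each fit
# (assumption: lm names them with ":", fixest names them with "::")
b_lm    <- coef(lm_ref5)[grepl(":", names(coef(lm_ref5)), fixed = TRUE)]
b_feols <- coef(feols_ref5)[grepl("::", names(coef(feols_ref5)), fixed = TRUE)]

all.equal(unname(b_lm), unname(b_feols))
all.equal(unname(coef(lm_ref5)["treat"]), unname(coef(feols_ref5)["treat"]))
```

The period dummies are parameterised differently in the two fits (reference period 1 in feols_ref5, period 5 in lm_ref5). That changes the intercept and the period coefficients, but not the treat terms compared here.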

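Note that I only answered the part of your question about the parallels between lm and feols. StackOverflow is a programming Q&A site. If you have questions about the proper specification of a statistical model, you might want to ask on CrossValidated.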
