I'm attempting to use a formula to generate a model.matrix object for use in a custom optimizer function.

It works great for the most part, but when it comes to factor-factor interactions, I'd like to specify the interaction as dummy coded rather than effects coded.

Take for example the following data set:

set.seed(1987)
myDF <- data.frame(Y = rnorm(100),
               X1 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]),
               X2 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]))

head(myDF)

Both the : and / operators create an effects coded model matrix (the latter being an additive effects structure, I think).

head(model.matrix(formula(Y ~ X1 : X2), data = myDF))
head(model.matrix(formula(Y ~ X1 / X2), data = myDF))

But I am looking to generate a dummy coded model matrix, which would have the first level of X1 omitted for each level of X2, resulting in these terms (columns):

X1B:X2A
X1C:X2A
X1B:X2B
X1C:X2B
X1B:X2C
X1C:X2C

Is there a way to achieve this?

2 Answers

Is ~X1:X2-1 what you're looking for?

Make the test data (as above):

set.seed(1987)
myDF <- data.frame(Y = rnorm(100),
          X1 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]),
          X2 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]))

Generate the model matrix:

mm1 <- model.matrix(formula(Y ~ X1 : X2 - 1), data = myDF)
head(mm1)
##   X1A:X2A X1B:X2A X1C:X2A X1A:X2B X1B:X2B X1C:X2B X1A:X2C X1B:X2C X1C:X2C
## 1       0       0       0       0       1       0       0       0       0
## 2       1       0       0       0       0       0       0       0       0
## 3       0       0       0       0       0       0       0       1       0
## 4       0       0       0       0       0       1       0       0       0
## 5       0       0       0       1       0       0       0       0       0
## 6       0       0       0       0       0       0       1       0       0
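
As a quick check on what this matrix encodes, here is a minimal sketch on the same simulated data (the exact rows may differ from the output above depending on the R version's random number generator). With the intercept removed and only the interaction term in the formula, each row should carry a single 1 marking its X1-by-X2 cell.

```r
set.seed(1987)
myDF <- data.frame(Y = rnorm(100),
                   X1 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]),
                   X2 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]))

mm1 <- model.matrix(Y ~ X1:X2 - 1, data = myDF)

# One column per X1-by-X2 combination (3 x 3 levels)
ncol(mm1)

# Every observation falls in exactly one cell, so each row sums to 1
all(rowSums(mm1) == 1)
```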

Or maybe you really just want to exclude some columns:

mm0 <- model.matrix(formula(Y ~ X1 : X2), data = myDF)
# drop the intercept and every column involving the first level of X1
mm0B <- mm0[, !grepl("(Intercept|^X1A:)", colnames(mm0))]
head(mm0B)
##   X1B:X2A X1C:X2A X1B:X2B X1C:X2B X1B:X2C X1C:X2C
## 1       0       0       1       0       0       0
## 2       0       0       0       0       0       0
## 3       0       0       0       0       1       0
## 4       0       0       0       1       0       0
## 5       0       0       0       0       0       0
## 6       0       0       0       0       0       0
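
To see what each retained column represents, the following sketch rebuilds the columns by hand as indicators of a single X1-by-X2 cell and compares them with the subsetted model matrix. The helper name `cell_indicator` is hypothetical and introduced only for illustration.

```r
set.seed(1987)
myDF <- data.frame(Y = rnorm(100),
                   X1 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]),
                   X2 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]))

mm0  <- model.matrix(Y ~ X1:X2, data = myDF)
mm0B <- mm0[, !grepl("(Intercept|^X1A:)", colnames(mm0))]

# Hypothetical helper: 1 where a row has the given X1 and X2 levels, else 0
cell_indicator <- function(l1, l2) as.numeric(myDF$X1 == l1 & myDF$X2 == l2)

# Rebuild every retained column from its own name, e.g. "X1B:X2A" -> ("B", "A")
manual <- sapply(colnames(mm0B), function(nm) {
  lv <- sub("^X[12]", "", strsplit(nm, ":", fixed = TRUE)[[1]])
  cell_indicator(lv[1], lv[2])
})

# Compares values only; row names are ignored by ==
all(manual == mm0B)
```

Rows with X1 equal to "A" are all zero in this matrix, so A acts as the reference level within each level of X2.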

I think you might also be interested in sum-to-zero contrasts:

mm2 <- model.matrix(formula(Y ~ X1 : X2 - 1), data = myDF,
                    contrasts.arg = list(X1 = contr.sum, X2 = contr.sum))

answered 2015-04-20T20:42:25.227

Here is another attempt.

set.seed(1987)
myDF <- data.frame(Y = rnorm(100),
                   X1 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]),
                   X2 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]))
# row subsetting to exclude A
modelMat <- model.matrix(formula(Y ~ X1 : X2), data = myDF[myDF$X1 != 'A',])
# column subsetting to eliminate all columns including X1A
modelMat <- modelMat[,substring(colnames(modelMat), 1, 3) != "X1A"]
head(modelMat)
   (Intercept) X1B:X2A X1C:X2A X1B:X2B X1C:X2B X1B:X2C X1C:X2C
1            1       0       0       1       0       0       0
3            1       0       0       0       0       1       0
4            1       0       0       0       1       0       0
8            1       0       0       0       0       1       0
10           1       0       0       0       0       0       1
11           1       0       0       0       0       0       1
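
One thing to keep in mind with this approach, sketched below on the same simulated data: subsetting the rows before calling model.matrix drops the observations with X1 equal to "A", so the result has fewer rows than myDF. If the optimizer expects one row per observation, column subsetting on the full data keeps those observations. The names `keep` and `full` are illustrative only.

```r
set.seed(1987)
myDF <- data.frame(Y = rnorm(100),
                   X1 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]),
                   X2 = factor(LETTERS[sample(1:3, 100, replace = TRUE)]))

keep <- myDF$X1 != "A"

# Row subsetting first: only the non-A observations remain
modelMat <- model.matrix(Y ~ X1:X2, data = myDF[keep, ])
modelMat <- modelMat[, substring(colnames(modelMat), 1, 3) != "X1A"]
nrow(modelMat) == sum(keep)

# Column subsetting on the full data: every observation is kept
full <- model.matrix(Y ~ X1:X2, data = myDF)
full <- full[, substring(colnames(full), 1, 3) != "X1A"]
nrow(full) == nrow(myDF)

# On the non-A rows the two approaches give the same values
all(full[keep, ] == modelMat)
```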
answered 2015-04-20T20:53:22.767