I have a data.frame that I want to turn into a model.matrix. This is all in R.
Here is an example of what I have:
col1
1 "factor1","factor2"
2 "factor1"
3 "factor3"
4 "factor1","factor2"
I would like to create the following output:
factor1 factor2 factor3
1 1 1 0
2 1 0 0
3 0 0 1
4 1 1 0
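I would appreciate any suggestions! I have been trying sparse.model.matrix without success: it creates a separate factor column for each distinct list instead of recognizing the lists as combinations of the same factors.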
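To make the target transformation concrete, here is a minimal base R sketch of one possible approach. It assumes each cell is a single comma-separated string (as in the dput() data) rather than a list of quoted strings; the names `df`, `labels`, `lev`, and `m` are only illustrative.

```r
# Toy data mirroring the example above (assumed to be comma-separated strings).
df <- data.frame(col1 = c("factor1,factor2", "factor1", "factor3", "factor1,factor2"),
                 stringsAsFactors = FALSE)

# Split every row into its individual labels.
labels <- strsplit(df$col1, ",", fixed = TRUE)

# The distinct labels across all rows become the columns.
lev <- sort(unique(unlist(labels)))

# For each row, mark which labels are present (1) or absent (0),
# then stack the per-row vectors into a matrix.
m <- do.call(rbind, lapply(labels, function(x) as.integer(lev %in% x)))
dimnames(m) <- list(rownames(df), lev)
m
```

This builds a dense integer matrix, so it is probably only suitable when the number of distinct labels is modest.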
Here is the dput() of the beginning of the data (the full data set is much larger):
dd = structure(list(id = c("rs62224609", "", "", "", "rs62224609", "", "", "", "",
"", "", "", "", "", "", "", "rs587626763", "", "", "", "rs62224609", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "rs62224609,rs587626763", "", "", "", "", "", "",
"", "", "", "", "rs587626763", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "rs587626763", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", ""), library = structure(c(4L,
4L, 4L, 5L, 5L, 5L, 4L, 4L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 3L, 5L, 2L, 4L, 4L, 2L, 2L, 3L, 1L, 4L, 2L, 2L, 5L, 2L, 2L,
2L, 4L, 3L, 3L, 4L, 5L, 3L, 4L, 3L, 4L, 5L, 4L, 5L, 2L, 5L, 5L,
2L, 2L, 4L, 3L, 5L, 3L, 5L, 5L, 4L, 1L, 5L, 2L, 3L, 5L, 5L, 1L,
4L, 1L, 2L, 4L, 5L, 1L, 3L, 4L, 4L, 2L, 1L, 4L, 2L, 5L, 5L, 1L,
5L, 2L, 3L, 3L, 1L, 1L, 3L, 5L, 4L, 5L, 5L, 5L, 4L, 2L, 1L, 3L,
3L, 2L, 1L), .Label = c("42", "43", "44", "45_1", "45_2"), class = "factor")), .Names = c("id",
"library"), row.names = c(NA, 100L), class = "data.frame")
head(dd, 8)
# id library
# 1 rs62224609 45_1
# 2 45_1
# 3 45_1
# 4 45_2
# 5 rs62224609 45_2
# 6 45_2
# 7 45_1
# 8 45_1
Desired output:
rs62224609
1 1
2 0
3 0
4 0
5 1
6 0
7 0
8 0
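Since the full data set is much larger, a sparse result may be preferable. The following is only a sketch under the assumption that the Matrix package is available and that an empty string means "no id for this row"; the vector `id` stands in for the first eight values of `dd$id`, and the other names are illustrative.

```r
library(Matrix)

# First eight ids from head(dd, 8); "" is assumed to mean "no id".
id <- c("rs62224609", "", "", "", "rs62224609", "", "", "")

# Split on commas; an empty string yields character(0), i.e. no entries.
parts <- strsplit(id, ",", fixed = TRUE)

# Row index of every individual id, and the ids themselves as a factor.
row <- rep(seq_along(parts), lengths(parts))
lab <- factor(unlist(parts))

# One column per distinct id, with a 1 wherever that id occurs in a row.
# Note: an id repeated within the same row would be summed, giving 2.
sm <- sparseMatrix(i = row, j = as.integer(lab), x = 1,
                   dims = c(length(id), nlevels(lab)),
                   dimnames = list(NULL, levels(lab)))
sm
```

With the full data, `id` would be replaced by `dd$id`, and a row such as "rs62224609,rs587626763" would get a 1 in both columns.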