I have a dataset that looks like this:
- two waves of data (.t0 and .t1)
- multiple scales (this and that)
- several items per scale (1, 22, 22a)
- several variables to ignore (v2, v3, ignore.t0, ignore.t1, this.t0, this.t1, that.t0, that.t1)
dat <- data.frame(id = seq(from=1, to=10, by=1),
v2 = rnorm(10),
v3 = rnorm(10),
ignore.t0 = rnorm(10),
this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
ignore.t1 = rnorm(10),
this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
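I want to subset the data frame so that it contains id and only those columns whose names include:
- a scale name (this or that), and
- a number before the period (1.) or a number followed by a letter (22a.)
So in the end, the data frame should look like this: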
dat2 <- data.frame(
id = seq(from=1, to=10, by=1),
#v2 = rnorm(10),
#v3 = rnorm(10),
#ignore.t0 = rnorm(10),
#this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
#that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
#ignore.t1 = rnorm(10),
#this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
#that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
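The real data frame is much larger than the one shown here, so typing out column indices is not feasible. Searching for the scale names alone does not work either, because this.t0, this.t1, that.t0 and that.t1 would be captured as well.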
# not quite right
dat2$id <- dat$id
scales <- c("this", "that")
keep.index <- grep(paste(scales,collapse="|"), names(dat))
temp <- dat[keep.index]
dat2 <- cbind(dat2, temp)
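How can I modify the grep pattern so that it looks for a number, or a number followed by a letter, before the period? Or is there a better approach?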
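For reference, one possible shape for such a pattern is sketched below. It is only a sketch under a few assumptions that are not stated above: item labels are one or more digits optionally followed by a single lowercase letter, the scale names contain no regex metacharacters, and the names `col.names`, `pattern` and `keep` are made up for the illustration. The example data is rebuilt programmatically so the block runs on its own.

```r
set.seed(1)

# Rebuild a stand-in for `dat` with the same column names as the example
waves <- c("t0", "t1")
stems <- c("ignore", "this", "this1", "this22", "this22a",
           "that", "that1", "that22", "that22a")
col.names <- c("id", "v2", "v3",
               as.vector(outer(stems, waves, paste, sep = ".")))
dat <- as.data.frame(matrix(rnorm(10 * length(col.names)), nrow = 10,
                            dimnames = list(NULL, col.names)))
dat$id <- 1:10

scales <- c("this", "that")

# ^(this|that)  scale name at the start of the column name
# [0-9]+        at least one digit (this is what excludes this.t0, that.t0)
# [a-z]?        optionally one lowercase letter (assumed item-label format)
# \\.           a literal period before the wave suffix
pattern <- paste0("^(", paste(scales, collapse = "|"), ")[0-9]+[a-z]?\\.")

keep <- grepl(pattern, names(dat))

# Keep id plus every matching column, in their original order
dat2 <- dat[, c("id", names(dat)[keep]), drop = FALSE]
names(dat2)
```

Anchoring the pattern with `^` keeps a column such as `ignore.t0` from matching by accident, and requiring `[0-9]+` right after the scale name is what drops the scale-level columns. If the real item labels follow a different format (uppercase letters, several letters), the `[a-z]?` part would need adjusting.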