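I have a dataset that looks like this:
- two waves of data (`.t0` and `.t1`)
- multiple scales (`this` and `that`)
- several items per scale (`1`, `22`, `22a`)
- several variables to ignore (`v2`, `v3`, `ignore.t0`, `ignore.t1`, `this.t0`, `this.t1`, `that.t0`, `that.t1`)
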
dat <- data.frame(id = seq(from=1, to=10, by=1),
                  v2 = rnorm(10),
                  v3 = rnorm(10),
                  ignore.t0 = rnorm(10),
                  this.t0 = rnorm(10),
                  this1.t0 = rnorm(10),
                  this22.t0 = rnorm(10),
                  this22a.t0 = rnorm(10),
                  that.t0 = rnorm(10),
                  that1.t0 = rnorm(10),
                  that22.t0 = rnorm(10),
                  that22a.t0 = rnorm(10),
                  ignore.t1 = rnorm(10),
                  this.t1 = rnorm(10),
                  this1.t1 = rnorm(10),
                  this22.t1 = rnorm(10),
                  this22a.t1 = rnorm(10),
                  that.t1 = rnorm(10),
                  that1.t1 = rnorm(10),
                  that22.t1 = rnorm(10),
                  that22a.t1 = rnorm(10))

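I want to subset the data frame so that it includes `id` plus only those columns whose names contain:
- a scale name (`this` or `that`), and
- a number before the period (`1.`) or a number and a letter (`22a.`)

So in the end, the data frame should look like this:
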
dat2 <- data.frame(
  id = seq(from=1, to=10, by=1),
  #v2 = rnorm(10),
  #v3 = rnorm(10),
  #ignore.t0 = rnorm(10),
  #this.t0 = rnorm(10),
  this1.t0 = rnorm(10),
  this22.t0 = rnorm(10),
  this22a.t0 = rnorm(10),
  #that.t0 = rnorm(10),
  that1.t0 = rnorm(10),
  that22.t0 = rnorm(10),
  that22a.t0 = rnorm(10),
  #ignore.t1 = rnorm(10),
  #this.t1 = rnorm(10),
  this1.t1 = rnorm(10),
  this22.t1 = rnorm(10),
  this22a.t1 = rnorm(10),
  #that.t1 = rnorm(10),
  that1.t1 = rnorm(10),
  that22.t1 = rnorm(10),
  that22a.t1 = rnorm(10))

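The real data frame is much larger than the one shown here, so typing out column indices is not feasible. Searching for the scale names alone does not work either, because `this.t0`, `this.t1`, `that.t0`, and `that.t1` would be captured as well.
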
# not quite right: this also keeps this.t0, this.t1, that.t0 and that.t1
dat2 <- data.frame(id = dat$id)
scales <- c("this", "that")
keep.index <- grep(paste(scales, collapse = "|"), names(dat))
temp <- dat[keep.index]
dat2 <- cbind(dat2, temp)

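How can I modify the grep pattern so that it looks for a number, or a number followed by a letter, before the period? Or is there a better way?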
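
One possible direction, offered only as a minimal sketch and not as a definitive answer: build the pattern from the scale names and require at least one digit, optionally followed by a single lowercase letter, immediately before the period. The shortened `dat` below is a stand-in for the real data, and the assumptions that item suffixes are digits plus at most one lowercase letter, and that scale names sit at the start of the column name, come from the example columns above and may not hold for the full dataset.

```r
# Shortened stand-in for the real data frame (illustrative only)
dat <- data.frame(id = 1:10,
                  v2 = rnorm(10),
                  ignore.t0 = rnorm(10),
                  this.t0 = rnorm(10),
                  this1.t0 = rnorm(10),
                  this22a.t1 = rnorm(10),
                  that.t1 = rnorm(10),
                  that22.t1 = rnorm(10))

scales <- c("this", "that")

# ^(this|that)  scale name at the start of the column name
# [0-9]+        at least one digit, which rules out bare this.t0, that.t1, ...
# [a-z]?        optionally one lowercase letter, as in 22a (assumed format)
# \\.           a literal period before the wave suffix
pattern <- paste0("^(", paste(scales, collapse = "|"), ")[0-9]+[a-z]?\\.")

# value = TRUE returns the matching column names instead of their indices
keep <- grep(pattern, names(dat), value = TRUE)

# Keep id plus the matched item columns, in their original order
dat2 <- dat[c("id", keep)]
names(dat2)
```

Selecting by name with `value = TRUE` avoids the separate `cbind()` step, since `id` and the matched columns can be pulled out in a single subsetting call.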