r - Subset a data frame by complex patterns in column names

Question

I have a dataset that looks like this:

Two waves of data (.t0 and .t1)
Multiple scales (this and that)
Several items per scale (1, 22, 22a)
Several variables to ignore (v2, v3, ignore.t0, ignore.t1, this.t0, this.t1, that.t0, that.t1)


dat <- data.frame(id = seq(from=1, to=10, by=1),
                  v2 = rnorm(10),
                  v3 = rnorm(10),
                  ignore.t0 = rnorm(10),
                  this.t0 = rnorm(10),
                  this1.t0 = rnorm(10),
                  this22.t0 = rnorm(10),
                  this22a.t0 = rnorm(10),
                  that.t0 = rnorm(10),
                  that1.t0 = rnorm(10),
                  that22.t0 = rnorm(10),
                  that22a.t0 = rnorm(10),
                  ignore.t1 = rnorm(10),
                  this.t1 = rnorm(10),
                  this1.t1 = rnorm(10),
                  this22.t1 = rnorm(10),
                  this22a.t1 = rnorm(10),
                  that.t1 = rnorm(10),
                  that1.t1 = rnorm(10),
                  that22.t1 = rnorm(10),
                  that22a.t1 = rnorm(10))

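I want to subset the data frame so that it keeps id and only those columns whose names contain:

the scale name (this or that), and
a number (e.g. 1.) or a number followed by a letter (e.g. 22a.) before the period

So in the end, the data frame should look like this: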

dat2 <- data.frame(
                   id = seq(from=1, to=10, by=1),
                   #v2 = rnorm(10),
                   #v3 = rnorm(10),
                   #ignore.t0 = rnorm(10),
                   #this.t0 = rnorm(10),
                   this1.t0 = rnorm(10),
                   this22.t0 = rnorm(10),
                   this22a.t0 = rnorm(10),
                   #that.t0 = rnorm(10),
                   that1.t0 = rnorm(10),
                   that22.t0 = rnorm(10),
                   that22a.t0 = rnorm(10),
                   #ignore.t1 = rnorm(10),
                   #this.t1 = rnorm(10),
                   this1.t1 = rnorm(10),
                   this22.t1 = rnorm(10),
                   this22a.t1 = rnorm(10),
                   #that.t1 = rnorm(10),
                   that1.t1 = rnorm(10),
                   that22.t1 = rnorm(10),
                   that22a.t1 = rnorm(10))

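The real data frame is much larger than the one shown here, so typing the column indices by hand is not feasible. Searching for the scale names alone does not work either, because this.t0, this.t1, that.t0 and that.t1 would also be captured.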

# not quite right: the pattern also catches this.t0, this.t1, that.t0 and that.t1
dat2 <- data.frame(id = dat$id)
scales <- c("this", "that")
keep.index <- grep(paste(scales, collapse = "|"), names(dat))
temp <- dat[keep.index]
dat2 <- cbind(dat2, temp)
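To see why the attempt above is not quite right, here is a minimal sketch that lists which names the scale-name alternation alone picks up. The vector `nms` is a hypothetical stand-in for `names(dat)`, using a subset of the columns above.

```r
# Hypothetical stand-in for names(dat), using a subset of the columns above
nms <- c("id", "v2", "ignore.t0", "this.t0", "this1.t0", "this22a.t0",
         "that.t1", "that22.t1")

# Scale names alone: matches every column whose name contains "this" or "that"
naive <- grep("this|that", nms, value = TRUE)
print(naive)
# "this.t0" and "that.t1" are caught even though they should be dropped
```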

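How can I modify the grep pattern so that it looks for a number, or a number followed by a letter, before the period? Or is there a better way?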

score 6 · Accepted Answer

This works for your example:

dat[c("id", grep("(this|that)\\d+[a-z]?\\.", names(dat), value = TRUE))]

Where:

(this|that) matches either scale name
\\d+ is one or more digits
[a-z]? is zero or one lowercase letter
\\. matches a literal period
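The pattern can be checked name by name with `grepl()`, which returns one TRUE/FALSE per name. A small sketch using a hypothetical vector of names that covers each case from the question:

```r
# Hypothetical names covering ids, ignored variables, scale totals and items
nms <- c("id", "v2", "ignore.t0", "this.t0", "this1.t0", "this22.t0",
         "this22a.t0", "that.t1", "that1.t1", "that22a.t1")

pattern <- "(this|that)\\d+[a-z]?\\."

# TRUE only where a scale name is followed by digits, an optional letter, then a dot
keep <- grepl(pattern, nms)
print(setNames(keep, nms))

# "this.t0" fails because no digit sits between "this" and the dot
print(nms[keep])
```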

If you want to build the pattern dynamically for any set of scales, you can do the following:

scales <- c("this", "that")
pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))
dat[c("id", grep(pattern, names(dat), value = TRUE))]
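The pattern above is unanchored, so it would also match a name that merely contains such a substring, for example a hypothetical `xthis1.t0`. If the real data could contain names like that, a stricter variant anchors the pattern with `^` and `$` and spells out the wave suffix. This is a minimal sketch, assuming every wave suffix has the form `.t` followed by digits; the helper name `select_items` and the `demo` data frame are hypothetical.

```r
# Hypothetical helper: keep the id column plus item columns of the given scales.
# Assumes item names look like <scale><digits><optional lowercase letter>.t<digits>
select_items <- function(df, scales, id_col = "id") {
  pattern <- sprintf("^(%s)\\d+[a-z]?\\.t\\d+$", paste(scales, collapse = "|"))
  df[c(id_col, grep(pattern, names(df), value = TRUE))]
}

# Small example data frame, including a near-miss name (xthis1.t0)
demo <- data.frame(id = 1:3,
                   this.t0 = 1:3,
                   this1.t0 = 1:3,
                   this22a.t1 = 1:3,
                   xthis1.t0 = 1:3,
                   that22.t1 = 1:3)

res <- select_items(demo, c("this", "that"))
print(names(res))
```

The `^` anchor is what rejects `xthis1.t0`; with the unanchored pattern from the answer, that column would be kept.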
