r - How to repeatedly replace substrings in variables in R

Question

I have the following task:

Treatment$V010 <- as.numeric(substr(Treatment$V010,1,2))
Treatment$V020 <- as.numeric(substr(Treatment$V020,1,2))
[...]
Treatment$V1000 <- as.numeric(substr(Treatment$V1000,1,2))

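I have 100 variables, from $V010, $V020, $V030... to $V1000. They hold numbers of varying length. I want to "extract" only the first two digits of each number and replace the old number with the new two-digit number.

My data frame "Treatment" has another 80 variables that I haven't mentioned here, so the function should be applied only to the 100 variables mentioned.

How can I do this? I could write the command 100 times, but I'm sure there is a better solution.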

score 3 · Accepted Answer

OK, let's do it. First things first: to get specific columns of a data frame, you need their names:

cnames = paste0('V',formatC(seq(10,1000,by=10), width = 3, format = "d", flag = "0"))

(cnames is a vector containing c('V010', 'V020', ..., 'V1000'))
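To see what the name-building step produces, here is a small self-contained check. It repeats the formatC() call from above; the variable nums and the sprintf() comparison are additions for illustration, not part of the original answer.

```r
# Build the 100 column names V010, V020, ..., V1000.
# formatC() zero-pads to at least 3 characters; 1000 already has 4 digits and is left as-is.
nums <- seq(10, 1000, by = 10)
cnames <- paste0("V", formatC(nums, width = 3, format = "d", flag = "0"))

length(cnames)     # 100
head(cnames, 3)    # "V010" "V020" "V030"
tail(cnames, 2)    # "V990" "V1000"

# sprintf() is another base-R way to do the same zero-padding
same_as_sprintf <- identical(cnames, sprintf("V%03d", nums))
```

Either form works; sprintf() is a little shorter if the padding width is the only formatting needed.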

Next, we get their indices:

coli=unlist(sapply(cnames, function (x) which(colnames(Treatment)==x)))

(coli is a vector containing the indices of the relevant columns of Treatment)

Finally, we apply your function to these columns:

Treatment[coli] = mapply(function (x) as.numeric(substr(x, 1, 2)), Treatment[coli])

Does it work?
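One way to answer that is a quick check on a toy data frame. The sketch below uses made-up values and only two of the target columns (both are assumptions for illustration), and runs the same steps as above:

```r
# Toy stand-in for the real data (hypothetical values):
# two target columns plus one column that must stay untouched
Treatment <- data.frame(V010 = c(1234, 56789),
                        V020 = c(987, 6543),
                        other = c(1111, 2222))

cnames <- c("V010", "V020")   # the real vector would hold all 100 names

# Column indices of the target columns
coli <- unlist(sapply(cnames, function(x) which(colnames(Treatment) == x)))

# Keep the first two digits of every value in the target columns
result <- Treatment
result[coli] <- mapply(function(x) as.numeric(substr(x, 1, 2)), Treatment[coli])

result
#   V010 V020 other
# 1   12   98  1111
# 2   56   65  2222
```

The column other is left as it was, which is the behaviour asked for in the question.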

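PS: If anyone has a better/cleaner way, please let me know :)

Edit:

The intermediate step is unnecessary, because you can already use the column names in cnames to select the relevant columns, i.e.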

Treatment[cnames] = mapply(function (x) as.numeric(substr(x, 1, 2)), Treatment[cnames])

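(The only advantage of converting column names to column indices shows up when some of the columns are missing from the data frame: in that case, Treatment['non existing column'] fails with "undefined columns selected".)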
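If some of the expected columns may be absent, one possible way to keep the name-based version is to restrict cnames to the names that actually exist. This is a sketch on made-up data; the names present and missing_cols are hypothetical, and lapply() is used here in place of mapply().

```r
# Toy data where V020 is deliberately absent (hypothetical values)
Treatment <- data.frame(V010 = c(1234, 56789), other = c(1111, 2222))
cnames <- c("V010", "V020")

# Selecting a missing column by name is an error; capture its message
err <- tryCatch(Treatment[cnames], error = function(e) conditionMessage(e))

# Keep only the names that exist, and record the ones that do not
present <- intersect(cnames, names(Treatment))
missing_cols <- setdiff(cnames, names(Treatment))

# lapply() returns one list element per column, which assigns back column by column
Treatment[present] <- lapply(Treatment[present],
                             function(x) as.numeric(substr(x, 1, 2)))
```

Reporting missing_cols rather than skipping them silently makes it easier to notice a misspelled or dropped column.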

score 1 · Accepted Answer

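A solution that selects the relevant columns by a pattern described with a regular expression.

The regular expression explained:
^: start of the string
V: a literal V
\\d{2}: two digits (the pattern is not anchored at the end, so longer names such as V1000 also match)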
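To make the matching behaviour concrete, the sketch below tries the pattern on a few column names. The names V1 and V010_old and the stricter pattern are assumptions added for illustration; they are not part of the original answer.

```r
nms <- c("V010", "V1000", "x010", "xV1000", "V1", "V010_old")

# Pattern from this answer: "V" at the start, followed by at least two digits
loose <- grepl("^V\\d{2}", nms)
# TRUE TRUE FALSE FALSE FALSE TRUE

# A stricter variant: "V", then 3 or 4 digits, then the end of the name
strict <- grepl("^V\\d{3,4}$", nms)
# TRUE TRUE FALSE FALSE FALSE FALSE
```

The stricter variant is only needed if the data frame contains names such as V010_old that start like a target column but should be left alone.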

Treatment <- data.frame(V010 = c(120, 130), x010 = c(120, 130), xV1000 = c(111, 222), V1000 = c(111, 222))
Treatment
#   V010 x010 xV1000 V1000
# 1  120  120    111   111
# 2  130  130    222   222

# columns with a name that matches the pattern (logical vector)
idx <- grepl(x = names(Treatment), pattern = "^V\\d{2}")

# substr the relevant columns
Treatment[ , idx] <- sapply(Treatment[ , idx], FUN = function(x){
  as.numeric(substr(x, 1, 2))
  })

Treatment
#   V010 x010 xV1000 V1000
# 1   12  120    111    11
# 2   13  130    222    22
