
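My dataset has two similarly named variables: "JE.Description" and "Field.Description". How can I find the column index of the "JE.Description" column while excluding the word "Field" from the regex search? In other words, I would like to modify the command below so that it returns only the column index of "JE.Description":

The dataset is updated frequently, and sometimes the "JE.Description" string appears as just "Description". That is why I am looking for a solution that explicitly excludes the keyword "Field".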

r1 <- c(1:5)
r2 <- c(1:5)
df <- data.frame(r1,r2)
names(df)[1] <- "JE.Description"
names(df)[2] <- "Field.Description"

y <- grep("!^Field^Description",perl = TRUE, colnames(df))
# Returns: integer(0)

Thanks,


2 Answers


To match every string that contains "Description", except those in which it is immediately preceded by "Field.", use a negative lookbehind assertion:

## The regex pattern
pat <- "(?<!Field\\.)Description"

## Try it out
x <- c("Description", "Field.Description", "FieldDescription", "xyz Description")
grep(pat, x, perl=TRUE)  # Note: lookahead & lookbehind assertions need perl=TRUE
# [1] 1 3 4
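
Applied to the column names from the question, the same pattern might be used as in the sketch below. The data frame is rebuilt here only so the snippet runs on its own, and the variable name `idx` is just illustrative.

```r
# Rebuild the example data frame from the question
df <- data.frame(r1 = 1:5, r2 = 1:5)
names(df) <- c("JE.Description", "Field.Description")

# Negative lookbehind: "Description" not immediately preceded by "Field."
pat <- "(?<!Field\\.)Description"

idx <- grep(pat, colnames(df), perl = TRUE)
idx
# [1] 1
```

Because the pattern does not require the "JE." prefix, it should also pick up the column when its name is just "Description".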

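Alternatively, if the substring "field" may appear in other positions relative to "Description" (and possibly in upper- or lowercase), it may be simpler to call grepl() twice and combine the results with Boolean operators: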

x <- c("Description", "fieldDescription", "Field-of-Description", 
       "Description field")
which(grepl("Description", x) & !grepl("field", x, ignore.case=TRUE))
[1] 1
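
If this check is needed every time the dataset is refreshed, the two grepl() calls could be wrapped in a small helper. This is only a sketch: `find_cols` and its arguments `keep` and `drop` are hypothetical names, not part of base R.

```r
# Hypothetical helper: indices of names that contain `keep`
# but do not contain `drop` (the `drop` match ignores case)
find_cols <- function(nms, keep = "Description", drop = "field") {
  which(grepl(keep, nms, fixed = TRUE) &
        !grepl(drop, nms, ignore.case = TRUE))
}

find_cols(c("JE.Description", "Field.Description"))
# [1] 1
find_cols(c("Field.Description", "Description"))
# [1] 2
```

With a data frame, the call would be `find_cols(colnames(df))`.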
Answered 2013-10-14T20:12:22.393

mydata<-structure(list(Description = c(21, 21, 22.8, 21.4, 18.7, 18.1, 
14.3, 24.4, 22.8, 19.2), Field.Description = c(6, 6, 4, 6, 8, 
6, 8, 4, 4, 6)), .Names = c("Description", "Field.Description"
), row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710", 
"Hornet 4 Drive", "Hornet Sportabout", "Valiant", "Duster 360", 
"Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")

mydata[grep("^Description",names(mydata))]
                  Description
Mazda RX4                21.0
Mazda RX4 Wag            21.0
Datsun 710               22.8
Hornet 4 Drive           21.4
Hornet Sportabout        18.7
Valiant                  18.1
Duster 360               14.3
Merc 240D                24.4
Merc 230                 22.8
Merc 280                 19.2
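
Note that "^Description" only matches names that start with "Description", so it would miss the column when it is named "JE.Description". A possible variant, assuming the column is only ever called "JE.Description" or "Description", makes the prefix optional and anchors both ends of the name:

```r
# Optional "JE." prefix, anchored at both ends of the column name
pat <- "^(JE\\.)?Description$"

grep(pat, c("JE.Description", "Field.Description"))
# [1] 1
grep(pat, c("Field.Description", "Description"))
# [1] 2
```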
Answered 2013-10-14T20:18:43.380