r - Extract integers (of varying length) before a specific string in R

Question

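I am trying to extract integers of varying length from a data frame column ($description), where each integer comes just before a known string. For example, I want to extract the integer that appears before the string "yard" in each of the following (each line represents a single entry in the data frame column):

(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross).

(1:07) (No Huddle Shotgun) B.Green-Ellis right guard to NYG 27 for 4 yards (C.Blackburn).

(14:00) B.Green-Ellis right end pushed ob at NYG 33 for 17 yards (K.Phillips).

The problem is that the integer can vary in length (e.g. 4 or 17), and it can also be negative.

I have really tried everything I could think of, and have spent all day looking for a relevant thread!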

score 3 · Accepted Answer

You can use gsub with a fairly simple regular expression and work from the end of the string:

temp <- c("(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross).", 
          "(1:07) (No Huddle Shotgun) B.Green-Ellis right guard to NYG 27 for 4 yards (C.Blackburn).", 
          "(14:00) B.Green-Ellis right end pushed ob at NYG 33 for 17 yards (K.Phillips).")
# [-0-9]+ accepts a minus sign and every digit, including 0 (so "10" also works)
as.numeric(gsub("^(.*)( [-0-9]+)(.*)$", "\\2", temp))
# [1] -1  4 17

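Looking at the regular expression:

^.* matches anything... up until...
...it reaches a space followed by any number of digits, [-0-9]+, some of which may have a - before them, and then...
...anything, .*$, until the end of the input.

Because the leading .* is greedy, the match lands on the last space-preceded number in the string, which is why this effectively works from the end.

The parentheses are used for "backreferences". You'll notice that there are three groups in the example above and we are only interested in the result of the second group, so that is what we replace with: \\2.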
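The pattern above picks up the last space-preceded number anywhere in the string, so a number appearing after the yardage would be returned instead. As a hedged variation (not part of the original answer), the capture group can be tied to the word "yard" itself. This sketch assumes the number is always followed by a single space and then "yard" or "yards"; the name yards_gained is made up for illustration.

```r
temp <- c("(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross).",
          "(1:07) (No Huddle Shotgun) B.Green-Ellis right guard to NYG 27 for 4 yards (C.Blackburn).",
          "(14:00) B.Green-Ellis right end pushed ob at NYG 33 for 17 yards (K.Phillips).")

# Group 1 captures an optional minus sign plus digits that sit
# directly before " yard" / " yards"; everything else is discarded.
yards_gained <- as.numeric(sub("^.* (-?[0-9]+) yards?.*$", "\\1", temp))
print(yards_gained)
# [1] -1  4 17
```

If an entry does not match the pattern at all, sub returns it unchanged and as.numeric then yields NA with a warning, so unmatched rows are at least easy to spot.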

score 2 · Accepted Answer

A very simple solution would be

s1 <- "(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross)."
ss1 <- strsplit(s1, split = " ")[[1]]
as.numeric(ss1[grep("yards", ss1) -1])

Now you just have to apply this to every row, e.g.:

s1 <- "(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross)."
s2 <- "(1:07) (No Huddle Shotgun) B.Green-Ellis right guard to NYG 27 for 4 yards  (C.Blackburn)."
s3 <- "(14:00) B.Green-Ellis right end pushed ob at NYG 33 for 17 yards (K.Phillips)."

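# note: rbind() builds a 3 x 1 character matrix here, not a data frame
df <- rbind(s1, s2, s3)

# split each entry on spaces, then take the token just before "yards"
splits <- strsplit(df[, 1], split = " ")
as.numeric(sapply(splits, function(z) z[grep("yards", z) - 1]))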

You can also do this in one step, as @joshua suggested!
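Since the question keeps these strings in a data frame column ($description), the same split-and-index idea can be sketched against a data frame directly. The names plays_df and yards are made up for illustration, and the sketch assumes each entry contains a token with "yards" in it, preceded by the number as its own space-separated token.

```r
plays_df <- data.frame(
  description = c(
    "(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross).",
    "(1:07) (No Huddle Shotgun) B.Green-Ellis right guard to NYG 27 for 4 yards  (C.Blackburn).",
    "(14:00) B.Green-Ellis right end pushed ob at NYG 33 for 17 yards (K.Phillips)."
  ),
  stringsAsFactors = FALSE
)

# For each row: split on spaces, find the first token containing "yards",
# and convert the token just before it to a number.
plays_df$yards <- vapply(
  strsplit(plays_df$description, split = " "),
  function(z) as.numeric(z[grep("yards", z)[1] - 1]),
  numeric(1)
)
print(plays_df$yards)
# [1] -1  4 17
```

Using vapply with numeric(1) keeps the result a plain numeric vector, and a row without any "yards" token comes back as NA rather than breaking the column assignment.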

score 2 · Accepted Answer

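A bit convoluted... but it works for me, assuming there is a space before the number... I couldn't get a regular expression to extract the number itself...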

# the data...
yards <- c("(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross).", 
"(1:07) (No Huddle Shotgun) B.Green-Ellis right guard to NYG 27 for 4 yards (C.Blackburn).", 
"(14:00) B.Green-Ellis right end pushed ob at NYG 33 for 17 yards (K.Phillips).")

# handy function from http://r.789695.n4.nabble.com/reverse-string-td2288532.html
strReverse <- function(x) sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
# remove everything after ' yard'
y1 <- gsub(' *yard.*$', '', yards)
# reverse and remove everything after the space and reverse again
as.numeric(strReverse(gsub(' .*$','', strReverse(y1))))
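For extracting the number itself without reversing the string, one possible sketch uses base R's regexpr and regmatches with a Perl-style lookahead. This is an alternative technique, not the approach of the answer above; the names plays and yards_before are made up for illustration, and the pattern assumes the number is followed by optional spaces and then "yard".

```r
plays <- c("(3:18) B.Green-Ellis left end to NE 28 for -1 yards (A.Ross).",
           "(1:07) (No Huddle Shotgun) B.Green-Ellis right guard to NYG 27 for 4 yards (C.Blackburn).",
           "(14:00) B.Green-Ellis right end pushed ob at NYG 33 for 17 yards (K.Phillips).")

# Match an optional minus sign and digits, but only where the lookahead
# (?= *yard) confirms that "yard" follows; the lookahead is not consumed.
m <- regexpr("-?[0-9]+(?= *yard)", plays, perl = TRUE)

# regmatches() silently drops non-matching elements, so fill a
# pre-sized NA vector to keep the result aligned with the input.
yards_before <- rep(NA_real_, length(plays))
yards_before[m > 0] <- as.numeric(regmatches(plays, m))
print(yards_before)
# [1] -1  4 17
```

The lookahead needs perl = TRUE, since R's default regex engine does not support it.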
