
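I'm very new to R, and I was wondering how to extract the distance and the activity type from a string like this: "Just completed a 0.56 mi walk with @RunKeeper". I want to store "0.56", "mi" and "walk" in three separate variables. How can I do this?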

Thanks! Jeroen.

I tried this:

can.be <- function(object, class="numeric") 
  suppressWarnings(!is.na(as(object, class)))

str.vec <- c(text)

str.vec <- strsplit(str.vec, " ")

Error in strsplit(str.vec, " ") : non-character argument

pos <- sapply(str.vec, function(x) which(sapply(x, can.be)))
[[1]]
0.56 
   4 

[[2]]
named integer(0)

...
mapply(`[[`, str.vec, pos)
mapply(`[[`, str.vec, pos+1)
mapply(`[[`, str.vec, pos+2)

But now I get this error:

> mapply(`[[`, str.vec, pos)
Error in .Primitive("[[")(dots[[1L]][[2L]], dots[[2L]][[2L]]) : 
  attempt to select less than one element
> mapply(`[[`, str.vec, pos+1)
Error in pos + 1 : non-numeric argument to binary operator
> mapply(`[[`, str.vec, pos+2)
Error in pos + 2 : non-numeric argument to binary operator

Example data (text):

Just completed a 0.56 mi walk with @RunKeeper. Check it out! http://t.co/lCyzzFeSwq #RunKeeper
Just completed a run in 0:00  with @RunKeeper. Check it out! http://t.co/dJB9DBwF4o #RunKeeper
Just completed a 1.83 km run with @RunKeeper. Check it out! http://t.co/f0S2aKnWXz #RunKeeper
Just completed a 6.03 km run - Gettin' it done! http://t.co/uQ7rBn6M #RunKeeper
Just completed a 1.81 mi walk with @RunKeeper. Check it out! http://t.co/R70fvkLDES #RunKeeper

2 Answers


If they are expected to always appear in that particular order, then:

can.be <- function(object, class="numeric") 
  suppressWarnings(!is.na(as(object, class)))

str <- strsplit("Just completed a 0.56 mi walk with @RunKeeper", " ")[[1]]

pos <- which(sapply(str, can.be))

> str[pos]
[1] "0.56"
> str[pos+1]
[1] "mi"
> str[pos+2]
[1] "walk"

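This requires the sequence to always be the same. You could, however, hard-code a set of measurement units (such as mi, km, etc.) to recognise them within the sequence (even though it is more likely that you will always have a number followed by a unit). Provided there are no other numbers in the string, this approach should be fairly robust.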
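The idea of hard-coding the units can be sketched with a regular expression. This is only an illustration, not part of the original answer: the `units` vector, the `pattern` and the helper name `parse_activity` are assumptions, and the pattern expects "number, space, unit, space, word". Strings that do not fit (such as the "run in 0:00" sample) come back as NA instead of raising an error.

```r
# Assumed list of units; extend it with whatever units appear in your data
units <- c("mi", "km")
pattern <- paste0("([0-9]+\\.?[0-9]*) (", paste(units, collapse = "|"),
                  ") ([[:alpha:]]+)")

# Hypothetical helper: returns one row per input string
parse_activity <- function(x) {
  m <- regmatches(x, regexec(pattern, x))
  # Each element is c(full match, number, unit, activity),
  # or character(0) when the pattern does not match
  rows <- lapply(m, function(p) {
    if (length(p) == 4) p[2:4] else rep(NA_character_, 3)
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("distance", "unit", "activity")
  out$distance <- as.numeric(out$distance)
  out
}

tweets <- c("Just completed a 0.56 mi walk with @RunKeeper. Check it out!",
            "Just completed a run in 0:00  with @RunKeeper. Check it out!",
            "Just completed a 6.03 km run - Gettin' it done!")
res <- parse_activity(tweets)
print(res)
```

Because the unit list is part of the pattern, a stray number elsewhere in the text is ignored unless it is followed by one of the listed units.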

EDIT:

For a vector:

str.vec <- c("Just completed a 0.56 mi walk with @RunKeeper", "Just completed a 13 mi cycling with @Michele")

str.vec <- strsplit(str.vec, " ")

pos <- sapply(str.vec, function(x) which(sapply(x, can.be)))

> mapply(`[[`, str.vec, pos)
[1] "0.56" "13"  
> mapply(`[[`, str.vec, pos+1)
[1] "mi" "mi"
> mapply(`[[`, str.vec, pos+2)
[1] "walk"    "cycling"
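The vector version above works only when every string contains exactly one numeric token. With a string that has none (such as "Just completed a run in 0:00"), `which()` returns an empty result, `pos` becomes a list, and `[[` fails with "attempt to select less than one element". A possible NA-safe variant is sketched below. The names `tokens` and `pick` are made up for the illustration, and the `as.character()` call assumes the "non-character argument" error in the question came from factor input.

```r
can.be <- function(object, class = "numeric")
  suppressWarnings(!is.na(as(object, class)))

str.vec <- c("Just completed a 0.56 mi walk with @RunKeeper",
             "Just completed a run in 0:00  with @RunKeeper")

# as.character() guards against factor input, which strsplit() rejects
tokens <- strsplit(as.character(str.vec), " ")

# Index of the first numeric token per string, or NA when there is none
pos <- vapply(tokens, function(x) {
  hits <- which(vapply(x, can.be, logical(1), USE.NAMES = FALSE))
  if (length(hits) > 0) hits[1] else NA_integer_
}, integer(1))

# Indexing a character vector with a numeric NA yields a single NA,
# so strings without a number simply produce NA
pick <- function(offset) mapply(function(x, p) x[p + offset], tokens, pos)

distance <- as.numeric(pick(0))
unit     <- pick(1)
activity <- pick(2)
```

Using `vapply()` with `integer(1)` keeps `pos` an integer vector, so nothing downstream has to deal with a list of mixed lengths.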
answered 2013-06-26T09:12:04.777

If the strings always have the same format, you can use:

dist <- as.numeric(substr(text, 18, 21))  # "0.56"
unit <- substr(text, 23, 24)              # "mi" (position 22 is a space)
way  <- substr(text, 26, 29)              # "walk" (position 25 is a space)

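But it will not work if the format differs, for instance if the length of the number changes (e.g. from 0.56 to 12.21). You have to make sure that never happens!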
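One way to relax the fixed-position assumption is to let `regexpr()` locate the number and compute the `substr()` positions from that match. This is a rough sketch rather than the answerer's method: it still assumes a two-letter unit follows the number after a single space, and that every string contains a number.

```r
text <- c("Just completed a 0.56 mi walk with @RunKeeper",
          "Just completed a 12.21 km run with @RunKeeper")

# Find where the first decimal number starts and how long it is
m     <- regexpr("[0-9]+\\.?[0-9]*", text)
start <- as.integer(m)
len   <- attr(m, "match.length")

dist <- as.numeric(substr(text, start, start + len - 1))
# Assumption: one space, then a two-letter unit such as "mi" or "km"
unit <- substr(text, start + len + 1, start + len + 2)
```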

answered 2013-06-26T09:06:15.483