I am trying to take a character vector of dollar values that is poorly formatted and turn it into a numeric. The values are formatted as in the following vector, with a leading and trailing space, a comma, and a dollar sign:

x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")

I am trying to use the following function to get rid of all characters other than the digits and the dot, but it isn't working:

trim_currency <- function(x) grep("\$([0-9.]*)\,([0-9.]*)", x, values=TRUE)

I got the regex code

\$([0-9.]*)\,([0-9.]*)

to run successfully with this regex tester http://regex101.com/r/qM2uG0

When I run it in R, I get the following error:

Error: '\$' is an unrecognized escape in character string starting "\$"

Any ideas about how I can do this in R?


Thanks to ndoogan for his response. That solves this particular issue. However, if I wanted to make it more general, I would ask:

How could I use R/regex to run a vector through a filter, allowing only the digits and periods to come through?

1 Answer

x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")
gsub("[,$ ]","",x)
#[1] "18000.50" "1240.30"  "125.00"

Add more characters inside the brackets to strip out other things. I am assuming the example x here is exhaustive.
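On the error in the question: it comes from R's string parser, not the regex engine. "\$" is not a valid escape in an R string literal, so the backslash has to be doubled. Inside a bracket expression, $ is already literal, which is why the pattern above needs no escape. A small sketch of both points, reusing x (the variable names no_dollar and no_dollar_fixed are only for illustration):

```r
x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")

# Outside brackets, $ is the end-of-string anchor, so it must be escaped;
# R string literals need the backslash doubled ("\\$", not "\$").
no_dollar <- gsub("\\$", "", x)

# Alternatively, fixed = TRUE matches the pattern as a literal string.
no_dollar_fixed <- gsub("$", "", x, fixed = TRUE)

identical(no_dollar, no_dollar_fixed)
#[1] TRUE
```

Both calls remove only the dollar sign; the spaces and commas are still there, which is why the bracket expression is the more convenient route here.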

Update

If you know you are only interested in the digits and the decimal point, you can do this:

gsub("[^0-9.]","",x)
#[1] "18000.50" "1240.30"  "125.00"

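A ^ at the start of the brackets negates the bracket expression, so the pattern matches every character that is not a digit or a period.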

Finally, to convert the resulting values to numeric form, wrap the gsub() call (or the object containing its output) in as.numeric():

as.numeric(gsub("[^0-9.]","",x))
#[1] 18000.5  1240.3   125.0
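If you want this as a reusable function, something like the following should work. It is only a sketch: it reuses the trim_currency name from the question and assumes every value is a non-negative amount with a period as the decimal separator.

```r
# Hypothetical helper (name taken from the question): keep only digits and
# periods, then convert the cleaned strings to numeric.
trim_currency <- function(x) {
  as.numeric(gsub("[^0-9.]", "", x))
}

x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")
trim_currency(x)
#[1] 18000.5  1240.3   125.0
```

Note that this filter also drops minus signs and parentheses, so negative amounts would lose their sign; add "-" to the bracket expression if your data can contain them.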
answered 2013-05-03T22:13:04.317