I am trying to extract the numbers from a sentence and then put them together as a numeric array. For example,

  string<-"  The Team:  $74,810 TOTAL RAISED SO FARJOIN THE TEAM Vik Muniz 
             Amount Raised: $70,560   71% Raised of $100,000 Goal CDI International,
             Inc.  Amount Raised: $2,070  Robert Goodwin Amount Raised: $1,500 
             30% Raised of $5,000 Goal Marcel Fukayama Amount Raised: 
             $210  Maitê Proença Amount Raised: $140  
             Thiago Nascimento Amount Raised: $120  
             Lydia Kroeger Amount Raised: $80  "          

To proceed, I first removed the commas so that the numbers can be extracted easily:

    string.nocomma <- gsub(',', '', string)

Then I tried to put the numbers together as a numeric vector:

    fund.numbers <-unique(as.numeric(gsub("[^0-9]"," ",string.nocomma),""))       

Here are the problems:

  1. R throws an error after the last command. The error is as follows:

    Warning message:
    In unique(as.numeric(gsub("[^0-9]", " ", website.fund.nocomma),  :
    NAs introduced by coercion
    
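  2. Even if I fix the error above and end up with a numeric vector, I am not sure how to convert a numeric vector into a numeric array.

    Can anyone help me? Thanks,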

2 Answers

You can do it like this:

## Extract all numbers and commas
numbers <- unlist(regmatches(string, gregexpr("[0-9,]+", string)))
## Delete commas
numbers <- gsub(",", "", numbers)
## Delete empty strings (when only one comma has been extracted)
numbers <- numbers[numbers != ""]
numbers

# [1] "74810"  "70560"  "71"     "100000" "2070"   "1500"   "30"    
# [8] "5000"   "210"    "140"    "120"    "80"
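
The result above is still a character vector, while the question asks for numbers. As a possible final step, here is a minimal sketch that converts the matches with `as.numeric()`. The shortened `string` below is a stand-in assumed for illustration, not the full text from the question:

```r
# Assumption: a shortened stand-in for the `string` defined in the question
string <- "The Team: $74,810 TOTAL RAISED Amount Raised: $70,560 71% Raised of $100,000 Goal"

# Same three steps as above: extract digits and commas, drop commas, drop empties
numbers <- unlist(regmatches(string, gregexpr("[0-9,]+", string)))
numbers <- gsub(",", "", numbers)
numbers <- numbers[numbers != ""]

# Each element now holds exactly one number, so the conversion produces no NAs
fund.numbers <- as.numeric(numbers)
fund.numbers
```

Wrapping the last conversion in `unique()` would drop repeated amounts, as the original attempt intended.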
answered 2013-10-17T19:17:22.007

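After applying gsub(), you get a single string containing digits and spaces, so it cannot be converted to a number directly. You need a vector of numbers. I think it is best to use gregexpr to get it: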

## get a list of strings containing digits only
res = regmatches(string.nocomma, gregexpr("([0-9]+)", string.nocomma))
## convert it to numeric
res = as.numeric(unlist(res))
res

#  [1]  74810  70560     71 100000   2070   1500     30   5000    210    140    120
# [12]     80
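
To illustrate why the original attempt produced `NA`, and to touch on the second part of the question, here is a small sketch. The sample text `x` and the variable names are made up for illustration. It shows that `as.numeric()` cannot parse one string holding several numbers, that splitting the string first avoids the problem, and that `array()` turns the resulting vector into a one-dimensional array:

```r
# Assumption: a short made-up sample, with commas already removed
x <- "Raised: $2070 of $5000"

# Replacing non-digits with spaces leaves ONE string with two numbers in it
spaced <- gsub("[^0-9]", " ", x)

# as.numeric() cannot parse that single string, so it returns NA (with a warning)
bad <- suppressWarnings(as.numeric(spaced))

# Splitting on runs of spaces gives one element per number, which converts cleanly
parts <- strsplit(trimws(spaced), " +")[[1]]
vals <- as.numeric(parts)

# A numeric vector becomes a one-dimensional array via array()
arr <- array(vals)
dim(arr)
```

In most R code a plain numeric vector is already sufficient, so the `array()` step is only needed when a `dim` attribute is explicitly required.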
answered 2013-10-17T19:17:40.873