My input file, named "locaddr", contains the following records:

"Shelbourne Road, Dublin, Ireland"                                     
"1 Hatch Street Upper, Dublin, Ireland"                               
"98 Haddington Road, Dublin, Ireland"       
"11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland"
"Winterstraße 17, 69190 Walldorf, Germany"

I applied R's strsplit function to this file using the following code:

testmat <- strsplit(locaddr, split=",")
outmat <- matrix(unlist(testmat), nrow=nrow(locaddr), ncol=3, byrow=T)

The final output I get is:

      Street                        City                    Country          
 [1,] "Shelbourne Road"             " Dublin"               " Ireland"       
 [2,] "1 Hatch Street Upper"        " Dublin"               " Ireland"       
 [3,] "98 Haddington Road"          " Dublin"               " Ireland"       
 [4,] "11 Mount Argus Close"        " Harold's Cross"       " Dublin 6W"     
 [5,] " Co. Dublin"                 " Ireland"              "Winterstraße 17"
 [6,] " 69190 Walldorf"             " Germany"              "Caughley Road"  
 [7,] " Broseley"                   " Shropshire TF12 5AT"  " UK"            
 [8,] "Pappelweg 30"                " 48499 Salzbergen"     " Germany"       
 [9,] "60 Grand Canal Street Upper" " Dublin 4"             " Ireland"       
[10,] "Wieslocher Straße"           " 68789 Sankt Leon-Rot" " Germany"

As the output above shows, what I want is the last three terms of each record. Instead, nearly everything has been mixed together.

My requirement is that, although the addresses vary in length, after strsplit I only need to pick the last three terms and store them as Street, City, Country.

Thank you very much for your help and time.

2 Answers

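This is basically a variant of Roman's answer, but it is intended to combine (possibly) multiple address parts. It assumes that the last two comma-separated values are the city and the country, and then pools all preceding elements into a single field.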

# read data
y <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")
# split and output
result <- lapply(y, function(x) {
    splitx <- strsplit(x, ", ")[[1]]
    rowtail <- tail(splitx, n = 2)
    if(length(splitx)>3)
        multi <- paste(splitx[1:(length(splitx)-2)],collapse=", ")
    else
        multi <- splitx[1]
    return(c(multi,rowtail))
    })
# rbind back together
do.call(rbind,result)

This produces:

     [,1]                                              [,2]             [,3]     
[1,] "Shelbourne Road"                                 "Dublin"         "Ireland"
[2,] "1 Hatch Street Upper"                            "Dublin"         "Ireland"
[3,] "98 Haddington Road"                              "Dublin"         "Ireland"
[4,] "11 Mount Argus Close, Harold's Cross, Dublin 6W" "Co. Dublin"     "Ireland"
[5,] "Winterstraße 17"                                 "69190 Walldorf" "Germany"
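
As a side note on why the code in the question scrambles the rows, here is a minimal sketch. It assumes the records are held in a plain character vector (called `x` here for illustration; the original `locaddr` object may well be a data frame). `unlist()` discards the record boundaries, so `matrix(..., ncol = 3, byrow = TRUE)` simply cuts the flattened vector into chunks of three, and any record with more than three pieces spills into the following rows.

```r
x <- c("Shelbourne Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "1 Hatch Street Upper, Dublin, Ireland")

parts <- strsplit(x, ", ")

# number of pieces per record: the second record has 5, not 3
n_parts <- sapply(parts, length)
n_parts

# unlist() drops the record boundaries, leaving one vector of 11 strings
flat <- unlist(parts)
length(flat)

# filling a 3-column matrix by row pushes the surplus pieces of record 2
# into the following rows; 11 is not a multiple of 3, so R also warns and
# recycles values to fill the last row (the warning is silenced here)
scrambled <- suppressWarnings(matrix(flat, ncol = 3, byrow = TRUE))
scrambled
```

Checking the piece counts first is a cheap way to spot records that will not fit a fixed three-column layout before building the matrix.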
answered 2013-07-08T11:53:01.420

Next time, please include some handy reproducible code with your question.

Here is how I would try to solve this.

x <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")

# split on ,
splitx <- strsplit(x, ",")

# for every list element (lapply climbs the list element-wise)
# subset last 3 elements
last3 <- lapply(splitx, tail, n = 3)

# merge them together by row
do.call("rbind", last3)

     [,1]                   [,2]              [,3]      
[1,] "Shelbourne Road"      " Dublin"         " Ireland"
[2,] "1 Hatch Street Upper" " Dublin"         " Ireland"
[3,] "98 Haddington Road"   " Dublin"         " Ireland"
[4,] " Dublin 6W"           " Co. Dublin"     " Ireland"
[5,] "Winterstraße 17"      " 69190 Walldorf" " Germany"
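
The output above still carries the leading space left behind by splitting on a bare comma. One possible refinement, sketched here under the assumption that the pieces are always separated by a comma plus optional whitespace, is to split on the regular expression `",\\s*"` and to label the columns as requested in the question. The object names `parts` and `out` are only illustrative.

```r
x <- c("Shelbourne Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "1 Hatch Street Upper, Dublin, Ireland")

# split on a comma followed by optional whitespace, so no padding survives
parts <- strsplit(x, ",\\s*")

# keep the last three pieces of every record and stack them row-wise
out <- do.call(rbind, lapply(parts, tail, n = 3))

# label the columns the way the question asks for
colnames(out) <- c("Street", "City", "Country")
out
```

Note that for the five-part address the "Street" column then holds "Dublin 6W", because only the last three pieces are kept.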
answered 2013-07-08T11:26:49.517