1

I want to split street addresses into street name and street number in R.

My input data has a single column, for example:

    Street.Addresses

    205 Cape Road
    32 Albany Street 
    cnr Kempston/Durban Roads

I want to separate the street number and the street name into two columns, so that the result looks like this:

    Street Number    Street Name
    205              Cape Road
    32               Albany Street
                     cnr Kempston/Durban Roads

Is there a way in R to split the numeric part of a factor/string from the non-numeric part?

Thanks


3 Answers

3

You can try:

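    x <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")
    # Split at a space that directly follows a digit; entries with no split get "" as the number
    y <- lapply(strsplit(x, "(?<=\\d)\\b ", perl = TRUE),
                function(parts) if (length(parts) < 2) c("", parts) else parts)
    y <- do.call(rbind, y)
    colnames(y) <- c("Street Number", "Street Name")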
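
If the column is a factor, or a data frame is more convenient than a character matrix, the same split can be sketched with `sub()`. This is only an illustration under stated assumptions: the helper name `split_address` is made up, and a street number is assumed to be a run of digits at the very start of the string, followed by whitespace.

```r
# Hypothetical helper (the name is an assumption, not an existing function):
# separate a leading run of digits from the rest of each address.
split_address <- function(addr) {
  addr <- trimws(as.character(addr))                 # accepts a factor column too
  has_num <- grepl("^[0-9]+[[:space:]]", addr)       # TRUE where the address starts with digits
  data.frame(
    Street.Number = ifelse(has_num, sub("^([0-9]+)[[:space:]]+.*$", "\\1", addr), ""),
    Street.Name   = ifelse(has_num, sub("^[0-9]+[[:space:]]+", "", addr), addr),
    stringsAsFactors = FALSE
  )
}

res <- split_address(factor(c("205 Cape Road", "32 Albany Street ", "cnr Kempston/Durban Roads")))
```

Here `res` has one row per address, with an empty `Street.Number` for entries that do not begin with digits.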

hth

answered 2014-04-10T12:33:50.300
3

I'm sure someone will come along with a cool regex solution using lookaheads and so on, but this might work for you:

    X <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")
    nonum <- grepl("^[^0-9]", X)                                 # entries that do not start with a digit
    X[nonum] <- paste0(" \t", X[nonum])                          # give those a blank first field
    X[!nonum] <- gsub("(^[0-9]+ )(.*)", "\\1\t\\2", X[!nonum])   # tab between number and name
    read.delim(text = X, header = FALSE)
    #    V1                        V2
    # 1 205                 Cape Road
    # 2  32             Albany Street
    # 3  NA cnr Kempston/Durban Roads

answered 2014-04-10T12:33:54.137
1

Here is another way:

    df <- data.frame(Street.Addresses = c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads"),
                     stringsAsFactors = FALSE)

    new_df <- data.frame(Street.Number = character(),
                         Street.Name = character(),
                         stringsAsFactors = FALSE)

    # Split every address on spaces once, outside the loop
    tokens <- strsplit(df[["Street.Addresses"]], " ")
    for (i in seq_len(nrow(df))) {
      new_df[i, "Street.Number"] <- tokens[[i]][1]                        # first token
      new_df[i, "Street.Name"] <- paste(tokens[[i]][-1], collapse = " ")  # remaining tokens
    }

    new_df
    #   Street.Number           Street.Name
    # 1           205             Cape Road
    # 2            32         Albany Street
    # 3           cnr Kempston/Durban Roads
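
Note that in the output above row 3 ends up with "cnr" in the number column, because the first token is taken whether or not it is numeric. A minimal sketch of one way to leave it blank instead, assuming a street number consists only of digits (the variable names `addresses`, `tokens` and `starts_with_number` are illustrative):

```r
addresses <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")

# Split each address into space-separated tokens.
tokens <- strsplit(addresses, " ", fixed = TRUE)

# Assumption: the first token counts as a street number only if it is all digits.
starts_with_number <- vapply(tokens, function(tk) grepl("^[0-9]+$", tk[1]), logical(1))

new_df <- data.frame(
  Street.Number = ifelse(starts_with_number,
                         vapply(tokens, `[`, character(1), 1),
                         ""),
  Street.Name   = ifelse(starts_with_number,
                         vapply(tokens, function(tk) paste(tk[-1], collapse = " "), character(1)),
                         addresses),
  stringsAsFactors = FALSE
)
```

Addresses whose first token is not purely numeric keep their full text in `Street.Name`.
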
answered 2014-04-10T13:26:22.537