
I am a beginner with R. I have a column in a data.frame that looks like this:

city
Kirkland,
Bethesda,
Wellington,
La Jolla,
Berkeley,
Costa, Evie KW172NJ
Miami,
Plano,
Sacramento,
Middletown,
Webster,
Houston,
Denver,
Kirkland,
Pinecrest,
Tarzana,
Boulder,
Westfield,
Fair Haven,
Royal Palm Beach, Fl
Westport,
Encino,
Oak Ridge,

I want to clean it so that only the city name, i.e. everything before the comma, is kept. How can I do this in R? Thanks!


5 Answers


You can use gsub with a regular expression:

cities <- gsub("^(.*?),.*", "\\1", df$city)

This also works:

cities <- gsub(",.*$", "", df$city)
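
As a minimal sketch of how this could be applied to values like those in the question: the data frame `df` below is a hypothetical four-row sample, and `city_clean` is an illustrative column name. It uses `sub()` rather than `gsub()` as an assumption that one replacement per string is enough, since the pattern already consumes everything from the first comma onward. `trimws()` is added only to guard against stray whitespace.

```r
# Hypothetical data frame modeled on a few rows from the question
df <- data.frame(
  city = c("Kirkland,", "La Jolla,", "Costa, Evie KW172NJ", "Royal Palm Beach, Fl"),
  stringsAsFactors = FALSE
)

# Drop the first comma and everything after it, then trim whitespace
df$city_clean <- trimws(sub(",.*$", "", df$city))

df$city_clean
# [1] "Kirkland"         "La Jolla"         "Costa"            "Royal Palm Beach"
```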
answered 2013-10-11T14:52:07.160

Just for fun, you can use strsplit:

> x <- c("London, UK", "Paris, France", "New York, USA")
> sapply(strsplit(x, ","), "[", 1)
[1] "London"   "Paris"    "New York"
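
A possible variant, sketched here under the assumption that a guaranteed character result is wanted: `vapply()` checks that each extracted piece is a single string, and `fixed = TRUE` treats the comma as a literal rather than a regular expression. The name `first_part` is illustrative.

```r
x <- c("London, UK", "Paris, France", "New York, USA")

# Split on the literal comma and keep the first piece of each element;
# vapply() raises an error if any piece is not a length-1 character value
first_part <- vapply(strsplit(x, ",", fixed = TRUE), `[`, character(1), 1)

first_part
# [1] "London"   "Paris"    "New York"
```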
answered 2013-10-11T15:01:53.117

You can use regexpr to find the position of the first comma in each element, and then use substr to cut the strings at that position:

x <- c("London, UK", "Paris, France", "New York, USA")

substr(x,1,regexpr(",",x)-1)
[1] "London"   "Paris"    "New York"
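
One caveat worth sketching: regexpr returns -1 for an element with no comma, so the substr call above would return an empty string for it. The snippet below is one possible guard, not part of the original answer. The value "Berlin" is an invented example, and `pos`, `has_comma` and `res` are illustrative names.

```r
x <- c("London, UK", "Paris, France", "Berlin")

# Position of the first literal comma; -1 where there is none.
# as.integer() drops the extra attributes that regexpr() attaches.
pos <- as.integer(regexpr(",", x, fixed = TRUE))

# Only cut the elements that actually contain a comma
has_comma <- pos > 0
res <- x
res[has_comma] <- substr(x[has_comma], 1, pos[has_comma] - 1)

res
# [1] "London" "Paris"  "Berlin"
```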
answered 2013-10-11T14:59:41.770

This also works:

x <- c("London, UK", "Paris, France", "New York, USA")

library(qdap)
beg2char(x, ",")

## > beg2char(x, ",")
## [1] "London"   "Paris"    "New York"
answered 2013-10-11T15:54:20.870

If this is a column in a data frame, we can use the tidyverse.

library(dplyr)
library(tidyr)  # separate() is provided by tidyr, not dplyr
x <- c("London, UK", "Paris, France", "New York, USA")
x <- as.data.frame(x)
x %>% separate(x, c("A","B"), sep = ',')
        A       B
1   London      UK
2    Paris  France
3 New York     USA
answered 2019-01-14T03:24:35.080