Suppose I have the following two datasets, whose columns are structured as shown below.
df1 <- data.frame(
  Date = rnorm(5),
  "United States) New York (NY" = rnorm(5),
  "United States) Chicago (Illinois" = rnorm(5),
  "United States) Denver (Colorado" = rnorm(5),
  "United States) Seattle (Washington" = rnorm(5),
  "United States) Minneapolis (Minnesota" = rnorm(5),
  check.names = FALSE  # keep the messy column names exactly as written
)
df1
df2 <- data.frame(
  Date = rnorm(5),
  "New York (New York, United States)" = rnorm(5),
  "Phoenix (Arizona, United States)" = rnorm(5),
  "Chicago (Illinois, United States)" = rnorm(5),
  "Los Angeles (California, United States)" = rnorm(5),
  check.names = FALSE  # keep the messy column names exactly as written
)
df2
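As you can see, each column represents a city, but the column names are structured in a way that is hard to work with. I was wondering whether someone could help me figure out how to extract the city name from each column name string.
I could keep a dictionary of every city and do string matching, but I haven't had any luck with that. I also assume there is a way to do this with str_split, but I haven't figured it out yet.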
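library(stringr)
sapply(str_split(names(df1), fixed(")")), `[`, 2)  # second piece of each split; still has the " (State" suffix, and NA for "Date"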
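I'm sure there is also a gsub solution, but I'm rather hopeless when it comes to regular expressions.
Ultimately, I just want the actual city names as the column names.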
New York, Chicago, Denver, Seattle, Minneapolis
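
For reference, here is a minimal base R sketch of the kind of regex approach mentioned above. It assumes the city always comes after an optional leading "Country) " prefix and before the first " (". The helper name extract_city is hypothetical (not from any package), and the name vectors are copied from the two example data frames.

```r
# Hypothetical helper: strips an optional leading "Country) " prefix and
# everything from the first " (" onward, leaving only the city name.
extract_city <- function(x) {
  x <- sub("^[^()]*\\)\\s*", "", x)  # drop a leading "United States) " style prefix
  x <- sub("\\s*\\(.*$", "", x)      # drop " (NY" or " (New York, United States)"
  trimws(x)
}

# Column names copied from df1 and df2 in the question
names_df1 <- c("Date",
               "United States) New York (NY",
               "United States) Chicago (Illinois",
               "United States) Denver (Colorado",
               "United States) Seattle (Washington",
               "United States) Minneapolis (Minnesota")
names_df2 <- c("Date",
               "New York (New York, United States)",
               "Phoenix (Arizona, United States)",
               "Chicago (Illinois, United States)",
               "Los Angeles (California, United States)")

cities_df1 <- extract_city(names_df1)
cities_df2 <- extract_city(names_df2)

print(cities_df1)
# "Date" "New York" "Chicago" "Denver" "Seattle" "Minneapolis"
print(cities_df2)
# "Date" "New York" "Phoenix" "Chicago" "Los Angeles"
```

With the real data frames, the same helper could be applied as names(df1) <- extract_city(names(df1)). Names without parentheses, such as "Date", pass through unchanged because neither pattern matches them.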