
I have File1.csv containing 3000 records, and I need to remove the characters that are not part of the address.

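Every record starts with "&" or "A/O". I need to clean up my "Address1" field, and if a field contains no address information, that record should be left blank.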

Example:

File1.csv:

Address1
&&2340 Clemb Street
&&564 7th Street
&&&10th Street
A/O11th Street
A/ONorth Street
A/O/OSouth Street
A/Ocareof
A/Otttt
A/Oyuyuyu
A/Ouiuiuiuiui
A/O/yuyyuyuyuyugggh 4510th Street
&uhhhhhello 56 11th Street

I expect the following result in File1, with no A/O, A/O/O, A/Ouiuiuiui, etc.:

File1.csv:

Address1
2340 Clemb Street
564 7th Street
10th Street
11th Street
North Street
South Street
<blank record>
<blank record>
<blank record>
<blank record>
4510th Street
56 11th Street

Thanks for your help!


1 Answer


There are almost certainly fancier matching patterns you could use, but gsub() with a simple alternation seems to get the job done for the first six records of this dataset:

x <- c('&&2340 Clemb Street',
       '&&564 7th Street',
       '&&&10th Street',
       'A/O11th Street',
       'A/ONorth Street',
       'A/O/OSouth Street')

gsub("&|A/O|/O", "", x)
#-----
# [1] "2340 Clemb Street" "564 7th Street"    "10th Street"       "11th Street"
# [5] "North Street"      "South Street"
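
The pattern above removes the markers but leaves the junk-only rows (such as A/Ocareof) and the junk words before a real address untouched. As a rough sketch of one way to cover all twelve sample rows from the question, the two-step approach below might work. The helper name clean_address is hypothetical, and the rule "a leading all-lowercase word is noise" is an assumption drawn only from the sample data; it would wrongly strip a genuine address that begins with a lowercase word.

```r
# Sample values copied from the Address1 column in the question.
address1 <- c("&&2340 Clemb Street",
              "&&564 7th Street",
              "&&&10th Street",
              "A/O11th Street",
              "A/ONorth Street",
              "A/O/OSouth Street",
              "A/Ocareof",
              "A/Otttt",
              "A/Oyuyuyu",
              "A/Ouiuiuiuiui",
              "A/O/yuyyuyuyuyugggh 4510th Street",
              "&uhhhhhello 56 11th Street")

# Hypothetical helper, not part of the original answer.
clean_address <- function(x) {
  # Step 1: strip a leading run of "&", "A/O", "/O" or "/" markers.
  x <- sub("^(&|A/O|/O?)+", "", x, perl = TRUE)
  # Step 2 (assumption): a leading all-lowercase word is noise, because
  # every real address in the sample starts with a digit or a capital.
  # If nothing follows that word, the record becomes an empty string.
  x <- sub("^[a-z]+( +|$)", "", x, perl = TRUE)
  trimws(x)
}

cleaned <- clean_address(address1)
print(cleaned)
```

To apply this to the actual file, one option would be to load it with read.csv("File1.csv", stringsAsFactors = FALSE), assign clean_address() of the Address1 column back to that column, and save the result with write.csv(..., row.names = FALSE). With 3000 records it is worth spot-checking the output, since the lowercase-word rule was only checked against the twelve rows shown.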

Intro to regex can be found here.

answered 2012-11-01T23:51:19.470