I have a vector of sentences scanned from handwritten documents. The scanning introduced some spacing problems, for example:
The d og is br own.
I'm curious whether there is a general way to take any '_x_' (space, character, space) pattern and collapse the second space, like this:
The d og is br own. --> The dog is br own.
I'm only concerned with single characters between spaces ('_x_', NOT '_xx_').
Any suggestions?
Maybe:
> x <- "The d og is br own."
> gsub(" (.) ", " \\1", x)
[1] "The dog is br own."
or:
gsub(" ([[:alnum:]]) "," \\1",x)
(.) matches any single character, whereas ([[:alnum:]]) matches only an alphanumeric character (a letter or digit).
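Since `gsub` is vectorized, either pattern can be applied to the whole vector of scanned sentences in one call. The sketch below assumes a small made-up vector; the sample sentences and the names `sentences`, `fixed_any` and `fixed_alnum` are illustrative only, not from the question. The third sentence shows where the two patterns differ: `(.)` also treats a lone punctuation mark as a "single character" and drops the space after it, while `([[:alnum:]])` leaves it alone.

```r
# Hypothetical sample of scanned sentences (illustrative, not from the question)
sentences <- c("The d og is br own.", "The c at sat.", "Wait . what")

# " (.) " : any single character between two spaces, punctuation included;
# the replacement keeps the leading space and the captured character,
# dropping the trailing space
fixed_any <- gsub(" (.) ", " \\1", sentences)
# "The dog is br own." "The cat sat." "Wait .what"

# " ([[:alnum:]]) " : only a single letter or digit between two spaces
fixed_alnum <- gsub(" ([[:alnum:]]) ", " \\1", sentences)
# "The dog is br own." "The cat sat." "Wait . what"

print(fixed_any)
print(fixed_alnum)
```

If stray punctuation surrounded by spaces is common in the scans, the `[[:alnum:]]` version is likely the safer choice.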