Suppose I have a bunch of similar strings that contain noise, mainly words that have been incorrectly joined or split. For example:
"Once more unto the breach, dear friends. Once more!"
"Once more unto the breach , dearfriends. Once more!"
"Once more unto the breach, de ar friends. Once more!"
"Once more unto the breach, dear friends. Once more!"
How can I normalize each of them to the same sequence of words? That is:
["once" "more" "unto" "the" "breach" "dear" "friends" "once" "more"]
Thanks!
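
To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes a vocabulary of valid words is available: drop everything except letters, lowercase the result, and re-segment the character stream against that vocabulary with dynamic programming. The names `VOCAB` and `normalize` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical vocabulary (an assumption); in practice it would come from
# a dictionary or from the cleaner strings in the data set.
VOCAB = {"once", "more", "unto", "the", "breach", "dear", "friends"}


def normalize(text, vocab=VOCAB):
    """Return a list of vocabulary words spelling out `text`,
    or None if the letters cannot be segmented with `vocab`."""
    # Lowercase, then drop spaces, punctuation and any other non-letters,
    # so joined and split words collapse to the same character stream.
    chars = re.sub(r"[^a-z]", "", text.lower())
    max_len = max(map(len, vocab))

    # best[i] is one segmentation of chars[:i], or None if none was found.
    best = [None] * (len(chars) + 1)
    best[0] = []
    for i in range(1, len(chars) + 1):
        # Only look back as far as the longest vocabulary word.
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and chars[j:i] in vocab:
                best[i] = best[j] + [chars[j:i]]
                break
    return best[-1]


print(normalize("Once more unto the breach , dearfriends. Once more!"))
print(normalize("Once more unto the breach, de ar friends. Once more!"))
```

Both calls should print the same word list as the target above. The sketch keeps only the first segmentation it finds, so with a large vocabulary an ambiguous run of letters may need word frequencies or some other scoring to pick the intended split.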