1

I need to remove everything from a file that is not a letter (lowercase or uppercase) and replace it with a space. For example:

The bear ate 3 snakes, then ate 50% of the fish from the river.

This becomes:

The bear ate   snakes  then ate     of the fish from the river 
  • The file sometimes contains unusual characters. It is saved as UTF-8.

How can I replace every non-letter with a space?

4 Answers

3

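If you want to support Unicode letters (as mentioned in your question), this perl command replaces every run of characters that are neither Unicode letters nor whitespace with a single space. The -CSD switch makes perl decode its input and encode its output as UTF-8; without it, \p{L} is matched against individual bytes rather than characters:

echo "$line" | perl -CSD -pe 's/[^\p{L}\s]+/ /g;'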

Reference: http://www.regular-expressions.info/unicode.html

answered 2012-05-11T23:33:56.930
2
$ echo "The bear ate 3 snakes, then ate 50% of the fish from the river." | sed "s/[^a-zA-Z]/ /g"
The bear ate   snakes  then ate     of the fish from the river 
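Note that `a-zA-Z` covers only the 52 unaccented ASCII letters, so accented or non-Latin letters in a UTF-8 file are treated as non-letters. The sketch below pins the locale to C so that sed works byte by byte and the result does not depend on the caller's environment. The function name `ascii_letters_only` is hypothetical.

```shell
# Hypothetical helper: ASCII-only variant of the sed command above.
# LC_ALL=C makes sed process the input one byte at a time.
ascii_letters_only() {
  LC_ALL=C sed 's/[^a-zA-Z]/ /g' "$@"
}

# In UTF-8 the letter "é" is two bytes, so it becomes two spaces here.
printf '%s\n' 'café 24/7' | ascii_letters_only
```

If the file is known to contain only English text this is enough; for anything else a Unicode-aware tool is the safer choice.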
answered 2012-05-11T23:10:43.103
2

This might work for you:

echo 'The bear ate 3 snakes, then ate 50% of the fish from the river.' |
tr -c '[:alpha:]\n' ' '
The bear ate   snakes  then ate     of the fish from the river

Or:

echo 'The bear ate 3 snakes, then ate 50% of the fish from the river.' |
sed 's/[^[:alpha:]]/ /g'
The bear ate   snakes  then ate     of the fish from the river
answered 2012-05-12T00:21:43.867
1

Try:

sed 's/[^A-Za-z]/ /g;' myfile.txt
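As written, this prints the result to standard output and leaves myfile.txt unchanged. One way to update the file itself is to go through a temporary file, sketched below. The function name `strip_non_letters_in_place` is hypothetical, and the sketch assumes the `mktemp` utility is available.

```shell
# Hypothetical helper: rewrite a file so that every byte that is not an
# ASCII letter or a newline becomes a space.
strip_non_letters_in_place() {
  file=$1
  tmp=$(mktemp) || return 1
  # Write the cleaned text to the temporary file first, so the original
  # is only overwritten if sed succeeded.
  if LC_ALL=C sed 's/[^A-Za-z]/ /g' "$file" > "$tmp"; then
    # Copy the contents back instead of moving the temporary file, which
    # keeps the original file's permissions.
    cat "$tmp" > "$file"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Usage would be `strip_non_letters_in_place myfile.txt`. GNU sed also offers `-i` for in-place editing, but its syntax differs between sed implementations, which is why the sketch avoids it.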
answered 2012-05-11T23:09:46.863