bash - How do I remove all CJK text that immediately follows a specific symbol?

Question

I have some text like this:

This is some text Z书. This is Zsome more text Z计算机.
This is yet some more Z电脑 text Z.

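I need to remove every occurrence of the pattern Z+(CJK), that is, a Z immediately followed by (CJK), where (CJK) is a run of one or more consecutive CJK characters. The text above should become: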

This is some text . This is Zsome more text .
This is yet some more  text Z.

How can I remove all CJK text that matches this pattern?

score 2 · Accepted Answer


How about a Perl one-liner?

perl -CSD -pe 's/Z\p{InCJK_Unified_Ideographs}+//g;' inputfile

answered 2012-09-16T08:23:09.183

score 2 · Accepted Answer

You can use GNU sed's l command to show the byte values of the non-ASCII characters:

sed -n l0 file.txt

Output:

This is some text Z\344\271\246. This is Zsome more text Z\350\256\241\347\256\227\346\234\272.$
This is yet some more Z\347\224\265\350\204\221 text Z.$
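The octal escapes above are the UTF-8 bytes of each character: every ideograph in the sample takes three bytes, and every byte of a multibyte UTF-8 sequence lies between \200 and \377 (0x80 to 0xFF). That is why a byte-range bracket expression can stand in for "CJK character" in this input. A small check, assuming the standard od utility is available, prints the same bytes for 书:

```shell
#!/usr/bin/env bash
# Print the UTF-8 bytes of one character in octal, matching sed's `l` output.
set -euo pipefail

# -An drops the offset column; -to1 prints each byte as a 3-digit octal number.
# xargs (running echo by default) trims od's column padding.
bytes=$(printf '书' | od -An -to1 | xargs)
echo "$bytes"
```

The result corresponds to the \344\271\246 escape that sed printed for 书 on the first line.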

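Then you can use GNU sed to do the substitution you want. In my testing I had to set the locale to POSIX, because in a UTF-8 locale sed works on multibyte characters rather than single bytes, and the byte range does not behave as intended: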

LC_ALL="POSIX" sed -r 's/Z[\o200-\o377]+//g' file.txt

Output:

This is some text . This is Zsome more text .
This is yet some more  text Z.
