
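I need to delete all lines from a file that contain a match for read (symbol), where (symbol) is any CJK character. However, if read (symbol) is immediately preceded by A-Z or a-z, the line should not be deleted. Here are some sample lines and the expected results: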

Do you like to read books? (not deleted)
Can you read 书? (deleted)
.read 书. (deleted)
This is some thread 线. (not deleted)

How can I delete only those lines that match (not A-Z or a-z)read (CJK symbol)?


2 Answers

1
awk '$0~/ read [a-zA-Z]+/' your_file
answered 2012-09-13T13:15:57.593
1

I'm not entirely sure how to match CJK characters, but matching any non-ASCII character may get you the result you're looking for:

grep -vP "[^A-Za-z]read [\x80-\xFF]" file.txt

In theory, you should be able to use:

grep -vP "[^A-Za-z]read [\x{2E80}-\x{9FBB}]+" file.txt

But in my testing, I got this error:

grep: character value in \x{...} sequence is too large

http://en.wikipedia.org/wiki/List_of_Unicode_characters#CJK_unified_ideographs

Edit:

LC_ALL="POSIX" sed -r '/[^A-Za-z]read [\o200-\o377]+/d' file.txt

Result:

Do you like to read books? (not deleted)
This is some thread 线. (not deleted)

See also:

How to delete all CJK text immediately following a specific symbol?

answered 2012-09-13T13:23:20.957