bash - How do I delete all lines containing a specific string, but only when the following character is a CJK character?

Question

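I need to delete every line in a file that contains a match for read (symbol), where (symbol) is any CJK character. If read (symbol) is immediately preceded by A-Z or a-z, however, the line should not be deleted. For example, here are some sample lines and the expected results: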

Do you like to read books? (not deleted)
Can you read 书? (deleted)
.read 书. (deleted)
This is some thread 线. (not deleted)

How can I delete only those lines that match (not A-Z or a-z)read (CJK symbol)?

score 1 · Accepted Answer


awk '$0~/ read [a-zA-Z]+/' your_file

Answered 2012-09-13T13:15:57.593

score 1 · Accepted Answer

I'm not entirely sure how to match CJK characters specifically, but matching non-ASCII characters may give you the result you're looking for:

grep -vP "[^A-Za-z]read [\x80-\xFF]" file.txt

In theory, you should be able to use a Unicode range:

grep -vP "[^A-Za-z]read [\x{2E80}-\x{9FBB}]+" file.txt

But in my testing, I got this error:

grep: character value in \x{...} sequence is too large

Edit:

LC_ALL="POSIX" sed -r '/[^A-Za-z]read [\o200-\o377]+/d' file.txt

Result:

Do you like to read books? (not deleted)
This is some thread 线. (not deleted)
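The byte-range approach works because, in the POSIX locale, the pattern is matched byte by byte, and every byte of a multibyte UTF-8 character is 0x80 or higher. The sketch below assumes the file is UTF-8 encoded; the hex_bytes helper is hypothetical and only there to inspect the encoding. It also shows the trade-off: any non-ASCII character, not just a CJK one, falls in the same byte range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bytes of a string as hex digits.
hex_bytes() {
    # od dumps one hex value per byte; tr strips od's spacing
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

hex_bytes '书'   # e4b9a6: three bytes, all in the 0x80-0xFF range
hex_bytes 'a'    # 61: a single ASCII byte, below 0x80
hex_bytes 'é'    # not CJK, but also encoded with non-ASCII bytes
```

If lines such as "read école" must be kept, a byte range is too broad and a script-aware match is needed instead.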
