csv - How can I restrict sed to replace only data that appears after the first right square bracket?

Question

I have a CSV file that uses a highly customized format. Here, each number stands for the data in one of the 4 columns:

1 2 [3] 4

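I need to restrict sed so that it searches and modifies only the data in the fourth column. Essentially, it must ignore everything on a line that appears before the first occurrence of a right square bracket followed by a space, '] ', and modify only the data that appears after it. For example, file1.txt might contain the following: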

penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.

The substitution might be sed 's/penguin/animal/g' file1.txt. After running the script, the output should look like this:

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.

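In this case, every occurrence of penguin before the first ] is ignored, and changes are made only to the part of the line that comes after it.

Additional right brackets may appear later in the line, but only the first one should be treated as the divider.

How can I make sed ignore the first three columns of this custom CSV format when finding and replacing text?

I have GNU sed version 4.2.1.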

score 3 · Accepted Answer

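You tell sed to search for the '] ' combination followed by .* (anything), and then, as part of the replacement, you put the ] character back.

The only problem is that sed normally "thinks" the ] char is part of a character class definition, so you have to escape it. Try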

echo "a b [c] d" | sed 's/\] .*$/\] XYZ/'
a b [c] XYZ

Note that because there is no opening [ char to start a character class definition, you can get away with

echo "a b [c] d" | sed 's/] .*$/] XYZ/'
a b [c] XYZ

Edit

To fix only the 4th word,

echo "a b [c] d e" | sed 's/\] [^ ][^ ]*/\] XYZ/'
a b [c] XYZ e

The addition above, [^ ][^ ]*, means "any-char-that-is-not-a-space" followed by any number of "any-char-that-is-not-a-space", so the match stops when the matcher reaches the next space.
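To make the stopping point visible, here is a small sketch that wraps the command in a function and runs it on two made-up inputs. The function name replace_fourth_word and the variables one and two are only illustrative.

```shell
# Replace only the first word after the first "] ".
# The function name and sample strings are illustrative, not from the answer.
replace_fourth_word() {
    sed 's/\] [^ ][^ ]*/] XYZ/'
}

one=$(echo "a b [c] d e" | replace_fourth_word)
two=$(echo "a b [c] d" | replace_fourth_word)

echo "$one"   # a b [c] XYZ e
echo "$two"   # a b [c] XYZ
```

In the first case the trailing " e" survives because the match ends at the space after "d".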

Final edit

echo "penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
| sed 's/\] The penguin \(.*$\)/] The animal \1/'

And since you are using GNU sed, you can pass -r (extended regular expressions) so that you don't need to escape the (...) capture parentheses.

echo "penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
| sed -r 's/\] The penguin (.*$)/] The animal \1/'

Output

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.

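This depends on which version of sed you are using. There are big differences between the seds found on Unix/Linux systems: AIX vs. Solaris vs. GNU sed.

If you have other questions about using sed, it usually helps to include the output of sed --version or sed -V. If those commands don't respond, try what sed. Otherwise, include the output of uname.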
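A minimal sketch of gathering that information in one go, assuming a POSIX shell. The variable name sed_info is made up for this example, and only the --version and uname checks are shown.

```shell
# Collect the details worth pasting into a sed question.
if sed --version >/dev/null 2>&1; then
    # GNU sed understands --version; the first line names the version
    sed_info=$(sed --version | head -n 1)
else
    # Other seds may reject --version, so fall back to the system name
    sed_info="sed --version not supported; system: $(uname -s)"
fi
echo "$sed_info"
```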

IHTH

score 2 · Accepted Answer

Assuming you have only one occurrence of the right bracket, this is how I would do it with awk:

awk 'BEGIN {FS=OFS="]"} { gsub(/penguin/, "animal", $2) }1' file.txt

Result:

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.
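If more right brackets can appear later in the line, as the question allows, one possible variation is to split at the first "] " with index() and substr() instead of using ] as the field separator. This is only a sketch: the function name replace_after_first_bracket and the first sample line (with its extra bracket pair) are made up for illustration.

```shell
# Sketch: split each line at the first "] " so that later "]" characters
# stay part of the fourth column. Names and sample data are illustrative.
replace_after_first_bracket() {
    awk '{
        pos = index($0, "] ")                # position of the first "] ", 0 if absent
        if (pos > 0) {
            head = substr($0, 1, pos + 1)    # everything through "] "
            tail = substr($0, pos + 2)       # the fourth column
            gsub(/penguin/, "animal", tail)  # edit only the fourth column
            $0 = head tail
        }
        print
    }'
}

result=$(printf '%s\n' \
    'penguin bird [lives in Antarctica] The penguin [a bird] eats fish.' \
    'wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.' \
    | replace_after_first_bracket)
echo "$result"
# penguin bird [lives in Antarctica] The animal [a bird] eats fish.
# wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.
```

Lines that contain no "] " at all are printed unchanged.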

score 2 · Accepted Answer

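Normally I'd do it the way shellter describes (if I were just typing a quick sed command line), but it has the drawback that once you start matching part of the input in order to preserve it (with \1 and so on), you have to match and replace everything, and you can no longer use simple substitutions like s/penguin/animal/. If you're willing to add some boilerplate around the substitution, you can stash the beginning of the line in the hold buffer and then fetch it back: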

sed -e 'h' \
    -e 's/^[^]]*\] //' \
    -e 's/penguin/animal/' \
    -e 'x' \
    -e 's/\] .*/] /' \
    -e 'G' \
    -e 's/\n//'

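h saves the original line in the hold space. Then we strip the prefix and perform any substitution (here, the one from your example) or series of substitutions on the end of the line. Then x swaps the end with the saved copy. We strip the original end from the saved copy, and G glues the two back together. G adds a newline we don't want, so we remove it.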
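As a rough sketch, the same hold-space steps can be written as a commented sed script and run on the sample lines from the question. Two things here are assumptions rather than part of the answer above: the prefix pattern ^[^]]*\] , which stops at the first right bracket even when more brackets follow, and the /\] /!b guard, which passes lines without a "] " through untouched.

```shell
result=$(printf '%s\n' \
    'penguin bird [lives in Antarctica] The penguin lives in cold places.' \
    'wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.' \
    | sed '
        # assumption: skip lines that have no "] " at all
        /\] /!b
        # save a copy of the whole line in the hold space
        h
        # drop everything through the first "] "
        s/^[^]]*\] //
        # any ordinary substitutions go here
        s/penguin/animal/g
        # swap: the pattern space now holds the original line again
        x
        # keep only the part through the first "] "
        s/\] .*/] /
        # append the edited tail, then remove the joining newline
        G
        s/\n//
    ')
echo "$result"
# penguin bird [lives in Antarctica] The animal lives in cold places.
# wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.
```

The substitution in the middle can be replaced by any number of plain s commands, which is the point of the approach.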

score 1 · Accepted Answer

This might work for you (GNU sed):

sed  -i 's/\]/&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*.\n//' file

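Explanation:

s/\]/&\n/ marks the split point by inserting \n after the first ]
h copies the line to the hold space
s/.*\n// deletes the part of the line you don't want to change
s/penguin/animal/g changes the part you do want to change
H;g appends the result to the saved line and brings everything back
s/\n.*.\n// deletes the unedited copy of the part that was changed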
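The steps listed above can be watched in action with a small sketch that stops just before the last s command and makes the embedded newlines visible. The sample line and the variable names line, before and after are made up for this illustration. It needs GNU sed, because \n is used in the replacement.

```shell
line='wolf dog [lives with penguins] The wolf likes penguins.'

# Pattern space just before the final s command; tr shows each newline as "|".
before=$(echo "$line" \
    | sed 's/\]/&\n/;h;s/.*\n//;s/penguin/animal/g;H;g' \
    | tr '\n' '|')
echo "$before"
# wolf dog [lives with penguins]| The wolf likes penguins.| The wolf likes animals.|

# The full script removes the middle segment (the unedited tail).
after=$(echo "$line" \
    | sed 's/\]/&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*.\n//')
echo "$after"
# wolf dog [lives with penguins] The wolf likes animals.
```

The three segments are the untouched prefix, the original tail, and the edited tail. The final substitution deletes the middle one together with both newlines.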

This is applied to every line; if the change should be conditional (only on lines that contain a ]), use:

sed  -i '/\]/!b;s//&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*.\n//' file

Another alternative (perhaps a simpler approach):

sed ':a;s/\(\].*\)penguin/\1animal/;ta' file
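A sketch of how this loop behaves, written as a commented script and run on a made-up line that contains penguin more than once after the bracket. The sample line is an assumption for illustration.

```shell
result=$(printf '%s\n' \
    'penguin bird [lives in Antarctica] The penguin feeds penguin chicks.' \
    | sed '
        # label a marks the start of the loop
        :a
        # replace the last "penguin" found after the first "]"
        s/\(\].*\)penguin/\1animal/
        # if a substitution happened, jump back to a and try again
        ta
    ')
echo "$result"
# penguin bird [lives in Antarctica] The animal feeds animal chicks.
```

Each pass replaces one occurrence, working from the end of the line towards the bracket, and the loop ends when nothing after the first ] matches. Note that the loop would never terminate if the replacement text itself contained the search text (for example, replacing penguin with penguins).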
