string - sed: search for multiple strings and output each string and the string that follows it on a separate line

Question

For example, I have a long file containing:

Somestring anotherstring -xone xcont othertring -yone ycont againother \
-detail "detail Contents within quote" stuff morestuff .. 

Somestring anotherstring -xone xcont othertring -yone ycont againother \
morestrings -detail detailCont morestrings etc.. ..

Desired output:

-xone xcont
-yone ycont
-detail "detail Contents within quote"

Ideally as a CSV file:

xone yone detail
xcont ycont "detail Contents within quote"

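What is the best way to get the desired output? I have been trying sed commands, with very limited success. I am new to Perl, so I have not gotten far with that either. Please explain the suggested solution. Thanks in advance!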

score 1 · Accepted Answer

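This problem has two parts:

How to match the tags
How to output them in an orderly way.

The matching part is fairly easy with regular expressions. Each tag is a hyphen-minus followed by some word characters. As a regex pattern: -\w+

The value appears to be either a single word (which we can match with \w+) or a quoted string. Assuming the quoted string cannot contain its own delimiter, we can use "[^"]+", where [^"] is a negated character class that matches any character except a double quote.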
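As a quick check of these building blocks, here is a minimal sketch that applies the tag pattern and the quoted-string pattern separately. The sample line is a hypothetical one modelled on the question's input, not taken from a real file.

```perl
use strict; use warnings;

# Hypothetical sample line, modelled on the question's input.
my $line = 'foo -xone xcont bar -detail "detail Contents within quote" baz';

# Every tag: a hyphen-minus followed by word characters.
my @tags = $line =~ /-\w+/g;

# A quoted value: the negated class [^"] cannot run past the closing quote.
my ($quoted) = $line =~ /"([^"]+)"/;

print "tags: @tags\n";
print "quoted: $quoted\n";
```

In list context, a /g match without capture groups returns every whole match, which is why @tags ends up holding the tags themselves.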

How do we combine these? With alternation and named captures:

# I'll answer with Perl
my $regex = qr/-(?<key>\w+) \s+ (?: (?<val>\w+) | "(?<val>[^"]+)" )/x;

After a successful match, $+{key} contains the tag's key and $+{val} its value. We can now extract all tags in a line. Given the input

Somestring anotherstring -xone xcont othertring -yone ycont againother \-detail "detail Contents within quote" stuff morestuff .. 
Somestring anotherstring -xone xcont othertring -yone ycont againother \morestrings -detail detailCont morestrings etc.. ..

and the code

use strict; use warnings; use feature 'say';
my $regex = qr/-(?<key>\w+) \s+ (?: (?<val>\w+) | "(?<val>[^"]+)" )/x;  # same pattern as above
while (<>) {
  while (/$regex/g) {
    say qq($+{key}: "$+{val}");
  }
}

we get the output

xone: "xcont"
yone: "ycont"
detail: "detail Contents within quote"
xone: "xcont"
yone: "ycont"
detail: "detailCont"
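The question's first desired format (one tag and its value per line, with the hyphen and any quotes kept) can be produced with a small variation of the same pattern. The following is only a sketch: the helper name tag_lines and the sample string are assumptions made for illustration, and it uses numbered captures instead of the named ones above.

```perl
use strict; use warnings;

# Capture the tag with its hyphen and the value with its quotes, if any.
my $pair = qr/(-\w+) \s+ ("[^"]+" | \w+)/x;

# Hypothetical helper: returns one "tag value" string per match in $text.
sub tag_lines {
    my ($text) = @_;
    my @out;
    while ($text =~ /$pair/g) {
        push @out, "$1 $2";
    }
    return @out;
}

# Hypothetical single-line record modelled on the question's input.
my $sample = 'Somestring anotherstring -xone xcont othertring -yone ycont '
           . 'againother -detail "detail Contents within quote" stuff';
print "$_\n" for tag_lines($sample);
```

Putting the quotes inside the capture group is what preserves them in the output; the named-capture version deliberately strips them.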

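To print this in tabular form, we have to collect the data in some structure. I will assume that each tag occurs at most once per line. We can then use a hash to map each tag to its value. We collect these hashes in an array, one per line. We also have to collect the names of all headers, in case a line does not contain all of them. Our code now changes to: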

use strict; use warnings; use feature 'say';
my $regex = qr/-(?<key>\w+) \s+ (?: (?<val>\w+) | "(?<val>[^"]+)" )/x;  # same pattern as above
my %headers;
my @rows;
while (<>) {
  my %tags;
  while (/$regex/g) {
    $tags{$+{key}} = $+{val};
  }
  push @rows, \%tags;
  @headers{keys %tags} = ();  # define the headers
}
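The loop above builds one row per physical line. If the trailing backslashes in the question's file are line continuations (an assumption; the question does not say so), the lines of a record would need to be joined first, and blank separator lines skipped. A minimal sketch, with join_continuations as a hypothetical helper name:

```perl
use strict; use warnings;

# Hypothetical helper: merge each physical line that ends in a backslash
# with the line that follows it, yielding one string per logical record.
# Records that contain only whitespace are dropped.
sub join_continuations {
    my @physical = @_;
    my (@records, $buffer);
    for my $line (@physical) {
        chomp(my $copy = $line);
        $buffer = defined $buffer ? "$buffer $copy" : $copy;
        # A trailing backslash means the record continues on the next line.
        next if $buffer =~ s/\s*\\\s*$//;
        push @records, $buffer if $buffer =~ /\S/;
        undef $buffer;
    }
    # Keep a final record even if its last line ended in a backslash.
    push @records, $buffer if defined $buffer && $buffer =~ /\S/;
    return @records;
}

my @records = join_continuations(
    "Somestring -xone xcont againother \\\n",
    "-detail \"detail Contents within quote\" stuff\n",
    "\n",
    "Somestring -yone ycont etc\n",
);
print "$_\n" for @records;
```

The tag-matching loop could then iterate over the returned records instead of reading lines directly, for example with for (join_continuations(<>)) { ... }.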

Now how do we print the data? We can dump it as tab-separated values:

my @headers = keys %headers;
say join "\t", map qq("$_"), @headers;
say join "\t", map qq("$_"), @$_{@headers} for @rows;

Output:

"yone"  "detail"        "xone"
"ycont" "detail Contents within quote"  "xcont"
"ycont" "detailCont"    "xcont"

Oh, the column order is random. We can do better if we use the Text::CSV module:

use Text::CSV;

my @headers = keys %headers;
my $csv = Text::CSV->new({ eol => "\n" });
$csv->print(\*STDOUT, \@headers);
$csv->print(\*STDOUT, [@$_{@headers}]) for @rows;

We get the output:

yone,xone,detail
ycont,xcont,"detail Contents within quote"
ycont,xcont,detailCont

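The column order is still random, because Perl does not guarantee any particular order for hash keys, but this can be fixed by sorting the header list.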
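A minimal sketch of the sorting fix follows. The rows are hypothetical data shaped like the @rows array built earlier, and the output is plain tab-separated text so that the sketch needs only core Perl; it does no CSV quoting.

```perl
use strict; use warnings;

# Hypothetical rows, shaped like the @rows collected by the parsing loop.
my @rows = (
    { xone => 'xcont', yone => 'ycont', detail => 'detail Contents within quote' },
    { xone => 'xcont', yone => 'ycont', detail => 'detailCont' },
);

# Collect every header seen in any row, then sort for a stable column order.
my %headers;
@headers{ keys %$_ } = () for @rows;
my @headers = sort keys %headers;

# Header line first, then one line per row; a missing tag becomes an empty cell.
my @lines = ( join("\t", @headers) );
for my $row (@rows) {
    push @lines, join("\t", map { $_ // '' } @$row{@headers});
}
print "$_\n" for @lines;
```

To get the exact column order from the question instead of alphabetical order, the header list could be written out by hand, for example my @headers = qw(xone yone detail);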

You can read through the Text::CSV documentation to discover the many options for adjusting the output.

score 0 · Accepted Answer

This might work for you (GNU sed):

sed -r '/-(xone|yone|detail)/!d;s//\n\1/;s/[^\n]*\n//;s/\S+\s+("[^"]*"|\S+)/&\n/;P;D' file

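This looks for lines containing -xone, -yone or -detail and prints only those tags, each followed by its value (either a string enclosed in double quotes or a single word), one pair per line. Step by step: /-(xone|yone|detail)/!d discards any text that contains no tag; s//\n\1/ reuses that regex and replaces the first tag with a newline plus the tag name (\1 does not include the hyphen, so the printed tag has none); s/[^\n]*\n// removes everything before the tag; the last substitution adds a newline after the tag and its value; P prints up to that newline, and D deletes what was printed and restarts the cycle on the rest of the line.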
