I want to convert a list of items (key/value pairs) into a table format. The solution could be a bash script, awk, sed, or some other method.

Suppose I have a long list, such as:

date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value
key2: value

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value

...

I would like to convert it into a table format using tabs or some other separator, like this:

date and time   file size   key1    key2
2013-02-21 18:18 PM 1283483 bytes   value   value
2013-02-21 18:19 PM 1283493 bytes       value
...

Or like this:

date and time|file size|key1|key2
2013-02-21 18:18 PM|1283483 bytes|value|value
2013-02-21 18:19 PM|1283493 bytes||value
...

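I have looked at solutions such as "An efficient way to transpose a file in Bash", but my case seems to be different. The awk solution below partially works for me: it keeps outputting all the rows into one long list of columns, but I need the columns to be constrained to a unique list.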

awk -F': ' '
{ 
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {    
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' filename

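Update

Thanks to everyone who offered solutions. Some of them look very promising, but I think my versions of the tools may be outdated, and I am getting some syntax errors. What I see now is that I did not start with very clear requirements. Kudos to sputnick for being the first to offer a solution before I spelled out the full requirements. I had a long day when I wrote the question, so it was not very clear.

My goal is to come up with a very generic solution for parsing multiple lists of items into column format. I think the solution does not need to support more than 255 columns. Column names are not known ahead of time, so the solution will work for anyone, not just me. The two known things are the separator between key/value pairs (":") and the separator between lists (an empty line). It would be nice to have a variable for each of those, so that others can configure them for reuse.

From looking at the suggested solutions, I realized that a good approach would be to make two passes over the input file. The first pass collects all the column names, optionally sorts them, and then prints the header. The second pass grabs the values of the columns and prints them.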
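
The two-pass idea can be sketched as a small shell function wrapping POSIX awk. This is only a sketch under stated assumptions: the name `kv2table` and its optional arguments are hypothetical, the defaults (": " between key and value, "|" between output columns) are taken from the sample data, columns are kept in first-seen order rather than sorted, and the input has to be a regular file because it is read twice.

```shell
# kv2table FILE [KV_SEP] [OUT_SEP]
# Hypothetical helper: the name and arguments are illustrative assumptions.
# KV_SEP defaults to ": " and OUT_SEP to "|". Records end at blank lines.
kv2table() {
    awk -v kv="${2:-: }" -v out="${3:-|}" '
    # Split a line at the first occurrence of kv; sets globals key and val.
    function parse(line,    p) {
        p = index(line, kv)
        if (p == 0) return 0
        key = substr(line, 1, p - 1)
        val = substr(line, p + length(kv))
        return 1
    }
    # Print the current row (if it has any values) and reset it.
    function emit_row(    i, line) {
        if (!have) return
        line = row[cols[1]]
        for (i = 2; i <= n; i++) line = line out row[cols[i]]
        print line
        split("", row)      # portable way to empty an array
        have = 0
    }
    # Pass 1: collect column names in first-seen order.
    NR == FNR {
        if (parse($0) && !(key in seen)) { seen[key] = 1; cols[++n] = key }
        next
    }
    # Pass 2, first line: print the header once.
    FNR == 1 {
        hdr = cols[1]
        for (i = 2; i <= n; i++) hdr = hdr out cols[i]
        print hdr
    }
    # Pass 2: a blank line closes the record; other lines add a value.
    /^[ \t]*$/ { emit_row(); next }
    parse($0)  { row[key] = val; have = 1 }
    END        { emit_row() }
    ' "$1" "$1"
}
```

With a hypothetical input file `records.txt`, `kv2table records.txt` would print pipe-delimited rows, and `kv2table records.txt ': ' "$(printf '\t')"` would switch the output to tabs. Passing the file name twice is what gives awk its two passes.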

5 Answers

2

Here's one way using GNU awk (4.0 or later, since the script uses arrays of arrays). Run like:

awk -f script.awk file

Contents of script.awk:

BEGIN {
    # change this to OFS="\t" for tab delimited output
    OFS="|"

    # treat each record as a set of lines
    RS=""
    FS="\n"
}

{
    # keep a count of the records
    ++i

    # loop through each line in the record
    for (j=1;j<=NF;j++) {

        # split each line in two
        split($j,a,": ")

        # just holders for the first two lines in the record
        if (j==1) { date = a[1] }
        if (j==2) { size = a[1] }

        # keep a tally of the unique key names
        if (j>=3) { !x[a[1]] }

        # the data in a multidimensional array:
        # record number . key = value
        b[i][a[1]]=a[2]
    }
}

END {

    # sort the unique keys
    m = asorti(x,y)

    # add the two strings to a numerically indexed array
    c[1] = date
    c[2] = size

    # set a variable to continue from
    f=2

    # loop through the sorted array of unique keys
    for (j=1;j<=m;j++) {

        # build the header line from the file by adding the sorted keys
        r = (r ? r : date OFS size) OFS y[j]

        # continue to add the sorted keys to the numerically indexed array
        c[++f] = y[j]
    }

    # print the header and empty
    print r
    r = ""

    # loop through the records ('i' is the number of records)
    for (j=1;j<=i;j++) {

        # loop through the subrecords ('f' is the number of unique keys)
        for (k=1;k<=f;k++) {

            # build the output line
            r = (r ? r OFS : "") b[j][c[k]]
        }

        # and print and empty it ready for the next record
        print r
        r = ""
    }
}

Here are the contents of a test file named file:

date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value1
key2: value2

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value2
key1: value1
key3: value3

date and time: 2013-02-21 18:20 PM
file size: 1283494 bytes
key3: value3
key4: value4

date and time: 2013-02-21 18:21 PM
file size: 1283495 bytes
key5: value5
key6: value6

Results:

date and time|file size|key1|key2|key3|key4|key5|key6
2013-02-21 18:18 PM|1283483 bytes|value1|value2||||
2013-02-21 18:19 PM|1283493 bytes|value1|value2|value3|||
2013-02-21 18:20 PM|1283494 bytes|||value3|value4||
2013-02-21 18:21 PM|1283495 bytes|||||value5|value6
answered 2013-02-22T04:44:31.990
1

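This makes no assumptions about the column structure, so it does not try to sort the columns; however, all fields are printed in the same order for every record: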

use strict;
use warnings;

my (@db, %f, %fields);
my $counter = 1;
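while (<>) {
  # non-greedy value so that trailing whitespace is stripped
  my ($field, $value) = (/([^:]*):\s*(.*?)\s*$/);
  if (not defined $field) {
    # a blank line ends a record; skip empty records from repeated blank lines
    push @db, { %f } if %f;
    %f = ();
  } else {
    $f{$field} = $value;
    $fields{$field} = $counter++ if not defined $fields{$field};
  }
}
# store the last record unless the input already ended with a blank line
push @db, { %f } if %f;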

#my @fields = sort keys %fields; # alphabetical order
my @fields = sort { $fields{$a} <=> $fields{$b} } keys %fields; # first-seen order (numeric compare)

# print header
print join("|", @fields), "\n";

# print rows
for my $row (@db) {
  print join("|", map { defined $row->{$_} ? $row->{$_} : "" } @fields), "\n"; # keeps values such as "0"
}
answered 2013-02-22T04:53:05.867
1

Here's a pure awk solution:

# split lines on ": " and use "|" for output field separator
BEGIN { FS = ": "; i = 0; h = 0; ofs = "|" }

# empty line - increment item count and skip it
# (a bracket expression is used because \s is a GNU awk extension)
/^[ \t]*$/ { i++ ; next }

# normal line - add the item to the object and the header to the header list
# and keep track of first seen order of headers
{
   current[i, $1] = $2
   if (!($1 in headers)) {headers_ordered[h++] = $1}
   headers[$1]
}

END {
   h--

   # print headers
   for (k = 0; k <= h; k++)
   {
      printf "%s", headers_ordered[k]
      if (k != h) {printf "%s", ofs}
   } 
   print "" 

   # print the items for each object
   for (j = 0; j <= i; j++)
   {
      for (k = 0; k <= h; k++)
      {
         printf "%s", current[j, headers_ordered[k]]
         if (k != h) {printf "%s", ofs}
      }
      print ""
   }
}

Sample input (note that there should be a newline after the last item):

foo: bar
foo2: bar2
foo1: bar

foo: bar3
foo3: bar3
foo2: bar3

Sample output:

foo|foo2|foo1|foo3
bar|bar2|bar|
bar3|bar3||bar3

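Note: you may need to change this if your values have the separator (": ") embedded in them, because splitting on FS would then cut the value short.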
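
One way to cope with an embedded separator is to cut each line at the first ": " only, using `index` and `substr` instead of relying on `FS`. The helper name `split_kv` below is hypothetical, and the snippet is a minimal sketch of that idea rather than part of the script above; it prints each pair as key, tab, value so the effect is easy to see.

```shell
# split_kv: hypothetical helper that reads "key: value" lines on stdin and
# prints "key<TAB>value", cutting at the FIRST ": " only, so any later
# ": " stays inside the value.
split_kv() {
    awk '{
        p = index($0, ": ")
        if (p == 0) next    # no separator on this line (for example a blank line)
        printf "%s\t%s\n", substr($0, 1, p - 1), substr($0, p + 2)
    }'
}
```

In the script above, the same two `substr` expressions could stand in for `$1` and `$2`.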

answered 2013-02-22T04:43:33.587
0

Using Perl:

use strict; use warnings;

# read the file paragraph by paragraph
$/ = "\n\n";

print "date and time|file size|key1|key2\n";

# parsing the whole file with the magic diamond operator
# ([ \t]* skips the blanks after the colon without running onto the next line)
while (<>) {
    if (/^date and time:[ \t]*(.*)/m) {
        print "$1|";
    }

    if (/^file size:[ \t]*(.*)/m) {
        print "$1|";
    }

    if (/^key1:[ \t]*(.*)/m) {
        print "$1|";
    }
    else {
        print "|";
    }

    if (/^key2:[ \t]*(.*)/m) {
        print "$1\n";
    }
    else {
        print "\n";
    }
}

Usage

perl script.pl file

Output

date and time|file size|key1|key2
2013-02-21 18:18 PM|1283483 bytes|value|value
2013-02-21 18:19 PM|1283493 bytes||value
answered 2013-02-22T03:05:02.467
0

Example:

> ls -aFd * | xargs -L 5 echo | column -t
bras.tcl@      Bras.tpkg/           CctCc.tcl@       Cct.cfg      consider.tcl@
cvsknown.tcl@  docs/                evalCmds.tcl@    export/      exported.tcl@
IBras.tcl@     lastMinuteRule.tcl@  main.tcl@        Makefile     Makefile.am
Makefile.in    makeRule.tcl@        predicates.tcl@  project.cct  sourceDeps.tcl@
tclIndex
answered 2021-04-20T19:23:53.157