bash - 具有固定数量字段的 awk 解析

Question

寻找使用 awk 记录解析的解决方案，其中，其中也可以是/n字符。记录用分隔|。问题是当达到一定数量的字段时可以确定新行。如何在 awk 中做到这一点？

例子：

2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|

是一个记录，它有很多\n字符。我需要用 awk 解析并从中获取例如第 5 个字段。

score 3 · Accepted Answer

从上面 sudo_O 的答案中汲取灵感...将变量 FIELD_TO_PRINT 设置为感兴趣的字段位置，将另一个变量 FIELDS_PER_RECORD 设置为表示记录的字段数。GNU awk在 Ubuntu 上测试

awk   -v FIELDS_PER_RECORD=10 -v FIELD_TO_PRINT=5 'BEGIN{FS="|"; RS="\0"}\
{for (i=1; i<=NF; ++i) {if (i % FIELDS_PER_RECORD == FIELD_TO_PRINT) {print $i} }}' file_name.txt
th2056569632
x10354453
SET DATESTYLE = "ISO"; Select * from bb where cc='1'

score 1 · Accepted Answer

对于文件中的一条记录，您不能将记录分隔符设置为空字符RS='\0'，以便将输入文件作为一条完整记录读取：

$ awk '{print $5}' FS='|' RS='\0' file
th2056569632

对于多条记录，您可以将其date用作记录分隔符（除非它们已经用空行分隔，这会使事情变得更简单，或者除非您在输出中需要此字段）：

$ awk 'NR>1{print $5}' FS='|' RS='(^|[^|])[0-9]{4}-[0-9]{2}-[0-9]{2} ' file
th2056569632
th1093698336

更简单的grep -o 'th[0-9]*' file 是否适合这里？

score 1 · Accepted Answer

显然，这不是您所要求的：为了比较，这就是我在 python 中的做法：

from cStringIO import StringIO

def records_from_file(f,separator='|',field_count=30):
  record = []
  for line in f:
    fields = line.split(separator)
    if len(record) > 0:
      # Merge last of existing with first of new
      record[-1] += fields[0]
      # Extend rest of fields
      record.extend(fields[1:])
    else:
      record.extend(fields)
    if len(record) > field_count:
      raise Exception("Concatenating records overflowed number of fields",record)
    elif len(record) == field_count:
      yield record
      record = []

sample = """2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|"""

for record in records_from_file(StringIO(sample)):
  print record[4]

产量：

th2056569632
th1093698336

bash - 具有固定数量字段的 awk 解析

3 回答 3

Related

Reference