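I'm looking for a way to parse records with awk where a record can also contain \n characters. Fields are separated by |. The catch is that a new record can only be recognized once a certain number of fields has been reached. How can I do this in awk?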

Example:

2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|

Each record, like the ones above, can contain many \n characters. I need to parse them with awk and extract, for example, the 5th field.


3 answers


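Taking inspiration from sudo_O's answer above: set the variable FIELD_TO_PRINT to the position of the field you want, and another variable FIELDS_PER_RECORD to the number of |-separated fields that make up one record. With RS="\0" the whole file is read as a single awk record, so the fields of consecutive log records simply follow one another, and the newline after a record's trailing | ends up glued to the first field of the next record. For the sample above that gives 29 fields per record. The expression (i - 1) % FIELDS_PER_RECORD + 1 is the position of field i inside its own record; unlike a plain i % FIELDS_PER_RECORD, it also works when FIELD_TO_PRINT equals FIELDS_PER_RECORD (the modulo would be 0 there). GNU awk, tested on Ubuntu.

awk -v FIELDS_PER_RECORD=29 -v FIELD_TO_PRINT=5 'BEGIN{FS="|"; RS="\0"}
{for (i = 1; i <= NF; ++i) if ((i - 1) % FIELDS_PER_RECORD + 1 == FIELD_TO_PRINT) print $i}' file_name.txt
th2056569632
th1093698336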
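The field arithmetic in this answer can be checked in Python: read the whole input as one string (the equivalent of RS="\0"), split it on |, and keep the fields whose position inside their record equals the one you want. The helper name and the short sample are hypothetical; the sample mimics the question's layout with six |-separated fields per record, the last of which spans several lines. The sketch computes the in-record position as (i - 1) % N + 1, so asking for the last field of a record also works.

```python
def nth_field_of_each_record(text, fields_per_record, field_to_print):
    """Return field `field_to_print` (1-based) of every record in `text`.

    Assumes every record has exactly `fields_per_record` |-separated fields,
    counted the way awk counts them when the whole file is one record: the
    newline after a record's trailing | is glued onto the next record's
    first field.
    """
    picked = []
    for i, value in enumerate(text.split("|"), start=1):
        # Position of field i inside its own record, in the range 1..N.
        if (i - 1) % fields_per_record + 1 == field_to_print:
            picked.append(value)
    return picked


# Hypothetical sample in the shape of the question's data:
# 6 fields per record, the 6th one spans several lines.
SAMPLE = (
    "2013-03-24 EST|aaa|tsi|p1753|th2056569632|Select *\nfrom bb|\n"
    "2013-04-10 EST|aa|tsi|p22814|th1093698336|create table xx\nwhere r.aa=c.vv|\n"
)

print(nth_field_of_each_record(SAMPLE, 6, 5))  # prints ['th2056569632', 'th1093698336']
```

Because the last record's trailing | also produces a leftover piece ("\n" here), asking for field 1 would return that piece as an extra, bogus value; the same happens with the awk loop.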
answered 2013-04-12T12:32:44.760

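If the file holds a single record, you can simply set the record separator to the null character, RS='\0', so that the whole input file is read as one record: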

$ awk '{print $5}' FS='|' RS='\0' file
th2056569632

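For multiple records you could use the date as the record separator (unless the records are already separated by blank lines, which would make things simpler, or unless you need the date in the output, because the text matched by RS is not part of any field). A regular-expression RS requires GNU awk. NR>1 skips the empty record in front of the first date: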

$ awk 'NR>1{print $5}' FS='|' RS='(^|[^|])[0-9]{4}-[0-9]{2}-[0-9]{2} ' file
th2056569632
th1093698336
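For comparison, the same "a date starts a new record" idea can be sketched with Python's re module. This is a hypothetical helper, not part of the answer: it splits on a newline that is followed by a YYYY-MM-DD date and a space, using a lookahead so the date stays in the record's first field (the awk RS above consumes it). Like the awk version, it assumes that no line inside a SQL statement starts with such a date. The sample is a shortened, hypothetical version of the question's data; note that the timestamp in the middle of each record follows a |, not a newline, so it does not trigger a split.

```python
import re

# Hypothetical rule: a new record starts where a line begins with "YYYY-MM-DD ".
# The lookahead keeps the date in the record; only the newline is consumed.
RECORD_START = re.compile(r"\n(?=\d{4}-\d{2}-\d{2} )")


def split_records(text):
    """Split text into records; a trailing newline is dropped first."""
    return RECORD_START.split(text.rstrip("\n"))


def field(record, n, sep="|"):
    """Return the n-th field (1-based, like awk's $n) of a record."""
    return record.split(sep)[n - 1]


# Shortened, hypothetical version of the question's data.
SAMPLE = (
    "2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|"
    "2013-03-24 15:49:32 AFT|statement: Select *\nfrom bb\nwhere cc='1'|0|\n"
    "2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|"
    "2013-04-10 12:44:29 AFT|statement: create table xx\ngroup by 1)|0|\n"
)

for record in split_records(SAMPLE):
    print(field(record, 5))
```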

Would a simpler grep -o 'th[0-9]*' file do the job here?
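If the grep route is good enough, the Python equivalent is a single re.findall call. This sketch (helper name and sample are hypothetical) tightens the pattern to \bth[0-9]+\b, so at least one digit is required and the "th" inside words such as "with" is not reported. It still assumes that nothing other than field 5 looks like th followed by digits.

```python
import re

# Hypothetical tightening of grep -o 'th[0-9]*': require at least one digit
# and word boundaries, so "th" inside words like "with" is not reported.
TH_TOKEN = re.compile(r"\bth[0-9]+\b")


def th_tokens(text):
    """Return every th<digits> token in text, in order of appearance."""
    return TH_TOKEN.findall(text)


# Hypothetical sample; "with" and "path" would match the looser 'th[0-9]*'.
SAMPLE = (
    "2013-03-24 EST|aaa|tsi|p1753|th2056569632|select with path|\n"
    "2013-04-10 EST|aa|tsi|p22814|th1093698336|done|\n"
)

print(th_tokens(SAMPLE))  # prints ['th2056569632', 'th1093698336']
```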

answered 2013-04-12T12:12:27.780

Obviously this is not what you asked for, but for comparison, this is how I would do it in Python:

from io import StringIO


def records_from_file(f, separator='|', field_count=30):
    """Yield one list of fields per logical record.

    Physical lines are joined until the record has exactly field_count
    fields; a line break inside a field stays part of that field.
    """
    record = []
    for line in f:
        fields = line.split(separator)
        if record:
            # Merge the last field of the pending record with the first piece of this line
            record[-1] += fields[0]
            # Append the remaining pieces as new fields
            record.extend(fields[1:])
        else:
            record.extend(fields)
        if len(record) > field_count:
            raise ValueError("Concatenating records overflowed number of fields", record)
        elif len(record) == field_count:
            yield record
            record = []


sample = """2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|"""

# field_count=30: each sample record has 29 data fields plus the piece after its trailing |
for record in records_from_file(StringIO(sample)):
    print(record[4])

Output:

th2056569632
th1093698336
answered 2013-04-12T12:42:27.363