
Hello again, Stack Overflow!

I have a very large flat file and I want to export every record that matches 2 different patterns. The problem is that the number of lines per record varies, and the records run into one another. The last line of each record is the Door IDs line, and the first line is the User: line.

I am testing for the @ in the email address and for a last login containing "login time: 2013-08". I need to export all the lines, including the email address line and the last login line. Below are 2 samples. I tried using awk like this:

awk '/login time: 2013-08/{e=0}/@ /{gsub("^.*@ ","",$0);e=1}{if(e==1){print}}'  filename

Of course, that failed....

So here is the sample data:

User: afshin@runners.org
First Name: Afshi
Last Name: Noghami
Is Delegated Admin: False
IP Whitelisted: False
Account Suspended: False
Must Change Password: False
Unique ID: 102209840259208897543
ID TPYE: Cx4
Creation Time: 2013-06-07T04:14:42.000Z
Last login time: Never
Path: /Members/Inactive

IMs:
Addresses:
Organizations:
Phones:
Relations:
Door IDs:
User: jjnalli@runners.org
First Name: JISS
Last Name: NALLIKUZHY
Is a Super Admin: False
Is Delegated Admin: False
Has Agreed to Terms: True
IP Whitelisted: False
Account Suspended: False
Must Change Password: False
Unique ID: 109765147242431344122
ID TYPE: Cx4
Mailbox setup: True
Included: False
Creation Time: 2013-06-07T03:32:52.000Z
Last login time: 2013-08-02T07:13:02.000Z
Path: /Members/Inactive

IMs:
Addresses:
Organizations:
Phones:
Relations:
Door IDs:

For each record that has a last login date, the desired output looks like this:

User: jjnalli@runners.org  
First Name: JISS  
Last Name: NALLIKUZHY  
Is a Super Admin: False  
Is Delegated Admin: False  
Has Agreed to Terms: True  
IP Whitelisted: False  
Account Suspended: False  
Must Change Password: False  
Unique ID: 109765147242431344122  
ID TYPE: Cx4  
Mailbox setup: True  
Included: False  
Creation Time: 2013-06-07T03:32:52.000Z  
Last login time: 2013-08-02T07:13:02.000Z 

4 Answers

awk '/User:/{if(NR!=1){for(i=0;i<j;i++)print a[i]>"file"k;j=0;k++;}a[j++]=$0;next}{a[j++]=$0;}END{for(i=0;i<j;i++)print a[i]>"file"k}' i=0 k=1  grepper.txt

where grepper.txt contains the input data.

This splits the file into multiple files, one record (spanning multiple lines, of course) per file.
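Laid out over several lines with comments (and with parentheses around the output file name, which makes the redirection unambiguous), the same splitter reads roughly like this:

awk '
    /User:/ {                       # a User: line starts a new record
        if (NR != 1) {              # first flush the previous record to its own file
            for (i = 0; i < j; i++)
                print a[i] > ("file" k)
            j = 0
            k++
        }
        a[j++] = $0                 # buffer the User: line
        next
    }
    { a[j++] = $0 }                 # buffer every other line of the record
    END {                           # flush the last buffered record
        for (i = 0; i < j; i++)
            print a[i] > ("file" k)
    }
' k=1 grepper.txt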

Then grep them and discard the files you don't need.

Inside a loop:

grep "login time: 2013-08" fileN && grep "User:" fileN | grep "@" || rm -f fileN
Answered 2013-09-18T14:15:47.830

First, read each record into an array of fields:

BEGIN { FS = ": " }   # each line has fieldname and value

/^$/ { next }         # skip blank lines

$1 == "User" {        # first field of new record
    delete fields     # delete current array
    fields[$1] = $2 } # store field value in array

$1 == "Door IDs" {    # last field of current record
    fields[$1] = $2   # store field value in array
    do_process() }    # process current record

$1 != "User" &&       # fields between first ...
$2 != "Door IDs" {    #             ... and last
    fields[$1] = $2 } # store field value in array

Then do whatever you need to do with the record. Here I print the User and Last login time fields, but you can do whatever processing you need:

function do_process() {
    print fields["User"], fields["Last login time"] }

Note that I have not tested this code...
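If the goal is the filter described in the question, a do_process() along these lines (also untested, using the same field names) would keep only the interesting records:

function do_process() {
    # print only records with an e-mail address and an August-2013 last login
    if (fields["User"] ~ /@/ && fields["Last login time"] ~ /^2013-08/)
        print fields["User"], fields["Last login time"]
}

To reproduce the whole record rather than just those two fields, the raw lines have to be kept as well; a sketch of that is shown at the end of this answer.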

Edit: revised based on the comments below. I will assume that the User field always marks the start of a new record. Here is a revised version of the code that reads and stores the records:

BEGIN { FS = ": "       # each line has fieldname and value
        first = 1 }     # flag for first record

/^$/ { next }           # skip blank records

$1 == "User" {          # first field of new record
    if (first > 1)      # no data the first time; skip
        do_process()    # process current record
    delete fields       # reset fields for new record
    fields[$1] = $2 }   # store field value in array

$1 == "Door IDs" {      # last field of current record
    fields[$1] = $2     # store field value in array
    do_process() }      # process current record

/./ { fields[$1] = $2 } # store field value in array

END { if (first > 1)    # last record not processed
        do_process() }  # process last record

Then you can process the data however you like.
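For example, here is an untested sketch that also keeps the raw lines of each record, so a matching record can be printed back out exactly as it appeared (it assumes, as in the samples, that User: starts a record and Door IDs: ends it):

BEGIN { FS = ": " }                   # each line has fieldname and value

/^$/ { next }                         # skip blank lines

/^User:/ { n = 0; delete fields }     # a new record: forget the previous one

{ fields[$1] = $2; lines[++n] = $0 }  # store the field value and the raw line

/^Door IDs:/ {                        # end of record: test it and print it
    if (fields["User"] ~ /@/ && fields["Last login time"] ~ /^2013-08/)
        for (i = 1; i <= n; i++)
            print lines[i]
}

This prints everything from User: through Door IDs:; shorten the loop if only the lines up to Last login time are wanted.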

Answered 2013-09-18T15:47:33.000

Group the lines from ^User to Door ID, instead of only printing when a line matches @.*login time: 20[0-9]...

I think I finally understand what you need:

Try this:

sed -ne '/^Door ID/!H;/^User:/h;/^Door ID/{x;G;/@.*login time: 20[0-9]/p}' file

This should do what you want.
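Spread over several lines with the hold-space steps annotated (GNU sed allows comments inside a script), the same command is roughly:

sed -n '
  # append every line except a Door IDs: line to the hold space
  /^Door ID/!H
  # a User: line overwrites the hold space, starting a fresh record
  /^User:/h
  # on a Door IDs: line, swap in the accumulated record, re-attach the
  # Door IDs: line, and print it if it has an e-mail and a real login date
  /^Door ID/{
    x
    G
    /@.*login time: 20[0-9]/p
  }
' file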

Once each block has been merged, you can even drop all the entries matching 2013-08:

sed -ne '/^Door ID/!H;/^User:/h;/^Door ID/{x;G;/@.*login time: 20[0-9]/{/login time: 2013-08/!p}}' file
Answered 2013-09-18T14:35:32.690

Maybe something like this will work for you:

awk '$1=="User:",/login time: 2013-08/' file
Answered 2013-09-18T13:57:45.493