I have a very specific problem that I can't solve. It involves parsing related data from different lines and merging it.

I have a file containing text formatted like this:

======================================================
8:27:24 PM  http://10.11.12.13:80
======================================================
GET /dog-pictures HTTP/1.1
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: closed

======================================================



======================================================
8:28:56 PM  http://192.114.126.245:80
======================================================
GET /flowers HTTP/1.1
Host: 10.11.12.13
Language: english

======================================================



======================================================
8:29:07 PM  http://10.11.12.13:80
======================================================
GET /africas-animals HTTP/1.1
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: open

======================================================

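As you can see above, each group of data in the text file is framed by three lines of equals signs (=======), but the number of data lines inside a group varies.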

The output I need looks like this:

    http://10.11.12.13/dog-pictures
    http://192.114.126.245/flowers
    http://10.11.12.13/africas-animals

Here is which parts I need to merge:

======================================================
8:27:24 PM  http://10.11.12.13:80                     <--- Gets the first part from here
======================================================
GET /dog-pictures HTTP/1.1                            <--- Gets the second part from here
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: closed

======================================================

Any help with this problem is greatly appreciated. Thanks in advance.

3 Answers


Try doing this with Perl in a shell:

perl -lane '
    if (/^\d+:\d+:\d+\s+\w+\s+([^:]+):/) {
        $scheme = $1;
    }
    if (/^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE)/) {
        $path = $F[1];
    }
    if (/^Host/) {
        print "$scheme://$F[1]$path";
    }
' file.txt

Script version

Generated with perl -MO=Deparse, plus a few tweaks:

#!/usr/bin/env perl
# mimic `-l` switch to print like "say"
BEGIN { $/ = "\n"; $\ = "\n"; }

use strict; use warnings;

my ($scheme, $path);

# magic diamond operator
while (<ARGV>) {
    chomp $_;
    # split the current line on whitespace into the @F array
    my (@F) = split(' ', $_, 0);

    # regex to catch the scheme (http)
    if (/^\d+:\d+:\d+\s+\w+\s+([^:]+):/) {
        $scheme = $1;
    }
    # if the current line starts with an HTTP verb, store the second
    # column (the request path) in $path
    if (/^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE)/) {
        $path = $F[1];
    }
    # if the current line is the Host header, print the assembled URL
    if (/^Host/) {
        print "${scheme}://$F[1]$path";
    }
}

Usage

chmod +x script.pl
./script.pl file.txt

Output

http://10.11.12.13/dog-pictures
http://10.11.12.13/flowers
http://10.11.12.13/africas-animals

Note that the host is taken from the Host: header, so the second line shows 10.11.12.13 rather than the 192.114.126.245 that appears on the timestamp line of that record.

answered 2013-01-22T23:16:16.560

Perhaps the following will help:

use strict;
use warnings;

open my $fh, '<', 'data.txt' or die $!;

# Read the file one line at a time
while (<$fh>) {

    # If a URL is captured on a line that begins with a time, read (and discard) the separator line
    if ( my ($url) = /^\d+:\d+:\d+.+?(\S+):\d+$/ and <$fh> ) {

        # Capture path
        my ($path) = <$fh> =~ /\s+(\/\S+)\s+/;

        print "$url$path\n" if $url and $path;
    }
}

Output:

http://10.11.12.13/dog-pictures
http://192.114.126.245/flowers
http://10.11.12.13/africas-animals

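Only two lines in each record contain the information you want, and a line of equals signs sits between them. The first regex matches the time string and captures the URL on that line. The "and <$fh>" part reads past the separator line. The second regex captures the path from the line after that. Finally, the URL and the path are printed together.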
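
The same read-ahead idea can also be sketched in awk, which another answer in this thread uses. This is only a minimal sketch, not the answer's own code: the function name extract_urls_getline is made up for illustration, and it assumes the request line always sits exactly two lines below the time line, as in the sample data.

```shell
# Hypothetical helper name; not part of the original answer.
extract_urls_getline() {
  awk '
    # A line that starts with a time (e.g. 8:27:24 PM) carries the URL in field 3.
    /^[0-9]+:[0-9]+:[0-9]+/ {
      url = $3
      sub(/:[0-9]+$/, "", url)        # drop the trailing :port
      if ((getline) <= 0) exit        # read past the ====== separator line
      if ((getline) <= 0) exit        # read the request line, e.g. GET /path HTTP/1.1
      if ($2 ~ /^\//) print url $2    # print only when field 2 looks like a path
    }
  ' "$@"
}
```

Call it as extract_urls_getline data.txt, or pipe the text into it. Because it counts lines, it will miss a record whose layout differs from the sample.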

answered 2013-01-23T03:05:16.667

Perl:

perl -lane 'if(/http/){($x=$F[2])=~s/:\d+$//}if(/GET/){print $x.$F[1]}' your_file

If you'd rather use awk:

awk '/http/{x=$3; sub(/:[0-9]+$/,"",x)} /GET/{print x $2}' your_file
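
For readability, here is a commented and slightly more defensive sketch of the awk approach. It is an illustration under assumptions rather than a drop-in replacement: the function name extract_urls is made up, a trailing :port is stripped from the URL, and any upper-case request method is accepted rather than only GET.

```shell
# Hypothetical helper name; not part of the original answer.
extract_urls() {
  awk '
    # Timestamp line: remember the URL in field 3 and drop a trailing :port.
    /^[0-9]+:[0-9]+:[0-9]+ / {
      base = $3
      sub(/:[0-9]+$/, "", base)
      next
    }
    # Request line such as "GET /flowers HTTP/1.1", with any upper-case method.
    base != "" && /^[A-Z]+ \/[^ ]* HTTP\// {
      print base $2
      base = ""        # at most one output line per record
    }
  ' "$@"
}
```

Run it as extract_urls your_file. Anchoring both patterns at the start of the line keeps header lines that merely contain "http" or "GET" from being picked up by mistake.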
answered 2013-01-23T06:17:13.397