bash - search (e.g. awk, grep, sed) for string, then look for X lines above and another string below

Question

I need to be able to search for a string (let's use 4320101), print 20 lines above the string, and keep printing after it until it finds the string </eventUpdate>.

For example:

Random text I do not want or blank line
16 Apr 2013 00:14:15
id="4320101"
</eventUpdate>
Random text I do not want or blank line

I just want the following result outputted to a file:

16 Apr 2013 00:14:15
id="4320101"
</eventUpdate>

There are multiple examples of these groups of text in a file that I want.

I tried using this below:

cat filename | grep "</eventUpdate>" -A 20 4320101 -B 100 > greptest.txt

But it only ever shows for 20 lines either side of the string.

Notes:
- the line number the text is on is inconsistent so I cannot go off these, hence why I am using -A 20.
- ideally I'd rather have it so that when it searches after the string, it stops when it finds </eventUpdate> and then carries on searching for the next match.

Summary: find 4320101, output 20 lines above 4320101 (or one line of white space), and then output all lines below 4320101 up to

</eventUpdate>

Doing research I am unsure of how to get awk, nawk or sed to work in my favour to do this.

score 1 · Accepted Answer

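This might work for you (GNU sed):

sed ':a;s/\n/&/20;tb;$!{N;ba};:b;/4320101/!D;:c;n;/<\/eventUpdate>/!bc' file

Edit:

:a;s/\n/&/20;tb;$!{N;ba}; keeps a sliding window in the pattern space (PS): the current line plus up to 20 lines before it.
:b;/4320101/!D; moves that window through the file until the pattern 4320101 is found.
:c;n;/<\/eventUpdate>/!bc prints the window and any following lines until the pattern <\/eventUpdate> is found.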
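
To watch the window at work, here is a small sketch that runs the same script with the window shrunk from 20 lines of context to 1. The sample file and its contents are made up for illustration, and the script assumes GNU sed (other seds may reject the one-line label syntax).

```shell
# Made-up sample: three filler lines, one event block, one trailing filler line.
sample=$(mktemp)
printf '%s\n' 'noise 1' 'noise 2' 'noise 3' \
    '16 Apr 2013 00:14:15' 'id="4320101"' '</eventUpdate>' 'noise 4' > "$sample"

# Same script as above, but s/\n/&/1 keeps only one line of context.
result=$(sed ':a;s/\n/&/1;tb;$!{N;ba};:b;/4320101/!D;:c;n;/<\/eventUpdate>/!bc' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Only the timestamp line, the id line, and the closing tag should come out; the filler lines are dropped by D before the match and never enter the window after it.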

score 1 · Accepted Answer

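Looking backwards in sed/awk is always tricky. This self-contained awk script keeps the last 20 lines. When it reaches 4320101 it prints those stored lines, going back only as far as the nearest blank or unwanted line. It then switches to printall mode and prints every line until eventUpdate is encountered, prints that line, and exits.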

awk '
# Shift the buffer down one slot and store the newest line at index 20.
function store(line,    i) {
    for (i = 1; i <= 20; i++) {
        last[i-1] = last[i]
    }
    last[20] = line
}
# Print the buffered lines that follow the nearest blank or "Random" line.
function purge(    i, stop) {
    stop = -1
    for (i = 20; i >= 0; i--) {
        if (length(last[i]) == 0 || last[i] ~ "Random") {
            stop = i
            break
        }
    }
    for (i = stop + 1; i <= 20; i++) {
        print last[i]
    }
}
{
    store($0)
    if (/4320101/) {
        purge()          # the matching line itself sits in last[20]
        printall = 1
        next
    }
    if (printall == 1) {
        print
        if (/eventUpdate/) {
            exit 0
        }
    }
}' test
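
The script above exits after the first block, while the question mentions several groups per file. Here is a rough sketch of a variant that resets instead of exiting. It is not the answer's code: it swaps the shifting array for a modulo-indexed ring buffer, and the names N, buf, count, printing and dump_context are made up for illustration, as is the sample data.

```shell
# Made-up sample with two event blocks separated by unwanted lines.
sample=$(mktemp)
printf '%s\n' 'Random text' '16 Apr 2013 00:14:15' 'id="4320101"' '</eventUpdate>' \
    '' '17 Apr 2013 09:30:00' 'id="4320101"' '</eventUpdate>' 'Random text' > "$sample"

result=$(awk -v N=20 '
# Print the buffered context, oldest line first, then empty the buffer.
function dump_context(    i) {
    for (i = (count > N ? count - N : 0); i < count; i++) print buf[i % N]
    count = 0
}
# Inside a block: print through to the closing tag, then go back to searching.
printing { print; if (/<\/eventUpdate>/) printing = 0; next }
# Blank or unwanted lines reset the context so it never reaches past them.
length($0) == 0 || /Random/ { count = 0; next }
/4320101/ { dump_context(); print; printing = 1; next }
{ buf[count++ % N] = $0 }
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this sample both blocks should be printed back to back, each starting at its timestamp line.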

score 1 · Accepted Answer

You can try something like this -

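awk '{
    a[NR] = $0    # keep the whole file in memory
}

END {
    for (i = 1; i <= NR; i++) {
        if (a[i] ~ /4320101/) {
            # print up to 20 lines before the match, then on to the closing tag
            for (j = (i > 20 ? i - 20 : 1); j <= NR; j++) {
                print a[j]
                if (j >= i && a[j] ~ /<\/eventUpdate>/) break
            }
        }
    }
}' file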

score 1 · Accepted Answer

Here is an ugly awk solution :)

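awk 'BEGIN { last = 0 }
# remember the most recent blank line or line containing "Random"
length($0) == 0 || /Random/ { last = NR }
/4320101/ {
    flag = 1
    if (NR - last > 20) last = NR - 20
    if (last + 1 <= NR - 1) {
        cmd = "sed -n \"" (last + 1) "," (NR - 1) "p\" " FILENAME
        system(cmd)
    }
}
flag == 1 { print }
/eventUpdate/ { flag = 0 }' filename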

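So basically it keeps track of the last blank line, or the last line containing the Random pattern, in the variable last. When 4320101 is found, it prints from that line - 20 or from last, whichever is nearer, through a system sed command, and sets flag. The flag is there so that the following lines are printed until eventUpdate is found. I haven't tested it, but it should work.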

score 1 · Accepted Answer

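Let's see if I understand your requirements:

You have two strings, which I'll call KEY and LIMIT. You want to print:

1. At most 20 lines before a line containing KEY, but stopping if there is a blank line.
2. All the lines between a line containing KEY and the next line containing LIMIT. (This ignores your requirement that there be no more than 100 such lines; if that is important, it is relatively simple to add.)

The easiest way to accomplish (1) is to keep a circular buffer of 20 lines and print it out when you hit KEY. (2) is trivial in either sed or awk, because you can use the two-address form to print the range.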
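
As a quick sketch of the two-address form on its own (the sample file and its contents are made up for illustration), both tools print everything from the first address through the second:

```shell
# Made-up sample: one event block with a line on either side.
sample=$(mktemp)
printf '%s\n' 'before' 'id="4320101"' 'payload' '</eventUpdate>' 'after' > "$sample"

# sed: -n turns off auto-print; p prints only lines inside the address range.
sed_out=$(sed -n '/4320101/,/<\/eventUpdate>/p' "$sample")
# awk: a range pattern with no action prints the lines it selects.
awk_out=$(awk '/4320101/,/<\/eventUpdate>/' "$sample")

printf '%s\n' "$sed_out"
rm -f "$sample"
```

Neither command prints "before" or "after", which is why the range alone is not enough here: the lines above KEY still need the buffer.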

So let's do it in awk:

#file: extract.awk

# Initialize the circular buffer
BEGIN          { count = 0; }
# When we hit an empty line, clear the circular buffer
length() == 0  { count = 0; next; }
# When we hit `key`, print and clear the circular buffer
index($0, KEY) { for (i = count < 20 ? 0 : count - 20; i < count; ++i)
                   print buf[i % 20];
                 count = 0;
               }
# While we're between key and limit, print the line
# (the opening brace has to share a line with the pattern)
index($0, KEY),index($0, LIMIT) { print; next; }
# Otherwise, save the line
               { buf[count++ % 20] = $0; }

For that to work, we need to set the values of KEY and LIMIT. We can do that on the command line:

awk -v "KEY=4320101" -v "LIMIT=</eventUpdate>" -f extract.awk $FILENAME

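Notes:

I use index($0, foo) rather than the more usual /foo/ because it avoids having to escape regex special characters, and nothing in the requirements calls for a regex anyway. index(haystack, needle) returns the index of needle in haystack, with indices starting at 1, or 0 if needle is not found. Used as a true/false value, it is true when needle is found.
next ends processing of the current line. As this little program shows, it can be really handy.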
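
A minimal sketch of those index() return values, using literal strings picked only for illustration:

```shell
# Position of a substring that is present (counting starts at 1).
found=$(awk 'BEGIN { print index("</eventUpdate>", "event") }')
# A substring that is absent yields 0, which awk treats as false.
missing=$(awk 'BEGIN { print index("</eventUpdate>", "4320101") }')
# Regex metacharacters are matched literally, with no escaping needed.
literal=$(awk 'BEGIN { print index("a+b", "+") }')

printf '%s %s %s\n' "$found" "$missing" "$literal"
```

This should print "3 0 2": "event" starts at the third character, the id is absent, and "+" is found as a plain character.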

score 0 · Accepted Answer

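The simplest approach is to make 2 passes over the file - the first to identify the line numbers that fall within range of the target regexp, and the second to print the lines in the selected range, e.g.: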

awk '
NR==FNR {
    if ($0 ~ /\<4320101\>/) {
        for (i=NR-20;i<NR;i++)
            range[i]
        inRange = 1
    }
    if (inRange) {
        range[NR]
    }
    if ($0 ~ /<\/eventUpdate>/) {
        inRange = 0
    }
    next
}
FNR in range
' file file
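
Here is a small sketch of the same two-pass idea on made-up data, with one line of context instead of 20. It is a simplified illustration rather than the script above: the array name want and the sample contents are assumptions, and it drops the GNU-specific \< \> word boundaries so it does not depend on gawk.

```shell
# Made-up sample: one event block with a filler line on either side.
sample=$(mktemp)
printf '%s\n' 'noise' '16 Apr 2013 00:14:15' 'id="4320101"' '</eventUpdate>' 'noise' > "$sample"

result=$(awk '
NR == FNR {                      # first pass: only collect line numbers
    if ($0 ~ /4320101/) {
        want[NR - 1]             # one line of context here instead of 20
        inRange = 1
    }
    if (inRange) want[NR]
    if ($0 ~ /<\/eventUpdate>/) inRange = 0
    next
}
FNR in want                      # second pass: print the selected lines
' "$sample" "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

NR == FNR is true only while awk reads the first copy of the file, so naming the file twice is what gives the second pass its chance to print.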
