python - Manipulate txt, searching for three patterns (sed, awk, pcregrep)

Question

I have this text file:

AAAA
1234
title example
Lorem Ipsum
FF
AAAA
1234
title example
€330 - Roma
FF

From this file I want to extract only the text that:

STARTS WITH AAAA
HAS A EURO SYMBOL
ENDS WITH FF

In this case, I want to match only this block:

AAAA
1234
title example
€330 - Roma
FF

I tried various solutions, for example:

sed -e '/AAAAs/,/europ/,/FF/!d' testfile.txt

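but it extracts all the text between AAAA and FF.

How can I solve this?

Thanks for the help.

Edit:

There may be some text between the Euro line and FF, and I don't know how many lines it spans.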

AAAA
1234
title example
€330 - Roma
Some text with \n, comma symbol etc etc
FF

I want to extract the text between AAAA and FF.

score 3 · Accepted Answer


Answered 2017-03-03T09:01:37.547

score 1 · Accepted Answer

A nice quick way is to use grep with its context options (-B for lines before the match, -A for lines after it). So, for your needs:

grep -B3 -A1 -e '€' test.txt

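This finds the Euro symbol and prints the 3 lines before it and the 1 line after it. However, it only works if the file always keeps the same layout, i.e. AAAA and FF always appear the same number of lines above and below the Euro line.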
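To make that limitation concrete, here is a minimal Python sketch that imitates the fixed context window of `grep -B3 -A1`. The function name `grep_context` and the two sample lists are hypothetical, made up for illustration from the data in the question; the function does not merge overlapping windows the way real grep does.

```python
def grep_context(lines, needle, before=3, after=1):
    """Return, for each matching line, a fixed window of lines around it."""
    windows = []
    for i, line in enumerate(lines):
        if needle in line:
            # Same idea as -B<before> -A<after>: a fixed number of lines
            # on each side of the match, regardless of AAAA / FF.
            windows.append(lines[max(0, i - before): i + after + 1])
    return windows


# Layout from the original question: FF is exactly one line after the Euro line.
fixed = ["AAAA", "1234", "title example", "€330 - Roma", "FF"]
# Layout from the edit: an extra line sits between the Euro line and FF.
edited = ["AAAA", "1234", "title example", "€330 - Roma", "Some text", "FF"]

print(grep_context(fixed, "€"))   # the whole block, AAAA through FF
print(grep_context(edited, "€"))  # the window stops before FF
```

With the layout from the edit, the closing FF falls outside the window, which is why a fixed context size is not enough once the number of lines varies.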

score 1 · Accepted Answer

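Python is a procedural language, so it may need more text, but it makes complex things simpler. Here you should:

start storing when you see an AAAA line
stop storing when you see an FF line and
- keep the stored text only if it contains a €

This can be translated into Python as: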

infile = 'testfile.txt'                # assumed input path (the file name used in the question)
with open(infile, encoding='utf-8') as fd:
    processing = False
    txt = None
    euro = None
    for line in fd:
        if line.strip() == 'AAAA':     # start processing
            processing = True
            txt = ""
            euro = False
        if processing:
            txt += line                # store all lines between AAAA and FF
            if '€' in line:            # is a € present?
                euro = True
            if line.strip() == 'FF':   # stop processing
                processing = False
                if euro:               # only print if a € was found
                    print(txt)

It is not as compact as an awk, grep, or sed script, but it is easy to write, read, and maintain.
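If compactness matters, the same three rules can also be sketched with Python's `re` module. This is an alternative to the loop above, not the answerer's method. The names `SAMPLE`, `BLOCK_RE`, and `euro_blocks` are hypothetical, and the sketch assumes every AAAA line is eventually closed by an FF line.

```python
import re

# Sample data based on the question, including the "edit" case with an
# extra line between the Euro line and FF.
SAMPLE = """AAAA
1234
title example
Lorem Ipsum
FF
AAAA
1234
title example
€330 - Roma
Some text with a comma, etc
FF
"""

# A line that is exactly AAAA, then as little text as possible,
# up to the next line that is exactly FF.
BLOCK_RE = re.compile(r"^AAAA$.*?^FF$", re.MULTILINE | re.DOTALL)


def euro_blocks(text):
    """Return the AAAA..FF blocks that contain a Euro symbol."""
    return [block for block in BLOCK_RE.findall(text) if "€" in block]


for block in euro_blocks(SAMPLE):
    print(block)
```

The regex only finds the block boundaries; the Euro check is done as a plain substring test afterwards, which keeps the pattern short and readable.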

score 0 · Accepted Answer


awk '/\xe2\x82\xac/{printf "%s%s", RS, $0}' RS=AAAA file
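The one-liner above comes without explanation, so here is a rough Python reading of what it appears to do: treat AAAA as the record separator, keep the records that contain the UTF-8 bytes of € (`\xe2\x82\xac`), and print the separator back in front of each one. The function name `records_with_euro` and the sample string are hypothetical.

```python
def records_with_euro(text, sep="AAAA"):
    """Split text on sep and return the chunks containing a Euro symbol."""
    # Like awk with RS=AAAA, splitting discards the separator itself.
    chunks = text.split(sep)
    # Put the separator back in front of each kept chunk.
    return [sep + chunk for chunk in chunks if "€" in chunk]


sample = (
    "AAAA\n1234\ntitle example\nLorem Ipsum\nFF\n"
    "AAAA\n1234\ntitle example\n€330 - Roma\nFF\n"
)

# The byte sequence matched by the awk pattern is the UTF-8 encoding of €.
print("€".encode("utf-8") == b"\xe2\x82\xac")

for record in records_with_euro(sample):
    print(record, end="")
```

Note that this approach never looks at FF: any lines between an FF and the next AAAA would be kept as part of the record.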

Answered 2017-03-11T15:23:22.337

score 0 · Accepted Answer


awk 'NR>5' file

AAAA
1234
title example
€330 - Roma
FF

Answered 2017-03-04T20:11:51.473
