python - Python - Reading everything between braces as a single "chunk"

Question

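I'm trying to write a Python program that performs various tasks on a piece of code. I've finished most of it, but one part has me stumped. I don't know the jargon well enough to search for help effectively, so I'm asking here.

I need a routine that reads everything between a pair of braces as a single "chunk". Then, if the chunk contains a specific word or phrase, the Python code should delete it.

Example (simplified) text file contents: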

...
entity
{
    "id" "38794"
    "classname" "info_player_teamspawn"
}
entity
{
    "id" "38795"
    "classname" "func_detail"
    solid
}
entity
{
    "id" "38796"
    "classname" "path_track"
}
...

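The real file lists thousands of such entities. I'd like the Python code to delete everything inside any pair of braces that contains the word "solid", including the "entity" line that precedes it. This would be the resulting snippet: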

...
entity
{
    "id" "38794"
    "classname" "info_player_teamspawn"
}
entity
{
    "id" "38796"
    "classname" "path_track"
}
...

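The ids don't need to be renumbered. We don't need to worry about that.

I hope I've explained my problem well, and I hope there's a possible solution. If anyone knows of a glossary of jargon I could use to help explain or research any further problems I run into, that would be much appreciated too!

Thanks in advance!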

score 1 · Accepted Answer

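It's possible to do everything with a single regex. However, that quickly becomes unreadable, especially once you span multiple lines (and I'm guessing you may have other patterns you'll want to remove too).

I'd split the problem in two:

First, find all entity blocks with this regex: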

import re
p = re.compile(r'entity\s*\{(.*?)\}', re.DOTALL)  # DOTALL lets . match newlines

Then define a replacement function to do the substitution.

def remove_solid(match):
    text = match.group(0)  # the whole matched block, including "entity"
    if 'solid' in text:
        return ''
    else:
        return text

Hook the two together like this:

output = p.sub(remove_solid, input)
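
Putting the two steps together, a minimal end-to-end sketch might look like the following. The names `ENTITY_RE`, `drop_entities` and the `keyword` parameter are made up for illustration. `re.DOTALL` is needed so that `.` can cross line breaks, and the trailing `\s*` is an added assumption so that a removed block takes its newline with it. Like the pattern above, it assumes entity blocks contain no nested braces.

```python
import re

# Matches "entity", the opening brace, everything up to the first closing
# brace, and any whitespace after it. Assumes blocks have no nested braces.
ENTITY_RE = re.compile(r'entity\s*\{.*?\}\s*', re.DOTALL)

def drop_entities(text, keyword):
    """Return text without the entity blocks that contain keyword."""
    def replace(match):
        block = match.group(0)
        return '' if keyword in block else block
    return ENTITY_RE.sub(replace, text)

sample = '''entity
{
    "id" "38794"
    "classname" "info_player_teamspawn"
}
entity
{
    "id" "38795"
    "classname" "func_detail"
    solid
}
entity
{
    "id" "38796"
    "classname" "path_track"
}
'''

result = drop_entities(sample, 'solid')
print(result)
```

Passing a function to `sub` keeps the pattern itself simple: the regex only has to find blocks, and the decision about which ones to drop lives in ordinary Python.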

score 1 · Accepted Answer

First, let's write a generator that yields titles ("entity") and their respective blocks:

def blocks(filename):
    title, block = '', None
    with open(filename) as fp:
        for line in fp:
            if '{' in line:
                block = line
            elif block is not None:
                block += line
            else:
                title = line
            if '}' in line:
                yield title, block
                title, block = '', None

Then read the blocks and output those passing the test:

for title, block in blocks('input.txt'):
    if 'solid' not in block:
        print(title + block, end='')  # both already end with a newline
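
One thing to watch with a plain substring test such as `'solid' not in block` is that it also matches longer words that merely contain the keyword. The sample in the question shows no such case, so this is hypothetical: a block holding a value like "prop_solid_thing" would be dropped as well. A sketch of a stricter test using a word-boundary regex (the helper name `has_word` and both sample blocks are invented here):

```python
import re

def has_word(block, word):
    """True if word occurs in block as a whole word, not as part of one."""
    return re.search(r'\b' + re.escape(word) + r'\b', block) is not None

plain = '{\n    "classname" "func_detail"\n    solid\n}\n'
lookalike = '{\n    "classname" "prop_solid_thing"\n}\n'

print(has_word(plain, 'solid'))      # keyword on its own line
print(has_word(lookalike, 'solid'))  # keyword only inside a longer name
```

Note that a quoted value such as "solid" would still count as a match, because the quote characters form a word boundary; whether that is desirable depends on the real file.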

score 0 · Accepted Answer

You can use a regular expression (regex) to search for the following pattern and replace the matched text with a newline or a space.

import re

[...]
output = re.sub(r'entity\n{[\w\s\n"]*solid[\w\s\n"]*\n}\n', '', input)
[...]

score 0 · Accepted Answer

How about:

re.sub(r"entity\s*{[^}]*solid\s*}", '', yourString)

answered 2012-10-17T09:12:44.233

score 0 · Accepted Answer

Here is a non-regex solution. It may be more verbose, but it's also more intuitive.

input = open("a.txt", "r")    # text mode, so lines compare equal to "{" and "}"
output = open("b.txt", "w") # an empty file for output

def filter_block(instream, outstream, keyword):
    block_buffer = []
    in_block = False
    dump_block = False
    for line in instream:                 # <- Iterate through the lines of the input
        line = line.rstrip()
        block_buffer.append(line)         # <- Keep the block of text in memory

        line_text = line.strip()
        if line_text == "{":
            in_block = True
        elif line_text == keyword and in_block:            # <- Check if this block
            dump_block = True                              #    needs to be dumped
        elif line_text == "}":
            if not dump_block:                             # <- If not, 
                outstream.write("\n".join(block_buffer) + "\n")   # <- keep it.
                #print "\n".join(block_buffer)

            block_buffer = []                              # <- Flush buffer, continue
            in_block = dump_block = False



filter_block(input, output, "solid")
input.close()
output.close()   # make sure the filtered text is flushed to disk
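
All of the approaches on this page assume that an entity never contains braces of its own, as in the simplified sample. If the real file nests blocks inside an entity (an assumption, since the question does not show this), tracking the brace depth is one way to keep the line-by-line approach working. A rough sketch, with the function name `filter_entities` and the sample data invented for illustration:

```python
def filter_entities(lines, keyword):
    """Yield lines, skipping top-level blocks that contain keyword.

    Assumes each brace sits on its own line and that the line just
    before a top-level "{" is the block's title (e.g. "entity").
    """
    buffer = []      # title line plus the block collected so far
    depth = 0        # how many braces are currently open
    drop = False     # set once keyword is seen inside the block
    for line in lines:
        text = line.strip()
        if depth == 0 and text != "{":
            # Outside any block: emit what was pending; this line may be a title.
            for kept in buffer:
                yield kept
            buffer = [line]
            continue
        buffer.append(line)
        if text == "{":
            depth += 1
        elif text == "}":
            depth -= 1
            if depth == 0:
                # The top-level block just closed: keep it or discard it.
                if not drop:
                    for kept in buffer:
                        yield kept
                buffer, drop = [], False
        elif text == keyword:
            drop = True
    # Emit anything left over after the last block.
    for kept in buffer:
        yield kept

sample = """entity
{
    "id" "1"
}
entity
{
    "id" "2"
    solid
    {
        "id" "3"
    }
}
entity
{
    "id" "4"
}
""".splitlines(keepends=True)

result = "".join(filter_entities(sample, "solid"))
print(result)
```

Because it is a generator over any iterable of lines, the same function can be fed an open file object and its output passed to `writelines` on an output file, so only one block is held in memory at a time.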
