parsing - Parse nested indented text into a list

Question

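Parse nested indented text into a list

Hello,

Maybe someone can give me a starting point.

I have nested, indented text similar to TXT below, and I need to parse it into a nested list structure like Nested_Lists: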

TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""

Nested_Lists = ['Test1', 
    ['NeedHelp', 
        ['GotStuck', 
            ['Sometime', 
            'NoLuck']]], 
    ['NeedHelp2', 
        ['StillStuck', 
        'GoodLuck']]
]

Nested_Lists = ['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']]], ['NeedHelp2', ['StillStuck', 'GoodLuck']]]

Any help for Python 3 would be appreciated.

score 7 · Accepted Answer

You can use Python's tokenizer to parse the indented text:

from tokenize import NAME, INDENT, DEDENT, tokenize

def parse(file):
    stack = [[]]
    lastindent = len(stack)

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NAME:
            if lastindent != len(stack):
                stack.pop()
                lastindent = push_new_list()
            stack[-1].append(t.string) # add to current list
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]

Example:

from io import BytesIO
from pprint import pprint
pprint(parse(BytesIO(TXT.encode('utf-8'))), width=20)

Output:

['Test1',
 ['NeedHelp',
  ['GotStuck',
   ['Sometime',
    'NoLuck']]],
 ['NeedHelp2',
  ['StillStuck',
   'GoodLuck']]]
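
To see why the tokenizer-based approach works, it may help to look at the tokens Python emits for indented text. The sketch below keeps only the NAME, INDENT, and DEDENT tokens; SAMPLE and structural_tokens are illustrative names, not part of the answer.

```python
from io import BytesIO
from tokenize import DEDENT, INDENT, NAME, tokenize

# Hypothetical sample input: four names at three indentation levels.
SAMPLE = "a\n    b\n        c\n    d\n"

def structural_tokens(text):
    """Return the NAME/INDENT/DEDENT tokens of text as short strings."""
    out = []
    for tok in tokenize(BytesIO(text.encode("utf-8")).readline):
        if tok.type == NAME:
            out.append(tok.string)
        elif tok.type == INDENT:
            out.append("INDENT")
        elif tok.type == DEDENT:
            out.append("DEDENT")
    return out

print(structural_tokens(SAMPLE))
# ['a', 'INDENT', 'b', 'INDENT', 'c', 'DEDENT', 'd', 'DEDENT']
```

Each INDENT opens a new sublist and each DEDENT closes one, which is what the stack in parse tracks. Note that the DEDENT arrives before the name on the dedented line.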

score 4 · Accepted Answer

I hope you can understand my solution. If not, please ask.

def nestedbyindent(string, indent_char=' '):
    splitted, i = string.splitlines(), 0
    def first_non_indent_char(string):
        for i, c in enumerate(string):
            if c != indent_char:
                return i
        return -1
    def subgenerator(indent):
        nonlocal i
        while i < len(splitted):
            s = splitted[i]
            title = s.lstrip()
            if not title:
                i += 1
                continue
            curr_indent = first_non_indent_char(s)
            if curr_indent < indent:
                break
            elif curr_indent == indent:
                i += 1
                yield title
            else:
                yield list(subgenerator(curr_indent))
    return list(subgenerator(0))  # start at indent 0 so top-level items are not wrapped in an extra list

>>> nestedbyindent(TXT)
['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']],
'NeedHelp2', ['StillStuck', 'GoodLuck']]]

score 0 · Accepted Answer

This is a very un-Pythonic and verbose answer, but it seems to work.

TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""

outString = '['
level = 0
first = True
for i in TXT.split("\n")[1:]:
    count = 0
    for j in i:
        if j!=' ':
            break
        count += 1
    count //= 4  # 4 spaces = 1 indent level; integer division keeps count an int in Python 3
    if i.lstrip()!='':
        itemStr = "'" + i.lstrip() + "'"
    else:
        itemStr = ''
    if level < count:
        if first:
            outString += '['*(count - level) + itemStr
            first = False
        else:
            outString += ',' + '['*(count - level) + itemStr
    elif level > count:
        outString += ']'*(level - count) + ',' + itemStr
    else:
        if first:
            outString += itemStr
            first = False
        else:
            outString += ',' + itemStr
    level = count
if len(outString)>1:
    outString = outString[:-1] + ']'
else:
    outString = '[]'

output = eval(outString)
#['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']], 'NeedHelp2', ['StillStuck', 'GoodLuck']]]
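
For comparison, the same grouping can be built directly with a stack of lists instead of assembling a string and calling eval. This is only a sketch under the same assumption of 4 spaces per indent level; parse_indented, indent_width, and SAMPLE are hypothetical names introduced for illustration.

```python
def parse_indented(text, indent_width=4):
    """Group indented lines into nested lists, one list per indent level."""
    root = []
    stack = [root]  # stack[d] is the list currently being filled at depth d
    for line in text.splitlines():
        item = line.strip()
        if not item:
            continue  # skip blank lines
        depth = (len(line) - len(line.lstrip(" "))) // indent_width
        # Deeper than the current level: open one child list per missing level.
        while len(stack) - 1 < depth:
            child = []
            stack[-1].append(child)
            stack.append(child)
        # Shallower than the current level: drop the lists that are now closed.
        del stack[depth + 1:]
        stack[-1].append(item)
    return root

# Hypothetical sample input, shaped like TXT in the question.
SAMPLE = """
Test1
    NeedHelp
        GotStuck
    NeedHelp2
        Don't
"""

print(parse_indented(SAMPLE))
# ['Test1', ['NeedHelp', ['GotStuck'], 'NeedHelp2', ["Don't"]]]
```

Because no source string is evaluated, items that contain quote characters do not break the parse.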

score 0 · Accepted Answer

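Building on the tokenizer-based answer: if you want to keep whole lines, and the lines contain more than just variable names, you can replace t.type == NAME with t.type == NEWLINE, and have the if statement append the stripped line instead of t.string. Something like this: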

from tokenize import NEWLINE, INDENT, DEDENT, tokenize

def parse(file):
    stack = [[]]
    lastindent = len(stack)

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NEWLINE:
            if lastindent != len(stack):
                stack.pop()
                lastindent = push_new_list()
            stack[-1].append(t.line.strip()) # add entire line to current list
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]

Otherwise, each line is split at every token boundary, where tokens are separated by spaces, parentheses, brackets, and so on.
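
The difference between the two token types can be checked on a single line. A minimal sketch, using a made-up sample line and the hypothetical helper names names_in and lines_in:

```python
from io import BytesIO
from tokenize import NAME, NEWLINE, tokenize

# Hypothetical sample line that is not a single identifier.
LINE = "Need Help (now)\n"

def names_in(text):
    """Collect the string of every NAME token, as the NAME-based parser does."""
    toks = tokenize(BytesIO(text.encode("utf-8")).readline)
    return [t.string for t in toks if t.type == NAME]

def lines_in(text):
    """Collect the stripped source line at every NEWLINE token."""
    toks = tokenize(BytesIO(text.encode("utf-8")).readline)
    return [t.line.strip() for t in toks if t.type == NEWLINE]

print(names_in(LINE))  # ['Need', 'Help', 'now']
print(lines_in(LINE))  # ['Need Help (now)']
```

The NAME-based version would add three separate items for this line, while the NEWLINE-based version adds the line once.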
