python - How do I extract the exact indentation of a line in a Python file?

Question

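My goal is to determine the exact indentation of a line of code in a Python file. Because I will be instrumenting statements at certain locations, knowing the indentation a line requires is essential. The problem can be illustrated with the following examples: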

First Scenario
#A.py

a=0                  <----------- indentation '0' spaces or '0' \t
while a<5:           <----------- indentation '0' spaces or '0' \t
    print a          <----------- indentation '4' spaces or '1' \t
    a=a+1            <----------- indentation '4' spaces or '1' \t

Second scenario
#A.py

a=0                  <----------- indentation '0' spaces or '0' \t
while a<5:           <----------- indentation '0' spaces or '0' \t
        print a      <----------- indentation '8' spaces or '2' \t
        a=a+1        <----------- indentation '8' spaces or '2' \t

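Because I am inspecting an application made up of many files, I run into files that follow both of the scenarios above. How can I determine the indentation of any given line in a Python file?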

score 4 · Accepted Answer

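Be aware that the method you choose for determining indentation can have a significant impact on performance. For example, you can use a regular expression to measure leading whitespace, but there are simpler and much more efficient ways to do it.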

import re

line = '            Then the result is even.'
r = re.compile(r"^ *")

%timeit len(line) - len(line.lstrip())    # 1000000 loops, best of 3: 0.387 µs per loop
%timeit len(re.findall(r"^ *", line)[0])  #  100000 loops, best of 3: 1.94 µs per loop
%timeit len(r.findall(line)[0])           # 1000000 loops, best of 3: 0.890 µs per loop

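The regular expressions in the other answers are slower because a regular expression is a state machine that is compiled when the expression is constructed. There is an internal cache, but even so it is better to compile the expression yourself and reuse it.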

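Note, however, that the regular expression solutions run at only about 20% of the speed of the first sample (worst case; about 43% with a precompiled expression). The first sample simply compares the length of the string before and after stripping the leading whitespace.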
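For readers working outside IPython, the comparison can be reproduced with the standard timeit module. This is only a sketch: the helper names indent_strip and indent_regex are illustrative and not part of the answer, and the absolute timings will differ by machine and Python version.

```python
import re
import timeit

LEADING_SPACES = re.compile(r"^ *")  # compiled once and reused

def indent_strip(line):
    """Count leading spaces by comparing lengths before and after lstrip."""
    # lstrip(" ") removes spaces only, so it measures the same thing as the regex
    return len(line) - len(line.lstrip(" "))

def indent_regex(line):
    """Count leading spaces with the precompiled pattern."""
    # "^ *" always matches (possibly an empty run), so match() never returns None
    return LEADING_SPACES.match(line).end()

line = " " * 12 + "Then the result is even."

for func in (indent_strip, indent_regex):
    seconds = timeit.timeit(lambda: func(line), number=100000)
    print("%s: %.4f s for 100000 calls" % (func.__name__, seconds))
```

Both helpers return the same count for the same line, so the choice between them comes down to speed and readability.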

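Important: Python treats a tab as advancing the indentation to the next multiple of 8 columns (so a tab at the start of a line counts as 8 spaces). You therefore also need to replace literal tabs with the equivalent number of spaces, for example with .replace() or .expandtabs(), before measuring.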
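One way to apply the tab rule is str.expandtabs, which moves each tab to the next multiple of the tab size instead of substituting a fixed number of spaces. A minimal sketch, where indent_width is a hypothetical helper name chosen for this example:

```python
def indent_width(line, tabsize=8):
    """Return the column at which the first non-whitespace character starts."""
    stripped = line.lstrip(" \t")
    prefix = line[:len(line) - len(stripped)]  # the leading run of spaces and tabs
    # expandtabs advances each tab to the next multiple of tabsize
    return len(prefix.expandtabs(tabsize))

print(indent_width("    print(a)"))    # 4
print(indent_width("\tprint(a)"))      # 8
print(indent_width("    \tprint(a)"))  # 8, the tab only fills up to column 8
```

The last call shows why a plain .replace('\t', ' ' * 8) can overcount when spaces precede a tab.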

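Edited to add: the Python parser itself does not care about specific indentation levels, only that a given "block" is indented consistently. The size of each increase in indentation is effectively ignored and stripped out, replaced by INDENT and DEDENT tokens. (An indent of 16 spaces still produces just one INDENT token.) What really matters is how the indentation changes from one line to the next.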
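The INDENT and DEDENT behaviour can be observed with the standard tokenize module. The sketch below counts those tokens for two versions of the loop from the question; count_indent_tokens is an illustrative helper name, not something from the answer.

```python
import io
import tokenize

def count_indent_tokens(source):
    """Return (INDENT count, DEDENT count) for a piece of Python source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kinds = [tok.type for tok in tokens]
    return kinds.count(tokenize.INDENT), kinds.count(tokenize.DEDENT)

four = "a = 0\nwhile a < 5:\n    a = a + 1\n"       # body indented by 4 spaces
eight = "a = 0\nwhile a < 5:\n        a = a + 1\n"  # body indented by 8 spaces

print(count_indent_tokens(four))   # (1, 1)
print(count_indent_tokens(eight))  # (1, 1)
```

Both sources produce the same token counts, even though the body is indented by different amounts.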

score 0 · Accepted Answer

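From "Learning Python":

Python doesn't care how you indent (you may use either spaces or tabs), or how much you indent (you may use any number of spaces or tabs). In fact, the indentation of one nested block can be totally different from that of another. The syntax rule is only that for a given single nested block, all of its statements must be indented the same distance to the right. If this is not the case, you will get a syntax error.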

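As I understand it, this means that two lines have the same indentation level if the sequence of whitespace characters (spaces or tabs) at the start of each line is identical.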

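This can get confusing when you look at the file in a text editor, because tabs are rendered at different widths depending on the tab stops, so things that look the same may not actually be the same. In that sense, even the notion of "indented the same DISTANCE to the right" is itself problematic, because visually the "distance" depends on the convention each editor uses to render a given whitespace character.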
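Under that reading, comparing indentation means comparing the leading whitespace as strings rather than comparing rendered widths. A small sketch of the idea, with leading_whitespace as a hypothetical helper name:

```python
def leading_whitespace(line):
    """Return the exact run of spaces and tabs that starts the line."""
    return line[:len(line) - len(line.lstrip(" \t"))]

tab_line = "\tprint(a)"
space_line = "        print(a)"  # eight spaces

# The two prefixes may render identically in an editor but are different strings.
print(repr(leading_whitespace(tab_line)))    # '\t'
print(repr(leading_whitespace(space_line)))  # '        '
print(leading_whitespace(tab_line) == leading_whitespace(space_line))  # False
```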

score 0 · Accepted Answer

How about:

import re

line = '    \t  asdf'
# Take everything before the first word character and count each tab as 4 spaces.
print(len(re.split(r'\w', line)[0].replace('\t', '    ')))  # 10

Note that none of the other proposed solutions count tabs correctly.

score 0 · Accepted Answer

You can use a regular expression:

import re
with open("/path/to/file") as file:
    # Iterate over the file directly instead of loading it all with readlines().
    for mark, line in enumerate(file):
        print("%d %d" % (mark, len(re.findall("^ *", line)[0])))

The first number is the line index (counting from 0) and the second is the indentation.

Or, if you want one specific line, do this:

import re
with open("/path/to/file") as file:
    print(len(re.findall("^ *", file.readlines()[3])[0]))

This returns the indentation of line 4 (remember that the index is the line number you want minus 1).
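For large files, a single line can be fetched without reading the whole file into a list by using itertools.islice. The sketch below is an alternative to the readlines() call, not the answer's own method. The helper name indent_of_line is hypothetical, and an io.StringIO object stands in for an open file.

```python
import io
from itertools import islice

def indent_of_line(lines, lineno):
    """Return the leading-space count of 1-based line `lineno`, or None if absent."""
    # islice skips the first lineno - 1 lines and yields at most one line
    for line in islice(lines, lineno - 1, lineno):
        return len(line) - len(line.lstrip(" "))
    return None

sample = io.StringIO("a = 0\nwhile a < 5:\n    a = a + 1\n")
print(indent_of_line(sample, 3))  # 4
```

Any iterable of lines works here, so an object returned by open() can be passed in the same way.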

score 0 · Accepted Answer

The "I don't know much about other techniques" approach.

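# Count the leading spaces of every line in the file.
indent_space = []
with open('stringstuff.py') as read:
    for line in read:
        spaces = 0
        for char in line:
            if char != " ":
                break
            spaces += 1
        indent_space.append(spaces)

# Treat the smallest non-zero change between consecutive lines as one indent.
# (This is a heuristic; the original loop compared lines i+1 and i-1 and never
# reached its print statement.)
indentation = None
for i in range(len(indent_space) - 1):
    new_indentation = abs(indent_space[i + 1] - indent_space[i])
    if new_indentation != 0 and (indentation is None or new_indentation < indentation):
        print("Indentation: %d found" % new_indentation)
        indentation = new_indentation

for spaces in indent_space:
    if indentation:
        print("Indentation of %d spaces or %d indents." % (spaces, spaces // indentation))
    else:
        # No indented line was found, so there is no indent unit to divide by.
        print("Indentation of %d spaces." % spaces)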
