python - Extracting values of different lengths from a text file in Python

Question

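I want to load the contents of a .txt file as a string and extract a specific piece of information. There is a lot of text before and after that information, which looks like this: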

ValueName:     1234

But it may also look like this:

ValueName:     123456

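That is, the value is always a run of digits, but its length varies.

I want to find 'ValueName' in the string and then return the characters that start 6 characters after it. My idea was to check whether the 10 characters starting 6 characters after 'ValueName' are digits, and if so return them in order. Is this possible? Many thanks.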

score 3 · Accepted Answer

You can use a regular expression to extract the value that follows ValueName:

>>> import re
>>> line = 'some dummy text ValueName:     123456 some dummy text'
>>> m = re.findall(r'ValueName:\s+([0-9]+)',line)
>>> m
['123456']

If the text contains several occurrences, this finds all of them.

>>> import re
>>> line = 'blah blah ValueName: 1234 blah blah ValueName: 5678'
>>> m = re.findall(r'ValueName:\s+([0-9]+)',line)
>>> m
['1234', '5678']
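
Since the question asks about loading a whole .txt file as a string, here is a minimal sketch of applying the same pattern to file contents. The helper names `extract_values` and `extract_values_from_file` are made up for illustration, and the sketch assumes the file is small enough to read into memory in one go.

```python
import re
from pathlib import Path

# Same pattern as in the answer above, compiled once for reuse.
VALUE_RE = re.compile(r'ValueName:\s+([0-9]+)')

def extract_values(text):
    """Return every integer that follows 'ValueName:' in text."""
    return [int(digits) for digits in VALUE_RE.findall(text)]

def extract_values_from_file(path):
    # Hypothetical helper: reads the whole file, then reuses extract_values.
    return extract_values(Path(path).read_text())

# Made-up sample standing in for the file contents.
sample = 'header\nValueName:     1234\nmore text\nValueName:     123456\n'
print(extract_values(sample))  # [1234, 123456]
```

Because `\s+` matches any amount of whitespace, there is no need to count the characters between the colon and the digits.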

score 3 · Accepted Answer

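As Brian's answer (and others) show, a regular expression makes this simpler.

However, don't use a regular expression if you aren't willing to learn what it does. If you want to put off that learning curve for now,* simple string processing isn't hard: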

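def numeric_value_names(path):
    with open(path) as f:
        for line in f:
            bits = line.partition('ValueName:')
            # bits[1] is non-empty only if the separator was found;
            # bits[0] is empty only if the line starts with it.
            if bits[1] and not bits[0]:
                # strip() drops the padding after the colon and the newline
                rest = bits[2].strip()
                if rest.isdigit():
                    yield rest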

Using str.partition this way may be a bit obtuse to novices, so you may want to make the condition more obvious:

def numeric_value_names(path):
    with open(path) as f:
        for line in f:
            if line.startswith('ValueName:'):
                bits = line.partition('ValueName:')
                # strip() drops the padding after the colon and the newline
                rest = bits[2].strip()
                if rest.isdigit():
                    yield rest

* You definitely want to learn simple regular expressions at some point; the only question is whether you have something more pressing to do now…
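
The functions above only handle lines that begin with 'ValueName:'. If the value sits in the middle of a longer string, as the question describes, the same regex-free idea can be sketched as follows. The function name `value_after` and its `key` parameter are hypothetical, and the sketch assumes only the first occurrence matters.

```python
def value_after(text, key='ValueName:'):
    """Return the run of ASCII digits that follows key, or None."""
    before, sep, after = text.partition(key)
    if not sep:
        return None  # key does not occur in text
    digits = ''
    # Skip the padding after the key, then collect digits until the
    # first character that is not a digit.
    for ch in after.lstrip():
        if '0' <= ch <= '9':
            digits += ch
        else:
            break
    return digits or None

print(value_after('some text ValueName:     1234 more text'))  # 1234
print(value_after('some text ValueName:     123456'))          # 123456
```

This returns a string so that leading zeros are preserved; wrap the result in `int()` if a number is needed.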

score 1 · Accepted Answer

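import re

path = 'data.txt'  # placeholder: path to your text file
regex = re.compile(r'ValueName:\s*([0-9]+)')
result = None  # stays None if no line matches
with open(path, "r") as f:
    for line in f:
        match = regex.search(line)
        if match:
            result = int(match.group(1))
            break  # stop at the first match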

score 1 · Accepted Answer

Use a regular expression:

import re
for line in text:
  re.search(r'^ValueName: (\d+)', line).group(1)

If you need to sort the values, put them in a list:

lst.append(re.search(r'^ValueName: (\d+)', line).group(1))

Finally, just sort the list:

sorted(lst)

Here is a complete example that extracts what you need:

import re

text = ['ValueName: 33413','ValueName: 443234531','ValueName: 5243222','ValueName: 33']
lst = []

for line in text:
  lst.append(re.search(r'^ValueName: (\d+)', line).group(1))

lst = [int(x) for x in lst]
for x in sorted(lst):
  print(x)
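
One caveat: `re.search` returns `None` when a line does not match, so calling `.group(1)` directly raises `AttributeError` on such lines. A minimal sketch that skips them instead, using made-up sample lines and `\s*` to tolerate any amount of padding after the colon:

```python
import re

pattern = re.compile(r'^ValueName:\s*(\d+)')

# Sample lines are invented for illustration; two of them do not match.
lines = ['ValueName: 33413', 'unrelated text', 'ValueName:     33', 'ValueName: n/a']

values = []
for line in lines:
    match = pattern.search(line)
    if match:  # None means the line had no ValueName number
        values.append(int(match.group(1)))

print(sorted(values))  # [33, 33413]
```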

score 0 · Accepted Answer

You can do it like this:

for line in open("file"):
    if "1234" in line:
        print(line)

Source: http://ubuntuforums.org/showthread.php?t=820319

score -1 · Accepted Answer

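With a regular expression you can do something like this:

import re

regex = re.compile(r"^(.*[0-9]{4,}.*)$", re.MULTILINE)
for line in regex.findall(your_text_here):
    print(line)

The regular expression

 ^(.*[0-9]{4,}.*)$

matches every line that has at least 4 consecutive digits somewhere in it.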
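
Note that `^` and `$` anchor to individual lines only when the `re.MULTILINE` flag is set. Without it they anchor to the whole string, and because `.` does not cross newlines, a multi-line text usually yields no match at all. A small sketch with a made-up sample text shows the difference:

```python
import re

text = 'abc\nid 12345 here\nno digits\n9876\n'
pattern = r'^(.*[0-9]{4,}.*)$'

# Without the flag, ^ only matches at the very start of the string.
without_flag = re.findall(pattern, text)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
with_flag = re.findall(pattern, text, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['id 12345 here', '9876']
```

This returns whole lines rather than the number after 'ValueName', so it is a filter more than an extractor.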

score -1 · Accepted Answer

You can do it like this:

import re

re.findall(r'ValueName:\d\d\d',s)

where s is the name of your string variable and each \d stands for one digit you are looking for. In your case it would be \d\d\d\d\d\d ... not exactly pretty, but it works.
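
Repeated `\d` can also be written with a quantifier: `\d{4}` means exactly four digits, `\d{4,6}` four to six, and `\d+` one or more. The sketch below uses a made-up sample string and assumes optional whitespace after the colon (hence `\s*`), as in the question's example; the parentheses make `findall` return only the digits.

```python
import re

s = 'x ValueName:     1234 y ValueName: 123456 z'

exactly_four = re.findall(r'ValueName:\s*(\d{4})', s)
four_to_six = re.findall(r'ValueName:\s*(\d{4,6})', s)
one_or_more = re.findall(r'ValueName:\s*(\d+)', s)

print(exactly_four)  # ['1234', '1234'] (the second value is cut short)
print(four_to_six)   # ['1234', '123456']
print(one_or_more)   # ['1234', '123456']
```

Since the length of the value varies, `\d+` is probably the safest choice here.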
