python - Reading and processing a file with regular expressions

Question

I have a huge file that consists of nothing but repeated blocks like this:

//WAYNE ROONEY (wr10)
  90 [label="90"];
  90 -> 11 [weight=25];
  90 -> 21 [weight=23];
  90 -> 31 [weight=17];
  90 -> 41 [weight=12];
  90 -> 51 [weight=1];
  90 -> 62 [weight=50];
  90 -> 72 [weight=7];
  90 -> 82 [weight=27];
  90 -> 92 [weight=9];
  90 -> 102 [weight=43];

I need to convert it into a format that looks like this:

90 11 25

That is, I just need to strip out everything extra and keep only the numbers.

I tried using a regular expression with this code:

import re

for line in filein:
    match = re.search('label', line)
    if match:
        print(match.group())

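But it just prints every occurrence of 'label' in the file. If I search for 'label=" "' instead, there is no output at all. If I can figure out how to read the label, reading the weight should be very similar.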
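The pattern 'label=" "' only matches a literal space between the quotes, so it can never match label="90". What is needed is a capture group, (\d+), that stands for "one or more digits" and lets you pull those digits out with .group(1). A minimal sketch, assuming the line format shown above; the helper names read_label and read_weight are hypothetical:

```python
import re

# The quotes in label="90" are literal; (\d+) captures the digits between them.
LABEL = re.compile(r'label="(\d+)"')
# In the sample, weights are unquoted: [weight=25]
WEIGHT = re.compile(r'weight=(\d+)')

def read_label(line):
    """Return the label number as a string, or None if the line has no label."""
    m = LABEL.search(line)
    return m.group(1) if m else None

def read_weight(line):
    """Return the weight as a string, or None if the line has no weight."""
    m = WEIGHT.search(line)
    return m.group(1) if m else None
```

For example, read_label('  90 [label="90"];') gives '90' and read_weight('  90 -> 11 [weight=25];') gives '25'.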

score 4 · Accepted Answer

How about this:

import re

with open("file") as f:
    for line in f:
        # only edge lines such as "90 -> 11 [weight=25];" contain an arrow
        if '->' in line:
            print(' '.join(re.findall(r'[0-9]+', line)))

Output:

90 11 25
90 21 23
90 31 17
90 41 12
90 51 1
90 62 50
90 72 7
90 82 27
90 92 9
90 102 43

To save the output, just redirect it: python test.py > newfile
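If you would rather not rely on a shell redirect, the same loop can write to the output file directly. A minimal sketch, assuming the input and output names "file" and "newfile" used in this answer; the function name convert is hypothetical. Taking open file objects instead of paths keeps it easy to test:

```python
import re

def convert(src, dst):
    """Write 'source target weight' for every edge line of src into dst."""
    for line in src:
        # only edge lines such as "90 -> 11 [weight=25];" contain an arrow
        if '->' in line:
            dst.write(' '.join(re.findall(r'[0-9]+', line)) + '\n')
```

Usage: with open("file") as src, open("newfile", "w") as dst: convert(src, dst)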

score 2 · Accepted Answer

You can match all of the edge lines with the following pattern, piece by piece:

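(\d+)          a number (capture group 1)
\s*->\s*       optional whitespace, the literal ->, optional whitespace
(\d+)          another number (capture group 2)
\s*\[weight=   optional whitespace and the literal [weight=
(\d+)          another number (capture group 3)
\];            the literal ]; which ends the match

Put together: (\d+)\s*->\s*(\d+)\s*\[weight=(\d+)\];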
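In Python, the piece-by-piece breakdown can be kept in the source itself with re.VERBOSE, which ignores whitespace and # comments inside the pattern. A sketch, assuming the weights are unquoted as in the question's sample (so there is no quote after weight=); EDGE and parse_edge are hypothetical names. Note that search is used rather than match, because the edge lines start with indentation:

```python
import re

# One piece of the pattern per line; re.VERBOSE ignores the layout and comments.
EDGE = re.compile(r"""
    (\d+)          # group 1: source node
    \s*->\s*       # the arrow, with optional whitespace on either side
    (\d+)          # group 2: target node
    \s*\[weight=   # optional whitespace, then the literal [weight=
    (\d+)          # group 3: weight
    \];            # the literal ]; that closes the edge line
""", re.VERBOSE)

def parse_edge(line):
    """Return (source, target, weight) as strings, or None for non-edge lines."""
    m = EDGE.search(line)
    return m.groups() if m else None
```

For example, parse_edge('  90 -> 11 [weight=25];') returns ('90', '11', '25'), while the label and comment lines return None.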

Then you have numbered capture groups like this:

$1 (group(1) in Python): the first number
$2 (group(2)): the second number
$3 (group(3)): the third number

Now you can build your output string in whatever layout you want, e.g. "$1 $2 $3" (written \1 \2 \3 in a Python replacement template).
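To make the "$1 $2 $3" idea concrete in Python: Match.expand fills a \1 \2 \3 template from a single match, and finditer walks every match in the text. A sketch, assuming the whole file fits in memory as one string and that weights are unquoted as in the sample; convert_text is a hypothetical name:

```python
import re

EDGE = re.compile(r'(\d+)\s*->\s*(\d+)\s*\[weight=(\d+)\];')

def convert_text(text):
    """Return one 'source target weight' line per edge found in text."""
    # m.expand(r'\1 \2 \3') is Python's spelling of the "$1 $2 $3" template
    return '\n'.join(m.expand(r'\1 \2 \3') for m in EDGE.finditer(text))
```

Lines that do not contain an edge (the // comment, the label line, blank lines) simply produce no match, so no separate filtering step is needed.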

score 1 · Accepted Answer

To get all the numbers from each line, use r'\d+' with .findall():

import re

for line in filein:
    if 'label' in line:
        # end=' ' keeps the numbers on the same output line as the prefix
        print('label:', end=' ')
    print(' '.join(re.findall(r'\d+', line)))

It's not entirely clear what you want to do with the label lines, but this very simple loop prints:

label: 90 90
90 11 25
90 21 23
90 31 17

and so on.
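As written, the loop also prints a line for every other input line: "10" for the //WAYNE ROONEY (wr10) comment and an empty line for each blank separator. A sketch that skips those, assuming (this is an assumption about the full file) that comment lines always start with //; numbers_by_line is a hypothetical name:

```python
import re

def numbers_by_line(lines):
    """Yield the numbers on each line, prefixing label lines, skipping the rest."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('//'):
            continue  # assumption: comments such as "//WAYNE ROONEY (wr10)" start with //
        nums = re.findall(r'\d+', stripped)
        if not nums:
            continue  # blank separator lines carry no numbers
        prefix = 'label: ' if 'label' in stripped else ''
        yield prefix + ' '.join(nums)
```

Usage: for out in numbers_by_line(filein): print(out)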
