python - Regular expression to extract data from a table in Python

Question

I am new to regular expressions and Python. I have data stored in a log file, and I need to extract it using a regular expression. The format is as follows:

#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
   0         1000         0.01         0.03         0.02
   4         1000       177.69       177.88       177.79
   8         1000       175.90       176.07       176.01
  16         1000       181.51       181.73       181.60
  32         1000       199.64       199.81       199.72
  64         1000       228.10       228.27       228.19
  128         1000       278.70       278.90       278.75
  256         1000       388.26       388.49       388.39
  512         1000       593.49       593.82       593.63
  1024         1000      1044.27      1044.90      1044.59

score 3 · Accepted Answer

You can use either split or a regular expression to get a specific column. In this case, split is cleaner:

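import re

with open("input") as input_file:
    for line in input_file:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank lines and lines with fewer than 4 columns
        # using split to get the 4th column (the header line is printed too)
        print(fields[3])
        # using regex to get the 4th column
        print(re.match(r'^\s*(?:\S+\s+){3}(\S+)', line).group(1))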
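
If the goal is to work with the values rather than print them, the split approach extends naturally to numeric conversion. The following is a minimal sketch, not part of the original answer: the function name `parse_rows` is hypothetical, and skipping lines that start with `#` is an assumption based on the sample log in the question.

```python
def parse_rows(lines):
    """Return one list of floats per data line, skipping headers and blanks."""
    rows = []
    for line in lines:
        fields = line.split()
        # Assumption: header lines start with '#', as in the sample log.
        if not fields or fields[0].startswith('#'):
            continue
        rows.append([float(field) for field in fields])
    return rows


sample = """#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
   0         1000         0.01         0.03         0.02
   4         1000       177.69       177.88       177.79
"""

rows = parse_rows(sample.splitlines())
t_max = [row[3] for row in rows]  # 4th column, zero-based index 3
print(t_max)
```

Passing an open file object instead of `sample.splitlines()` should work the same way, since the function only iterates over lines.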

score 0 · Accepted Answer

You can use NumPy's genfromtxt function:

>>> import numpy as np
>>> a = np.genfromtxt("yourlogfile.dat",skip_header=1)

a will be an array containing all of the data.

score 0 · Accepted Answer

If you need to use a regular expression, this script does the job:

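import re

# an integer or a decimal number, captured as one group
number_pattern = r'(\d+(?:\.\d+)?)'
# five numbers separated by whitespace; \s* at both ends also accepts lines
# without leading whitespace or without a trailing newline
line_pattern = r'^\s*%s\s*$' % r'\s+'.join([number_pattern] * 5)

with open('data', 'r') as f:
    for line in f:
        match = re.match(line_pattern, line)
        if match is not None:  # the header line does not match and is skipped
            print(match.groups())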
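
A variation on the same idea is to give each column a named group, so that a matched line can be turned into a dictionary. This is only a sketch under assumptions: the group names and the helper `parse_line` are made up here to mirror the column headers of the sample log, and the first two columns are assumed to always be integers.

```python
import re

# Assumption: group names are invented for this sketch; they mirror the
# column headers of the sample log.
NUMBER = r'\d+(?:\.\d+)?'
LINE_RE = re.compile(
    r'^\s*(?P<bytes>\d+)\s+(?P<repetitions>\d+)'
    r'\s+(?P<t_min>' + NUMBER + r')'
    r'\s+(?P<t_max>' + NUMBER + r')'
    r'\s+(?P<t_avg>' + NUMBER + r')\s*$'
)


def parse_line(line):
    """Return a dict of column name to number, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    return {
        'bytes': int(record['bytes']),
        'repetitions': int(record['repetitions']),
        't_min': float(record['t_min']),
        't_max': float(record['t_max']),
        't_avg': float(record['t_avg']),
    }


print(parse_line('   4         1000       177.69       177.88       177.79\n'))
print(parse_line('#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]\n'))
```

The header line starts with `#`, so it fails the pattern and the second call returns None.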

score 0 · Accepted Answer

All you need is (\S+):

import re

# each run of non-whitespace characters is one field
pattern = re.compile(r'(\S+)')
with open('data.txt', 'r') as f:
    for l in f:
        print(pattern.findall(l))

You can also do it the other way around and split on whitespace:

import re

# split on runs of whitespace after trimming both ends of the line
whitespace = re.compile(r'\s+')
with open('data.txt', 'r') as f:
    for l in f:
        print(whitespace.split(l.strip()))
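
For whitespace-separated columns like these, both patterns yield the same fields as the built-in `str.split()` called with no arguments. The short check below is an illustrative sketch, using one sample line from the question; the variable names are made up for the comparison.

```python
import re

line = '  16         1000       181.51       181.73       181.60\n'

via_findall = re.findall(r'\S+', line)          # collect non-whitespace runs
via_re_split = re.split(r'\s+', line.strip())   # split on whitespace runs
via_str_split = line.split()                    # built-in equivalent

print(via_findall)
print(via_findall == via_re_split == via_str_split)
```

One difference shows up on blank lines: `re.split(r'\s+', '')` returns a list holding one empty string, while the other two approaches return an empty list.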
