python - 如何在python中将文本格式与没有正则表达式的字符串匹配？

Question

我正在阅读一个文件，其中包含以下形式的行

[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34

我看到了Matlab 代码来读取这个文件

[I,L,Ls,R,Rs,p,e,n] = textread(f1,'[ %u ] L= %u%s R= %u%s p= %n e=%u n=%u')

我想用 Python 读取这个文件。我唯一知道的是正则表达式，甚至阅读这一行的一部分都会导致类似

re.compile('\s*\[\s*(?P<id>\d+)\s*\]\s*L\s*=\s*(?P<Lint>\d+)\s*\((?P<Ltype>[DG])\)\s*R\s*=\s*(?P<Rint>\d+)\s*')

这是丑陋的！在 Python 中有没有更简单的方法来做到这一点？

score 4 · Accepted Answer

您可以通过使用转义/替换来构建正则表达式，使其更具可读性...

number = "([-+0-9.DdEe ]+)"
unit = r"\(([^)]+)\)"
t = "[X] L=XU R=XU p=X e=X n=X"
m = re.compile(re.escape(t).replace("X", number).replace("U", unit))

score 2 · Accepted Answer

这在我看来或多或少是 Pythonic：

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"

parts = (None, int, None,
         None, int, str,
         None, int, str,
         None, float,
         None, int,
         None, int)

[I,L,Ls,R,Rs,p,e,n] = [f(x) for f, x in zip(parts, line.split()) if f is not None]

print [I,L,Ls,R,Rs,p,e,n]

score 1 · Accepted Answer

Python 没有像 Python 的 re page 中所述的 scanf 等效项。

Python 目前没有等效于 scanf() 的方法。正则表达式通常比 scanf() 格式字符串更强大，但也更冗长。下表提供了 scanf() 格式标记和正则表达式之间的一些或多或少的等效映射。

但是，您可能可以使用该页面上的映射构建自己的类似 scanf 的模块。

score 1 · Accepted Answer

Pyparsing is a fallback from unreadable and fragile regex processors. The parser example below handles your stated format, plus any variety of extra whitespace, and arbitrary order of the assignment expressions. Just as you have used named groups in your regex, pyparsing supports results names, so that you can access the parsed data using dict or attribute syntax (data['Lint'] or data.Lint).

from pyparsing import Suppress, Word, nums, oneOf, Regex, ZeroOrMore, Optional

# define basic punctuation
EQ,LPAR,RPAR,LBRACK,RBRACK = map(Suppress,"=()[]")

# numeric values
integer = Word(nums).setParseAction(lambda t : int(t[0]))
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t : float(t[0]))

# id and assignment fields
idRef = LBRACK + integer("id") + RBRACK
typesep = LPAR + oneOf("D G") + RPAR
lExpr = 'L' + EQ + integer("Lint")
rExpr = 'R' + EQ + integer("Rint")
pExpr = 'p' + EQ + real("pFloat")
eExpr = 'e' + EQ + integer("Eint")
nExpr = 'n' + EQ + integer("Nint")

# accept assignments in any order, with or without leading (D) or (G)
assignment = lExpr | rExpr | pExpr | eExpr | nExpr
line = idRef + lExpr + ZeroOrMore(Optional(typesep) + assignment)


# test the parser
text = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
data = line.parseString(text)
print data.dump()


# prints
# [0, 'L', 9, 'D', 'R', 14, 'D', 'p', 0.034722200000000002, 'e', 10, 'n', 34]
# - Eint: 10
# - Lint: 9
# - Nint: 34
# - Rint: 14
# - id: 0
# - pFloat: 0.0347222

Also, the parse actions do the string->int or string->float conversion at parse time, so that afterward the values are already in a usable form. (The thinking in pyparsing is that, while parsing these expressions, you know that a word composed of numeric digits - or Word(nums) - will safely convert to an int, so why not do the conversion right then, instead of just getting back matching strings and having to re-process the sequence of strings, trying to detect which ones are integers, floats, etc.?)

python - 如何在python中将文本格式与没有正则表达式的字符串匹配？

4 回答 4

Related

Reference