import re

with open("input") as f:
    for line in f:
        # Capture the first run of digits, then the first unit name after it
        mo = re.match(r'[^\d]*(\d+).*?(tons|feet|lbs)', line)
        if mo:
            print(mo.group(1), mo.group(2))
Output:
1023 lbs
1023 tons
1023 feet
Also, if you have lines such as "$100 money is too much for 100 lbs", you can use:
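import re

with open("input") as f:
    for line in f:
        # The lookbehind rejects digits preceded by '$' or by another digit,
        # so a dollar amount is never captured as the quantity
        mo = re.match(r'.*?(?<![$\d])(\d+).*?(tons|feet|lbs)', line)
        if mo:
            print(mo.group(1), mo.group(2))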
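To see what the lookbehind changes, here is a small sketch that runs both patterns on a single string instead of a file. The sample line is hypothetical: it is the line above with the dollar amount changed to 250, so the two results can be told apart. The names FIRST_NUMBER and NOT_MONEY are made up for this illustration.

```python
import re

# Pattern from the first example: takes the first run of digits on the line.
FIRST_NUMBER = re.compile(r'[^\d]*(\d+).*?(tons|feet|lbs)')
# Pattern with the lookbehind: skips digits preceded by '$' or another digit.
NOT_MONEY = re.compile(r'.*?(?<![$\d])(\d+).*?(tons|feet|lbs)')

# Hypothetical sample line (not from the original input file).
sample = "$250 money is too much for 100 lbs"

first = FIRST_NUMBER.match(sample).groups()
guarded = NOT_MONEY.match(sample).groups()

print(first)    # ('250', 'lbs')
print(guarded)  # ('100', 'lbs')
```

The \d inside the lookbehind matters: with only (?<!\$), the engine would skip the "2" right after the dollar sign but still accept "50" from the middle of "$250".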
And to match modifiers such as kilo and mega:
import re

with open("input") as f:
    for line in f:
        # The trailing '|' in the second group lets the modifier be empty
        mo = re.match(r'.*?(\d+).*?(mega|kilo|metric|) (tons|feet|lbs)', line)
        if mo:
            print(mo.group(1), mo.group(2), mo.group(3))
Output:
1023 mega lbs
1023 kilo tons
1023  feet
100  lbs
You can store the units and modifiers in lists and join them with | to build the regular expression dynamically.
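A minimal sketch of that idea, assuming the same pattern shape as the mega/kilo example. The function name build_pattern and the sample strings are hypothetical, and re.escape is added here as a precaution in case a unit name ever contains a regex metacharacter.

```python
import re

units = ["tons", "feet", "lbs"]
modifiers = ["mega", "kilo", "metric"]


def build_pattern(units, modifiers):
    """Build the number/modifier/unit regex from two lists of words."""
    unit_alt = "|".join(re.escape(u) for u in units)
    mod_alt = "|".join(re.escape(m) for m in modifiers)
    # The trailing '|' after the modifiers keeps the modifier optional.
    return re.compile(r'.*?(\d+).*?(' + mod_alt + r'|) (' + unit_alt + r')')


pattern = build_pattern(units, modifiers)

# Hypothetical input lines, used here instead of reading a file.
for line in ["weight is 1023 mega lbs", "height is 1023 feet"]:
    mo = pattern.match(line)
    if mo:
        print(mo.groups())
```

Adding a new unit or modifier then only means appending to the corresponding list. If one entry is a prefix of another, listing the longer one first avoids a partial match, because alternation tries the options from left to right.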
Example that matches any possible unit modifier:
import re

with open("input") as f:
    for line in f:
        # (\S*) captures whatever word sits directly before the unit, if any
        mo = re.match(r'[^\d]*(\d+).*?(\S*)\s*(tons|feet|lbs)', line)
        if mo:
            print("'{}' '{}' '{}'".format(mo.group(1), mo.group(2),
                                          mo.group(3)))
Output:
'1023' 'mega' 'lbs'
'1023' 'kilo' 'tons'
'1023' '' 'feet'