python - How to extract substrings enclosed in square brackets and build new strings from them

Question

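After identifying substrings that match a pattern enclosed in square brackets, I want to extract them and build some strings from them.

For example, if my text is "2 cups [9 oz] [10 g] flour"

I want to generate 4 strings from this input:

"2 cups" -> us

"9 oz" -> uk imperial

"10 g" -> metric

"flour" -> ingredient name

As a start, I began by identifying any square brackets that contain the oz keyword and wrote the following code, but no match occurs. Any ideas or best practices for doing this?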

    import re

    p_oz = re.compile(r'\[(.+) oz\]', re.IGNORECASE) # to match uk imperial (oz)
    text = '2 cups [9 oz] flour'

    m = p_oz.match(text)

    if m:
        found = m.group(1)
        print(found)

score 4 · Accepted Answer

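You need to use search instead of match:

    m = p_oz.search(text)

re.match only tries to match the regex at the very beginning of the input string, and your text starts with "2", not "[". That's not what you want. You want to find a substring anywhere in the text that matches your regex, and that is what re.search is for.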
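A small illustrative check of the difference. This is a sketch: replacing the question's `.+` with `[^\]]+` is an added suggestion (not used in the question or the answer) that keeps the capture inside a single [...] group; the last lines show how the original greedy `.+` can span several groups when the oz group is not the first one.

```python
import re

text = '2 cups [9 oz] [10 g] flour'

# Suggested tweak of the question's pattern: [^\]]+ cannot cross a closing bracket.
p_oz = re.compile(r'\[([^\]]+) oz\]', re.IGNORECASE)

# re.match is anchored at index 0, and the text starts with "2", not "[".
print(p_oz.match(text))          # None

# re.search scans forward and returns the first match anywhere in the string.
m = p_oz.search(text)
if m:
    print(m.group(1))            # 9

# The question's greedy (.+) can swallow an earlier [...] group.
greedy = re.compile(r'\[(.+) oz\]', re.IGNORECASE)
print(greedy.search('[10 g] [9 oz]').group(1))   # 10 g] [9
```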

score 1 · Answer

I'm just expanding on BrenBarn's accepted answer. I like solving a good problem over lunch. Here is my full implementation for your problem:

Given the string '2 cups [9 oz] [10 g] flour':

    import re

    text = '2 cups [9 oz] [10 g] flour'

    units = {'oz': 'uk imperial',
             'cups': 'us',
             'g': 'metric'}

    # strip out brackets & trim white space
    text = text.replace('[', '').replace(']', '').strip()

    # prefix each number with a double quote, e.g. 9 -> "9
    text = re.sub(r'(\d+)', r'"\1', text)

    # expand whole-word units, e.g. `cups` -> `cups" -> us~`
    # (\b stops 'g' from matching inside words such as 'sugar')
    for unit in units:
        text = re.sub(r'\b' + re.escape(unit) + r'\b',
                      unit + '" -> ' + units[unit] + '~', text)

    # quote the last word in the string and label it as the ingredient
    text = re.sub(r'(\w+$)', r'"\1" -> ingredient name', text)

    print("raw text:\n" + text + "\n")
    print("Array:")
    print(text.split('~ '))

This prints the raw text and the resulting list of strings:

    raw text:
    "2 cups" -> us~ "9 oz" -> uk imperial~ "10 g" -> metric~ "flour" -> ingredient name

    Array:
    ['"2 cups" -> us', '"9 oz" -> uk imperial', '"10 g" -> metric', '"flour" -> ingredient name']
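For input that always has the shape "quantity unit [quantity unit] [quantity unit] ingredient name", a findall-based variant can pull the bracket contents out directly instead of rewriting the string. This is a sketch under that assumption: the helper `parse_ingredient`, the `UNITS` table (mirroring the mapping above) and the 'unknown' label for unlisted units are all hypothetical.

```python
import re

# Hypothetical unit -> measurement-system table, mirroring the answer's mapping.
UNITS = {'cups': 'us', 'oz': 'uk imperial', 'g': 'metric'}

BRACKET = re.compile(r'\[([^\]]*)\]')   # captures the contents of one [...] group
# Leading "<number> <unit>" followed by the rest of the line (the ingredient name).
LEADING = re.compile(r'\s*(\d+(?:\.\d+)?\s*[A-Za-z]+)\s+(.+?)\s*$')


def parse_ingredient(line):
    """Hypothetical helper: label each measure and the ingredient name.

    Assumes the shape '<qty> <unit> [<qty> <unit>] ... <ingredient name>'.
    """
    bracketed = [s.strip() for s in BRACKET.findall(line)]  # e.g. ['9 oz', '10 g']
    rest = BRACKET.sub(' ', line)        # drop the [...] groups, keep the rest
    m = LEADING.match(rest)
    if m is None:
        raise ValueError('unexpected format: %r' % line)

    result = []
    for measure in [m.group(1)] + bracketed:
        unit_match = re.search(r'[A-Za-z]+$', measure)
        unit = unit_match.group().lower() if unit_match else ''
        result.append((measure, UNITS.get(unit, 'unknown')))
    result.append((m.group(2), 'ingredient name'))
    return result


for part, label in parse_ingredient('2 cups [9 oz] [10 g] flour'):
    print('"%s" -> %s' % (part, label))
```

Because the ingredient name is whatever follows the leading measure once the bracket groups are removed, multi-word names such as 'brown sugar' come through intact, and no unit name is ever substituted inside other words.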
