python - Searching for a sequence in text

Question

I'm stuck on a logic problem.

I have strings declared as follows:

fruits = "banana grapes apple"
vegetables = "potatoes cucumber carrot"

Now, given some sentences, I have to find the word that comes just before text of the form <vegetables> <fruits>:

I ate carrot grapes ice cream for dessert.

Answer: ate

Dad and mom brought banana cucumber and milk.

Answer: brought

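My idea is to split the sentence into a list and then look for the sequence. I can break the sentence up, but matching the sequence is the problem.

wd = sentence.split()
for x in wd:
    # now I will have to look for the sequence
    pass

Now I have to find the word that precedes text of that form.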

score 2 · Accepted Answer

You are using the wrong data structure here; convert the fruits and vegetables to sets. Then the problem becomes easy to solve:

>>> fruits = set("banana grapes apple".split())
>>> vegetables = set("potatoes cucumber carrot".split())
>>> fruits_vegs = fruits | vegetables
>>> from string import punctuation
>>> # translation table that deletes every punctuation character (Python 3)
>>> strip_punct = str.maketrans('', '', punctuation)
>>> def solve(text):
...     spl = text.split()
...     # pair each word with the word that follows it
...     for x, y in zip(spl, spl[1:]):
...         # strip off punctuation marks
...         x, y = x.translate(strip_punct), y.translate(strip_punct)
...         if y in fruits_vegs and x not in fruits_vegs:
...             return x
...
>>> solve('I ate carrot grapes ice cream for dessert.')
'ate'
>>> solve('Dad and mom brought banana cucumber and milk.')
'brought'
>>> solve('banana cucumber and carrot.')
'and'
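
To illustrate why sets fit better than the original strings, here is a small sketch. On a string, `in` tests for a substring, while on a set it tests for exact membership. The probe word `'app'` and the names `fruits_str` and `fruits_set` are only illustrative choices, not part of the question.

```python
fruits_str = "banana grapes apple"
fruits_set = set(fruits_str.split())

# On a string, `in` is a substring test, so partial words match.
print('app' in fruits_str)    # True
# On a set, `in` is an exact-membership test.
print('app' in fruits_set)    # False
print('apple' in fruits_set)  # True
```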

score 1 · Accepted Answer

fruits = "banana grapes apple".split(" ")
vegetables = "potatoes cucumber carrot".split(" ")

sentence = 'Dad and mom brought banana cucumber and milk.'

wd = sentence.split(' ')
for i, x in enumerate(wd):
    # first fruit or vegetable that is not the first word: print the word before it
    if (x in fruits or x in vegetables) and i > 0:
        print(wd[i-1])
        break
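
Note that the loop above stops at the first fruit or vegetable it sees and compares raw tokens, so a token with trailing punctuation such as `carrot.` would not match. If the requirement is that a vegetable and a fruit actually appear next to each other (in either order, as in the two example sentences), a stricter variant might look like the sketch below. It assumes Python 3, and the function name `word_before_pair` is hypothetical.

```python
from string import punctuation

fruits = set("banana grapes apple".split())
vegetables = set("potatoes cucumber carrot".split())

def word_before_pair(text):
    """Return the word before an adjacent vegetable/fruit pair, or None."""
    # Strip punctuation from both ends of each word before comparing.
    words = [w.strip(punctuation) for w in text.split()]
    # Slide a three-word window: candidate word, then the two words after it.
    for prev, a, b in zip(words, words[1:], words[2:]):
        veg_fruit = a in vegetables and b in fruits
        fruit_veg = a in fruits and b in vegetables
        if veg_fruit or fruit_veg:
            return prev
    return None

print(word_before_pair('I ate carrot grapes ice cream for dessert.'))     # ate
print(word_before_pair('Dad and mom brought banana cucumber and milk.'))  # brought
print(word_before_pair('banana cucumber and carrot.'))                    # None
```

In the last sentence the pair sits at the very start, so there is no preceding word and the function returns `None`.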

score 1 · Accepted Answer

You can do this with a regular expression:

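import re

def to_group(l):
    ''' make a regex group from a string of space-separated words '''
    return '(?:%s)' % ('|'.join(l.split()))

# vegetables and fruits are the space-separated strings from the question
pattern = r'(\w+) %s %s' % (to_group(vegetables), to_group(fruits))
print(re.findall(pattern, string))  # string holds the sentence to search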
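
As written, the pattern only matches a vegetable followed by a fruit, so it would find nothing in the second example sentence, where the order is `banana cucumber`. A self-contained sketch that accepts either order might look like this. It assumes Python 3; the either-order alternation, the `re.escape` call, and the name `find_preceding` are additions for illustration, not part of the answer above.

```python
import re

fruits = "banana grapes apple"
vegetables = "potatoes cucumber carrot"

def to_group(words):
    """Build a non-capturing alternation from a space-separated string."""
    return '(?:%s)' % '|'.join(re.escape(w) for w in words.split())

veg, fruit = to_group(vegetables), to_group(fruits)
# Capture one word, then require "<vegetable> <fruit>" or "<fruit> <vegetable>".
pattern = re.compile(r'(\w+) (?:%s %s|%s %s)\b' % (veg, fruit, fruit, veg))

def find_preceding(sentence):
    """Return every word that directly precedes a matching pair."""
    return pattern.findall(sentence)

print(find_preceding('I ate carrot grapes ice cream for dessert.'))     # ['ate']
print(find_preceding('Dad and mom brought banana cucumber and milk.'))  # ['brought']
```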
