
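I'm new to Python and have some questions about lists and tuples. I have a list of tagged sentences, where each sentence is a list of (word, wordclass-tag) tuples. This is one element of my list: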

[('It', 'PPS'), ('says', 'VBZ'), ('that', 'CS'), ('``', '``'), ('in', 'IN'), ('the', 'AT'), ('event', 'NN'), ('Congress', 'NP'), ('does', 'DOZ'), ('provide', 'VB'), ('this', 'DT'), ('increase', 'NN'), ('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS'), ("''", "''"), (',', ','), ('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL'), ('of', 'IN-TL'), ('Education', 'NN-TL'), ('should', 'MD'), ('be', 'BE'), ('directed', 'VBN'), ('to', 'TO'), ('``', '``'), ('give', 'VB'), ('priority', 'NN'), ("''", "''"), ('to', 'IN'), ('teacher', 'NN'), ('pay', 'NN'), ('raises', 'NNS'), ('.', '.')]

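As you can see, each word has a wordclass tag. How can I search my list for a word + wordclass combination? For example, how do I check whether the element above contains the word "federal" with the wordclass tag "JJ"?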

Many thanks for any help.


3 Answers


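I'd use a set instead. A membership test with the in operator is then a hash lookup (average O(1)) rather than a linear scan of the list: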

wlist = set([('It', 'PPS'), ('says', 'VBZ'), ('that', 'CS'), ('``', '``'), ('in', 'IN'), ('the', 'AT'), ('event', 'NN'), ('Congress', 'NP'), ('does', 'DOZ'), ('provide', 'VB'), ('this', 'DT'), ('increase', 'NN'), ('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS'), ("''", "''"), (',', ','), ('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL'), ('of', 'IN-TL'), ('Education', 'NN-TL'), ('should', 'MD'), ('be', 'BE'), ('directed', 'VBN'), ('to', 'TO'), ('``', '``'), ('give', 'VB'), ('priority', 'NN'), ("''", "''"), ('to', 'IN'), ('teacher', 'NN'), ('pay', 'NN'), ('raises', 'NNS'), ('.', '.')])

print(('federal', 'JJ') in wlist)  # prints True
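A quick sketch of the trade-off, assuming you run several queries against the same sentence: building the set costs one pass over the list, after which each lookup is a hash lookup. Note that the set silently drops duplicate pairs (the sentence in the question contains ('in', 'IN') twice), so it only suits yes/no membership, not counting or finding positions. The names tagged, tag_set and queries are illustrative.

```python
# Excerpt of the question's data, with one pair deliberately repeated.
tagged = [('the', 'AT'), ('federal', 'JJ'), ('funds', 'NNS'), ('the', 'AT')]

# Build the set once; duplicates collapse, so it has 3 elements, not 4.
tag_set = set(tagged)

# Each of these checks is an average O(1) hash lookup.
queries = [('federal', 'JJ'), ('federal', 'NN'), ('the', 'AT')]
results = {q: q in tag_set for q in queries}
print(results)
```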
answered 2013-02-18T18:34:14.560

To check whether your list contains the word "federal" with the tag "JJ":

your_list = [('It', 'PPS'), ('says', 'VBZ'), ('that', 'CS'), ('``', '``'), ('in', 'IN'), ('the', 'AT'), ('event', 'NN'), ('Congress', 'NP'), ('does', 'DOZ'), ('provide', 'VB'), ('this', 'DT'), ('increase', 'NN'), ('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS'), ("''", "''"), (',', ','), ('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL'), ('of', 'IN-TL'), ('Education', 'NN-TL'), ('should', 'MD'), ('be', 'BE'), ('directed', 'VBN'), ('to', 'TO'), ('``', '``'), ('give', 'VB'), ('priority', 'NN'), ("''", "''"), ('to', 'IN'), ('teacher', 'NN'), ('pay', 'NN'), ('raises', 'NNS'), ('.', '.')]
print(('federal', 'JJ') in your_list)
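The in test only answers yes or no. If you also need to know where the pair occurs, one option (a sketch; target and positions are illustrative names) is to scan with enumerate. The sample is an excerpt of the question's sentence, in which ('in', 'IN') appears twice:

```python
your_list = [('in', 'IN'), ('the', 'AT'), ('event', 'NN'),
             ('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS')]

# Collect every index whose (word, tag) pair equals the target tuple.
target = ('in', 'IN')
positions = [i for i, pair in enumerate(your_list) if pair == target]
print(positions)  # [0, 3]
```

If you only need the first position, your_list.index(target) also works, but it raises ValueError when the pair is absent.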

With list comprehensions you can do more interesting things with the list, for example print the tag of every occurrence of a word:

print(" ".join([wordclass for word, wordclass in your_list if word == 'federal']))
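Going one step further than the join above, a collections.defaultdict can record every tag each word received in a single pass. This is a sketch rather than part of the original answer; tags_by_word is a hypothetical name, and the sample is an excerpt of the question's sentence, where "to" is tagged both TO and IN:

```python
from collections import defaultdict

your_list = [('to', 'TO'), ('give', 'VB'), ('priority', 'NN'),
             ('to', 'IN'), ('teacher', 'NN')]

# Map each word to the set of tags it received anywhere in the sentence.
tags_by_word = defaultdict(set)
for word, wordclass in your_list:
    tags_by_word[word].add(wordclass)

print(sorted(tags_by_word['to']))  # ['IN', 'TO']
```

Be aware that indexing a defaultdict with a missing word inserts an empty set for it; use tags_by_word.get(word, set()) if that matters.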

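It's a good idea to write a few functions for common operations on the data structure you use, such as checking whether it contains a given word or tag: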

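def hasWord(l, word):
    # True if any (word, tag) pair in l has the given word
    for w, wordclass in l:
        if w == word:
            return True
    return False

def hasTag(l, tag):
    # True if any (word, tag) pair in l has the given tag
    for w, wordclass in l:
        if wordclass == tag:
            return True
    return False

if hasTag(your_list, 'JJ'):
    print(your_list)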
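The same two helpers can be written more compactly with the built-in any(), which likewise stops at the first match. This is an illustrative rewrite; the snake_case names has_word and has_tag are my own, not from the answer:

```python
def has_word(pairs, word):
    # any() short-circuits on the first pair whose word matches.
    return any(w == word for w, _ in pairs)

def has_tag(pairs, tag):
    # Same idea, comparing the tag half of each pair.
    return any(t == tag for _, t in pairs)

your_list = [('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS')]
if has_tag(your_list, 'JJ'):
    print(your_list)
```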

To answer your question in the comments (checking every sentence in a list of sentences):

for sentence in sentences:
    if ('federal', 'JJ') in sentence:
        print(sentence)
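Building on that loop, here is a sketch for collecting the matching sentences and, if you query the same corpus many times, for building a one-off index from each (word, tag) pair to the sentences that contain it. The three short sentences are excerpts of the question's data, and matching and index are hypothetical names:

```python
from collections import defaultdict

sentences = [
    [('It', 'PPS'), ('says', 'VBZ'), ('that', 'CS')],
    [('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS')],
    [('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL')],
]
target = ('federal', 'JJ')

# All sentences that contain the pair (a linear scan of each sentence).
matching = [s for s in sentences if target in s]

# Built once: map each (word, tag) pair to the indices of sentences containing it.
index = defaultdict(list)
for i, sentence in enumerate(sentences):
    for pair in set(sentence):  # set() so each sentence is recorded once per pair
        index[pair].append(i)

print(len(matching))  # 1
print(index[target])  # [1]
```

After the index is built, each lookup is a single dictionary access instead of a pass over every sentence.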
answered 2013-02-18T18:38:04.893

My first approach would be:

def find_tuple(input, l):
    for (e1, e2) in l:
        if e1==input[0] and e2==input[1]:
            return True
    return False
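One way to make this less static, sketched here as an assumption rather than anything from the answer, is to let None in the pattern act as a wildcard. The same function can then search by word only, by tag only, or by both. matches and find_all are hypothetical helper names:

```python
def matches(pair, pattern):
    # None in the pattern matches anything in that position.
    return all(p is None or p == x for x, p in zip(pair, pattern))

def find_all(pairs, pattern):
    # Return every (word, tag) pair that fits the pattern.
    return [pair for pair in pairs if matches(pair, pattern)]

l = [('to', 'TO'), ('give', 'VB'), ('priority', 'NN'), ('to', 'IN'), ('teacher', 'NN')]
print(find_all(l, ('to', None)))  # [('to', 'TO'), ('to', 'IN')]
print(find_all(l, (None, 'NN')))  # [('priority', 'NN'), ('teacher', 'NN')]
```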

It's straightforward but static, and only works for your specific problem. A more general but equivalent approach:

def my_any(iterable, input, func):
    for element in iterable:
        if func(element, input):
            return True
    return False

input = ("federal","JJ")
l = [("It", "PPS"),("federal","JJ")]
print(my_any(l, input, lambda x, y: x[0]==y[0] and x[1]==y[1]))
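my_any is essentially the built-in any() applied to a generator, so it can be written in one line. The lambdas below are illustrative match rules; the prefix rule is one possible way to treat title-suffixed tags such as NN-TL as plain NN, and target is a renamed parameter that avoids shadowing the built-in input:

```python
def my_any(iterable, target, func):
    # Same behaviour as the loop version: True as soon as func(element, target) is truthy.
    return any(func(element, target) for element in iterable)

l = [('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL'), ('federal', 'JJ')]

exact = my_any(l, ('federal', 'JJ'), lambda x, y: x == y)
prefix = my_any(l, ('Board', 'NN'), lambda x, y: x[0] == y[0] and x[1].startswith(y[1]))
strict = my_any(l, ('Board', 'NN'), lambda x, y: x == y)
print(exact, prefix, strict)  # True True False
```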

Pass in a lambda function to decide for yourself what counts as a match. The simplest approach is:

input = ("federal","JJ")
l = [("It", "PPS"),("federal","JJ")]
if input in l:
    print("True")

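If you can be more specific about the problem you're trying to solve, it will be easier to give concrete advice (e.g. what should the return type be: boolean, string, tuple...?). Hope this helps.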

Cheers!

answered 2013-02-18T19:01:58.997