python - Compare sub-items of lists and make changes in Python

Question

I have two lists from a part-of-speech tagger, as follows:

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]


pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

I want to create a final list that updates pos_tags with the items from pos_names. So basically I need to find John and Murphy in pos_tags and replace their POS tags with NNP.

score 0 · Accepted Answer

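I agree that a dictionary would be a more natural fit for this problem, but if you need to preserve the order of your pos_tags, a more explicit solution would be: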

for word, pos in pos_names:
    for i, (tagged_word, tagged_pos) in enumerate(pos_tags):
        if word == tagged_word:
            pos_tags[i] = (word, pos)

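(For a large number of words, a dictionary will probably be faster, so you may want to consider storing the word order in a list and using a dictionary for the POS assignment.)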
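One way to read that suggestion is sketched below: keep pos_tags as the ordered list and build a dictionary from each word to the positions where it occurs, so each name is looked up once instead of rescanning the whole list. The `positions` index and the shortened sample data (with John and Murphy deliberately mis-tagged as NN) are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict

# Hypothetical, shortened sample data; the names start with the wrong tag.
pos_tags = [('My', "''"), ('name', 'NN'), ('is', 'VBZ'),
            ('John', 'NN'), ('Murphy', 'NN')]
pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

# Map each word to every index at which it appears in pos_tags.
positions = defaultdict(list)
for i, (word, _) in enumerate(pos_tags):
    positions[word].append(i)

# Overwrite the tag at each recorded index; the list order is untouched.
# .get() avoids creating empty entries for names that never occur.
for word, pos in pos_names:
    for i in positions.get(word, []):
        pos_tags[i] = (word, pos)

print(pos_tags)
```

Building the index costs one pass over pos_tags, after which each name costs a dictionary lookup plus one assignment per occurrence.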

score 0 · Accepted Answer

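You can create a dictionary from pos_names that acts as a lookup table. You can then use get to search the table for a possible replacement, leaving the tag as it is if no replacement is found.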

d = dict(pos_names)
pos_tags = [(word, d.get(word, tag)) for word, tag in pos_tags]
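To make the effect visible, here is a small sketch of the same lookup-table idea wrapped in a function and run on made-up data where the names start out with the wrong tag. The function name `retag` and the sample lists are hypothetical, chosen only for this illustration.

```python
def retag(tagged, overrides):
    """Return a new list of (word, tag) pairs with the overrides applied."""
    lookup = dict(overrides)  # word -> replacement tag
    # dict.get returns the original tag when the word has no override.
    return [(word, lookup.get(word, tag)) for word, tag in tagged]

# Hypothetical sample data, not taken from the question.
sample_tags = [('John', 'NN'), ('lives', 'VBZ'), ('on', 'IN'), ('Earth', 'JJ')]
sample_names = [('John', 'NNP'), ('Earth', 'NNP')]

retagged = retag(sample_tags, sample_names)
print(retagged)
```

Note that this approach builds a new list rather than modifying the existing one in place, and it replaces the tag of every occurrence of a matching word.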

score 0 · Accepted Answer

Given

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]

and

names = ['John', 'Murphy']

you can do:

[next(subl for subl in pos_tags if name in subl) for name in names]

This gives you:

[('John', u'NNP'), ('Murphy', u'NNP')]
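Calling next on a generator with no match raises StopIteration, so a name that is missing from pos_tags would make that expression fail. A hedged variant, assuming you would rather skip missing names, passes a default to next and filters it out afterwards. The extra name 'Alice' and the shortened list are invented for this sketch; it also compares against the word slot only, so a name that happens to equal a tag string cannot match.

```python
# Hypothetical, shortened sample data; 'Alice' does not occur in pos_tags.
pos_tags = [('My', "''"), ('name', 'NN'), ('is', 'VBZ'),
            ('John', 'NNP'), ('Murphy', 'NNP')]
names = ['John', 'Murphy', 'Alice']

# The second argument to next() is returned when the generator is exhausted.
found = [next((pair for pair in pos_tags if pair[0] == name), None)
         for name in names]

# Drop the placeholders for names that were not found.
matches = [pair for pair in found if pair is not None]
print(matches)
```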
