2

I have a dictionary of synonyms:

synonym = {"this": ["this", "same"],
           "all": ["all", "any", "*"],
           "alluptolastyear": ["alluptolastyear", "uptolastyear"],
           "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"],
           "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
           "yearbefore": ["yearbefore", "lastyear", "formeryear"],
           "monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]}

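Each list stores a group of synonyms and is referenced by its key. I read two strings from an XML file and am trying to compare them.

For example:

  • "this" and "same" are equal (synonyms)
  • "lastyear" and "formeryear" are equal (synonyms)
  • "all" and "nextdekad" are different
  • Of course, every key also appears in its own list, so each key is a synonym of the strings in its list.

Can someone help me write a Pythonic comparison of these strings using the synonym dictionary?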

4

5 Answers

6

Try this:

def are_sinonims(a, b):
    return a in synonym.get(b,[]) or b in synonym.get(a,[]) or any(a in synonym[k] and b in synonym[k] for k in synonym)

Also, the generator expression

a in synonym[k] and b in synonym[k] for k in synonym

can be rewritten as

a in words and b in words for words in synonym.values()

which gives:

def are_sinonims(a, b):
    return a in synonym.get(b,[]) \
           or b in synonym.get(a,[]) \
           or any(a in words and b in words for words in synonym.values())
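
As a quick sanity check, the function can be run against the three examples from the question. This is only a sketch: the dictionary is copied from the question, and the expected results are the ones the question describes.

```python
synonym = {"this": ["this", "same"],
           "all": ["all", "any", "*"],
           "alluptolastyear": ["alluptolastyear", "uptolastyear"],
           "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"],
           "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
           "yearbefore": ["yearbefore", "lastyear", "formeryear"],
           "monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]}

def are_sinonims(a, b):
    # The first two tests are direct key lookups; the any() scan
    # handles the case where neither word is a dictionary key.
    return a in synonym.get(b, []) \
           or b in synonym.get(a, []) \
           or any(a in words and b in words for words in synonym.values())

print(are_sinonims("this", "same"))            # True
print(are_sinonims("lastyear", "formeryear"))  # True
print(are_sinonims("all", "nextdekad"))        # False
```

Since every key is also stored in its own list, the any() scan alone would probably be enough here; the two key lookups just short-circuit the common case.
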
answered 2012-10-12T08:22:52.390
4

You can map each word to a "synonym hash" (equal if two words are synonyms, different otherwise):

def sym_hash(word):
    for w, s in synonym.items():
        if word == w or word in s:
            return w
    return word

Then compare words by their "hashes":

def phrases_equal(p1, p2):
    return all(sym_hash(a) == sym_hash(b) for a, b in zip(p1, p2))

p1 = "all your base this dekadbefore are formeryear".split()
p2 = "any your base same lastdekad are yearbefore".split()

print(phrases_equal(p1, p2))  # True

Actually, the right data structure for a synonym database seems to be a list of sets rather than a dictionary:

synonym = [
    {"this", "same"},
    {"all", "any", "*"},
    {"alluptolastyear", "uptolastyear"},
    {"dekadbefore", "lastdekad", "formerdekad", "precedingdekad"},
    {"dekadafter", "nextdekad", "followingdekad"},
    {"yearbefore", "lastyear", "formeryear"},
    {"monthbefore", "lastmonth", "precedingmonth"}
]

In that case, sym_hash can be written more efficiently as

def sym_hash(word):
    return next((s for s in synonym if word in s), word)
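
To see how the list-of-sets version fits together with phrases_equal, here is a minimal, self-contained sketch. Two things in it are my additions rather than part of the answer: the length check (zip() silently stops at the shorter phrase) and the last test call.

```python
synonym = [
    {"this", "same"},
    {"all", "any", "*"},
    {"alluptolastyear", "uptolastyear"},
    {"dekadbefore", "lastdekad", "formerdekad", "precedingdekad"},
    {"dekadafter", "nextdekad", "followingdekad"},
    {"yearbefore", "lastyear", "formeryear"},
    {"monthbefore", "lastmonth", "precedingmonth"},
]

def sym_hash(word):
    # Returns the synonym set containing the word, or the word itself
    # when it belongs to no set. The result is only compared with ==,
    # so it does not need to be hashable.
    return next((s for s in synonym if word in s), word)

def phrases_equal(p1, p2):
    # Assumption: phrases of different length are never equal.
    return len(p1) == len(p2) and all(
        sym_hash(a) == sym_hash(b) for a, b in zip(p1, p2))

p1 = "all your base this dekadbefore are formeryear".split()
p2 = "any your base same lastdekad are yearbefore".split()

print(phrases_equal(p1, p2))       # True
print(phrases_equal(p1, p2[:-1]))  # False
```
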
answered 2012-10-12T11:03:45.987
3

Why not just:

def are_sinonims(a, b):
    return b in synonym.get(a, []) or a in synonym.get(b, [])

Edited after a comment pointed out an error.
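
One caveat worth checking: this lookup only consults the lists stored under a or b as keys, so two synonyms that are both non-key words (such as "lastyear" and "formeryear" from the question) do not match. A minimal sketch that shows this, using a trimmed copy of the question's dictionary:

```python
# Trimmed copy of the question's dictionary, enough for the demonstration.
synonym = {"this": ["this", "same"],
           "yearbefore": ["yearbefore", "lastyear", "formeryear"]}

def are_sinonims(a, b):
    return b in synonym.get(a, []) or a in synonym.get(b, [])

print(are_sinonims("this", "same"))            # True: "this" is a key
print(are_sinonims("yearbefore", "lastyear"))  # True: "yearbefore" is a key
print(are_sinonims("lastyear", "formeryear"))  # False: neither word is a key
```

If both words can be non-key synonyms, a scan over synonym.values() or a precomputed reverse lookup is needed as well.
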

answered 2012-10-12T08:19:26.437
1

First, build a new dictionary that has every synonym as a key:

word_to_word = {}    
for syns in synonym.values():
    for word in syns:
        word_to_word[word] = syns

Then a function that compares the strings:

def are_sinomic(a, b):
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        return False
    for word_a, word_b in zip(words_a, words_b):
        if word_a != word_b and word_b not in word_to_word.get(word_a, []):
            return False
    return True
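
For illustration, here is the reverse lookup and the comparison put together as one runnable sketch. The dictionary is the one from the question; the sample phrases are made up for the demonstration.

```python
synonym = {"this": ["this", "same"],
           "all": ["all", "any", "*"],
           "alluptolastyear": ["alluptolastyear", "uptolastyear"],
           "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"],
           "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
           "yearbefore": ["yearbefore", "lastyear", "formeryear"],
           "monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]}

# Map every word (key or not) to the list of its synonyms.
word_to_word = {}
for syns in synonym.values():
    for word in syns:
        word_to_word[word] = syns

def are_sinomic(a, b):
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        return False
    for word_a, word_b in zip(words_a, words_b):
        # Words outside the dictionary only match themselves.
        if word_a != word_b and word_b not in word_to_word.get(word_a, []):
            return False
    return True

print(are_sinomic("all lastyear", "any formeryear"))  # True
print(are_sinomic("all lastyear", "any nextdekad"))   # False
print(are_sinomic("all", "all lastyear"))             # False (different lengths)
```

Building word_to_word once makes each word lookup a single dictionary access instead of a scan over all synonym lists.
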
answered 2012-10-12T08:22:28.977
0

If you only care whether two words are synonyms, you can build a set of 2-tuples from the permutations of the dict's values:

synonym = {"this": ["this", "same"], 
           "all": ["all", "any", "*"], 
           "alluptolastyear": ["alluptolastyear", "uptolastyear"], 
           "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"], 
           "dekadafter": ["dekadafter", "nextdekad", "followingdekad"], 
           "yearbefore": ["yearbefore", "lastyear", "formeryear"], 
           "monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]} 

from itertools import chain, permutations
synonym_set = set(chain.from_iterable(permutations(val, 2) for val in synonym.values()))

def are_synonyms(a, b):
    return (a, b) in synonym_set
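
Note that permutations(val, 2) only pairs elements at different positions, so a pair such as ("this", "this") never ends up in the set. If a word should count as a synonym of itself, one possible variant uses itertools.product instead. That requirement is an assumption about the desired behaviour and is not stated in the answer above.

```python
from itertools import chain, product

synonym = {"this": ["this", "same"],
           "all": ["all", "any", "*"],
           "alluptolastyear": ["alluptolastyear", "uptolastyear"],
           "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"],
           "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
           "yearbefore": ["yearbefore", "lastyear", "formeryear"],
           "monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]}

# product(val, repeat=2) yields every ordered pair, including (w, w).
synonym_set = set(chain.from_iterable(
    product(val, repeat=2) for val in synonym.values()))

def are_synonyms(a, b):
    return (a, b) in synonym_set

print(are_synonyms("this", "this"))            # True
print(are_synonyms("lastyear", "formeryear"))  # True
print(are_synonyms("all", "nextdekad"))        # False
```

Words that appear in no synonym list still compare as different, even to themselves, so an explicit a == b test would be needed to cover those.
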
answered 2012-10-12T09:58:46.433