I don't understand the point of this function returning two variables when they are identical:

def construct_shingles(doc,k,h):
    #print 'before -> ',doc,len(doc)
    doc = doc.lower()
    doc = ''.join(doc.split(' '))
    #print 'after -> ',doc,len(doc)
    shingles = {}
    for i in xrange(len(doc)):
        substr = ''.join(doc[i:i+k])
        if len(substr) == k and substr not in shingles:
            shingles[substr] = 1

    if not h:
        return doc,shingles.keys()

    ret = tuple(shingles_hashed(shingles))

    return ret,ret

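It seems redundant, but there must be a good reason; I just don't see what it is. Maybe it is because there are two return statements? If 'h' is true, does it execute both return statements? The calling function looks like this: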

def construct_set_shingles(docs,k,h=False):
    shingles = []
    for i in xrange(len(docs)):
        doc = docs[i]
        doc,sh = construct_shingles(doc,k,h)
        docs[i] = doc
        shingles.append(sh)
    return docs,shingles

def shingles_hashed(shingles):
    global len_buckets
    global hash_table
    shingles_hashed = []
    for substr in shingles:
        key = hash(substr)
        shingles_hashed.append(key)
        hash_table[key].append(substr)
    return shingles_hashed

The dataset and the function call look like this:

k = 3 #shingle length (characters per shingle)

d0 = "i know you"
d1 = "i think i met you"
d2 = "i did that"
d3 = "i did it"
d4 = "she says she knows you"
d5 = "know you personally"
d6 = "i think i know you"
d7 = "i know you personally"

docs = [d0,d1,d2,d3,d4,d5,d6,d7]
docsChange,shingles = construct_set_shingles(docs[:],k)

GitHub location: lsh/LHS

1 Answer

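Your guess is correct. As for why the function ends with return ret,ret: that return statement is meant to return a pair of equal values rather than a single value.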

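This is more a matter of coding style than of the algorithm, since the same result can be achieved with other syntax. It can be advantageous in some cases, though. For example, if we write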

def func(x, y, z):
    ...
    return ret

a = func(x, y, z)
b = func(x, y, z)

then func is executed twice. But if we write:

def func(x, y, z):
    ...
    return ret, ret

a, b = func(x, y, z)

then func is executed only once, while still supplying values for both a and b.
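The difference in call count can be checked directly. The following is a minimal sketch in Python 3; func_single, func_pair and call_count are hypothetical names introduced only for this illustration and are not part of the code in the question.

```python
call_count = 0  # hypothetical counter, used only to observe how often the functions run


def func_single(x):
    """Returns one value; the caller must call it again to fill a second variable."""
    global call_count
    call_count += 1
    return x * 2


def func_pair(x):
    """Returns a 2-tuple holding the same value twice."""
    global call_count
    call_count += 1
    ret = x * 2
    return ret, ret


# Two separate assignments: the function body runs twice.
a = func_single(5)
b = func_single(5)
calls_two_assignments = call_count

# One call with tuple unpacking: the function body runs once.
call_count = 0
c, d = func_pair(5)
calls_unpacking = call_count

print(calls_two_assignments, calls_unpacking)
```

Writing a = b = func_single(5) would also call the function only once, so returning a pair is one option among several rather than a necessity.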

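The same applies to your specific case:

If h is false, the function runs until the line return doc,shingles.keys() and stops there, so the variables doc and sh in construct_set_shingles take the values doc and shingles.keys() respectively.

Otherwise, the first return statement is skipped and the second one runs, so doc and sh both receive the same value, namely tuple(shingles_hashed(shingles)). Only one return statement ever executes per call, and because the caller unpacks the result with doc,sh = construct_shingles(doc,k,h), both branches have to return a pair.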
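Both branches can be seen side by side in a reduced version of the function. This is only a sketch in Python 3 under simplifying assumptions: construct_shingles_sketch is a hypothetical name, the built-in hash() stands in for shingles_hashed, and the global hash_table bookkeeping from the question is left out.

```python
def construct_shingles_sketch(doc, k, h):
    """Reduced stand-in for construct_shingles; always returns a 2-tuple."""
    doc = doc.lower().replace(' ', '')  # lowercase and strip spaces, as in the question
    shingles = []
    for i in range(len(doc) - k + 1):
        substr = doc[i:i + k]
        if substr not in shingles:      # keep each shingle once, in first-seen order
            shingles.append(substr)

    if not h:
        # First return: cleaned document plus its shingles.
        return doc, shingles

    # Assumption: plain hash() replaces shingles_hashed() for illustration.
    ret = tuple(hash(s) for s in shingles)
    # Second return: the same tuple twice, so the caller's unpacking still works.
    return ret, ret


doc_plain, sh_plain = construct_shingles_sketch("i did it", 3, False)
doc_hashed, sh_hashed = construct_shingles_sketch("i did it", 3, True)

print(doc_plain, sh_plain)      # cleaned text and its 3-character shingles
print(doc_hashed is sh_hashed)  # both names refer to the very same tuple
```

With h true, the caller in the question therefore overwrites docs[i] with the tuple of hashes, since doc no longer holds the cleaned text in that branch.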

Answered 2018-11-16T04:58:50.380