I am trying to create a very basic inverted index, i.e.:

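The function inverseIndex should take a list of strings and return, for each word, the set of indices of the documents it appears in. For example: inverseIndex(["Hi Dude", "Dude", "ok Dude"]) ==> {"Hi": {0}, "Dude": {0, 1, 2}, "ok": {2}}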

Since I come from a Java/JavaScript background, I first wrote it in Python without comprehensions:

def inverseIndex(strlist):

    listOfStrings = list(enumerate(strlist))

    # collect every distinct word across all strings
    allKeyWords = set(sum([y.split() for (x, y) in listOfStrings], []))

    strDict = {}
    for i in allKeyWords:
        setStr = set()
        for j in listOfStrings:
            # str.find returns -1 when the substring is absent, so compare explicitly
            if j[1].find(i) != -1:
                setStr.add(j[0])
        strDict[i] = setStr

    return strDict

This is what I tried:

def inverseIndex(strlist):
    strDict = {}
    listOfStrings = list(enumerate(strlist))

    # get all the keywords collected in a set so we don't have duplicates
    allKeyWords = set(sum([y.split() for (x, y) in listOfStrings], []))

    print(allKeyWords)

    # each key is overwritten on every match, so only one index per word survives
    return {x: y for x in allKeyWords for (y, z) in listOfStrings if z.find(x) != -1}

The first version seems to work fine. However, I have not managed to write it using comprehensions.

Also, I am looking for a way to do this without itertools, if there is one.

1 Answer

I think this is what you are looking for:

Script:

strings = ["Hi Dude", "Dude", "ok Dude"]
dictionary = {}
for i, item in enumerate(strings):
    for word in item.split():
        try:
            dictionary[word].append(i)
        except KeyError:
            dictionary[word] = [i]
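
The question asks for sets of indices rather than lists. As a rough sketch of the same loop, the try/except can be swapped for dict.setdefault with a set as the default value. The function name inverse_index is only an illustrative choice and does not come from the original post.

```python
def inverse_index(strings):
    """Map each word to the set of indices of the strings containing it.

    inverse_index is a hypothetical name used for illustration only.
    """
    index = {}
    for i, item in enumerate(strings):
        for word in item.split():
            # setdefault returns the existing set for this word,
            # or stores and returns a new empty set on first sight
            index.setdefault(word, set()).add(i)
    return index


result = inverse_index(["Hi Dude", "Dude", "ok Dude"])
print(result)
```

Using a set also means a word repeated within one string contributes its index only once.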

Demo:

print(dictionary)
# {'Dude': [0, 1, 2], 'Hi': [0], 'ok': [2]}
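
Since the question specifically asks for a comprehension without itertools, here is one possible sketch. It nests a set comprehension inside a dict comprehension so that each word maps to a set of indices instead of a single index. The names inverse_index_comprehension, docs and words are assumptions made for this example.

```python
def inverse_index_comprehension(strings):
    """Comprehension-only sketch of an inverted index (hypothetical helper)."""
    # split each string once; a set gives whole-word membership tests
    docs = [set(s.split()) for s in strings]
    # union of all per-string word sets, built without itertools
    words = set().union(*docs)
    # for every word, collect the indices of the strings that contain it
    return {w: {i for i, d in enumerate(docs) if w in d} for w in words}


print(inverse_index_comprehension(["Hi Dude", "Dude", "ok Dude"]))
```

This version scans every string once per distinct word, so it does more work than the loop in the script above. For small inputs the difference is unlikely to matter. It also matches whole words only, whereas str.find would match substrings as well.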
answered 2013-07-13T18:42:07.397