python - python中文本分析器代码的时间复杂度

Question

letterList = ["a", 0, "b", 0, "c", 0, "d", 0, "e", 0, "f", 0, "g", 0, "h", 0, "i", 0,  "j", 0, "k", 0, "l", 0, "m", 0, "n", 0, "o", 0, "p", 0, "q", 0, "r", 0, "s", 0, "t", 0, "u", 0, "v", 0, "w", 0, "x", 0, "y", 0, "z", 0]
letterCount = 0
wordList = [None]
wordCount = 0
Count = 0
wordIndex = [0]
itext = cleaner(raw_input("enter itext please")).split()
print itext
for iword in itext:
    if iword in wordList:
        Count += 1
        for word in wordList:
            if iword == word:
                wordList[wordList.index(word)+1][0] += 1
                wordList[wordList.index(word)+1] += [wordCount]
            else:
                pass
    elif iword not in wordList:
        wordList += [iword]
        wordList += [[1, itext.index(iword)]]
    else:
        pass
    wordCount += 1
print wordList

代码看起来很乱，因为我是 python 和编程的初学者。

谁能帮我处理代码的时间复杂性？

score 6 · Accepted Answer

除了格式不同之外，后面的所有内容print itext都可以替换为：

print collections.Counter(itext)

这具有复杂性 O(n)。

如果没有 Counter，您可以使用 dict 而不是列表来更好地表达您的算法来存储字数：

word_counter = {}
for word in itext:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1

dict 非常适合存储某事物（此处为单词）和其他事物（此处为计数）之间的关联。与字典相比，交替的单词和计数对列表有很多缺点，但杀手是在列表中找到一个单词是 O(N) 而不是 O(1)。

score 0 · Accepted Answer

第二个循环没用。iword你只需要in的索引wordlist。

for iword in itext:
    if iword in wordList:
        i = wordList.index(iword)
        wordList[i+1][0] += 1
        wordList[i+1].append(wordCount)
        Count +=1
    else:
        wordList.append(iword)
        wordList.append([1, itext.index(iword)])
    wordCount += 1
print wordList

这会产生与您的代码相同的输出，但老实说，我还不清楚它是否真的是您所期望的......

score 0 · Accepted Answer

假设使用循环计数来计算复杂性。

首先，我们需要找到最坏的情况。第一次循环的复杂度为 1，第二次循环的最大复杂度为 3。在第三次中，如果第二次循环的复杂度为 3，那么第三次循环的最大复杂度也为 3，总复杂度为 7。如果你给第二个复杂度 1，你可以给第三个最大复杂度 5，但这仍然给你 7 的总复杂度。然而，在这一点上它变得很奇怪。第四次的最大复杂度是如果您的复杂度为 1、1、5 和 5，总共 12。5 有 1、1、1、7 和 7 = 17。6 有 1、1 , 1, 7, 7, and 7 = 24. 1, 1, 1, 1, 9, 9, 9 = 31. 1, 1, 1, 1, 9, 9, 9, 9 = 40. 寻找一般最差案例场景真的有点难，但我想说最坏的情况是当字符串的前半部分（无论你使用一半的地板还是天花板都无关紧要）由新单词组成，而字符串的其余部分由最后添加的新词。“红绿蓝黄黄黄黄”是 7 个单词的最坏情况示例。把它换成数学术语看起来有点像这样：

O(ceiling(n/2) + floor(n/2)*(ceiling(n/2)*2+1))

或者，python 计算给定列表大小的复杂性：

from __future__ import division
import math
def complexity(n):
    return math.ceil(n/2) + math.floor(n/2)*(math.ceil(n/2)*2+1)

也就是说，您的算法是可怕的，您真的应该用其他答案给出的算法之一替换它。

python - python中文本分析器代码的时间复杂度

3 回答 3

Related

Reference