Let's clean this up:

    def match_share(string, W, weights, rel_weight):
        words = string.split()
        words_counts = Counter(words)
        words = string.split()
        words_counts = Counter(words)

That's redundant! Replace the 4 statements with 2:

    def match_share(string, W, weights, rel_weight):
        words = string.split()
        words_counts = Counter(words)

Next:

    ratios = []
    for word in words:
        if ((word in weights[W].keys())&(word in rel_weight[W].keys())):
            if (weights[W][word]!=0):
                ratios.append(words_counts[word]*rel_weight[W][word]/weights[W][word])
        else:
            ratios.append(0)

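I have no idea what you think that code is going to do. I hope you weren't being cunning. But `.keys()` returns an iterable, and `X in <iterable>` can be much slower than `X in <dict>`: in Python 2, `.keys()` builds a list, so the test is a linear scan, while in Python 3 it returns a view and the call is merely redundant. Also, `&` is the bitwise operator and does not short-circuit; use `and` to combine the two tests.

Also, please note: if the innermost condition (`weights[W][word] != 0`) fails, you don't append anything. That may be a bug, since you do append 0 in the other else branch. (I don't know what you are trying to do, so I'm just pointing it out.)

Finally, this is Python, not Perl, C, or Java, so no parentheses are needed around the test in `if <test>:`.
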
Let's go with that:

    ratios = []
    for word in words:
        if word in weights[W] and word in rel_weight[W]:
            if weights[W][word] != 0:
                ratios.append(words_counts[word] * rel_weight[W][word] / weights[W][word])
        else:
            ratios.append(0)

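To make the zero-weight case concrete, here is a small sketch of the same loop pulled into a helper. The helper name `collect_ratios`, its parameters `w` and `rel` (standing in for `weights[W]` and `rel_weight[W]`), and the sample data are all made up for illustration:

```python
from collections import Counter


def collect_ratios(words, counts, w, rel):
    """Mirror the loop above; w and rel stand in for weights[W] and rel_weight[W]."""
    ratios = []
    for word in words:
        if word in w and word in rel:
            if w[word] != 0:
                ratios.append(counts[word] * rel[word] / w[word])
            # No else here: a word with a zero weight appends nothing.
        else:
            ratios.append(0)
    return ratios


words = "apple pear kiwi".split()
w = {"apple": 2.0, "pear": 0.0}    # "pear" has a zero weight
rel = {"apple": 1.0, "pear": 3.0}  # "kiwi" is missing from both dicts
result = collect_ratios(words, Counter(words), w, rel)
print(result)  # [0.5, 0]
```

Three words go in but only two entries come out, because "pear" is silently skipped. Whether that is intended depends on what the ratio is supposed to mean.
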
Next:

    if len(words)>0:
        ratios = np.divide(ratios, float(len(words)))

You are trying to guard against division by zero. But you can use the truthiness of the list to check that, and avoid the comparison:

    if words:
        ratios = np.divide(ratios, float(len(words)))

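As a quick illustration of why the truthiness check is enough, an empty string splits into an empty list, and an empty list is falsy. The helper name `normalize` below is hypothetical and only wraps the two lines above:

```python
import numpy as np


def normalize(ratios, words):
    # An empty list is falsy, so the division is skipped when there are no words.
    if words:
        ratios = np.divide(ratios, float(len(words)))
    return ratios


empty = normalize([], "".split())              # "".split() is []
scaled = normalize([2.0, 4.0], "a b".split())  # divided by 2.0
total_of_empty = float(np.sum(empty))          # summing nothing gives 0.0

print(empty, scaled, total_of_empty)
```
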
The rest is fine, but you don't need the variable in these last two lines (`return np.sum(ratios)` would do):

    ratio = np.sum(ratios)
    return ratio

With these mods applied, your function looks like this:

    # Imports the function relies on
    from collections import Counter
    import numpy as np

    def match_share(string, W, weights, rel_weight):
        words = string.split()
        words_counts = Counter(words)
        ratios = []
        for word in words:
            if word in weights[W] and word in rel_weight[W]:
                if weights[W][word] != 0:
                    ratios.append(words_counts[word] * rel_weight[W][word] / weights[W][word])
            else:
                ratios.append(0)
        if words:
            ratios = np.divide(ratios, float(len(words)))
        ratio = np.sum(ratios)
        return ratio

Looking closer, I see you are doing this:

    words_counts = Counter(words)
    for word in words:
        append(words_counts[word] * ...)

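By my reading, this means that if "apple" appears 6 times, you will append 6*... to the list once for each occurrence of the word. So your list will contain 6 separate entries of 6*... . Are you sure that is what you want? Or should it be `for word in words_counts`, to iterate over just the distinct words?
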
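A small sketch of the difference, using a made-up sample sentence and leaving the weights out so only the counting behaviour is visible:

```python
from collections import Counter

words = "apple apple apple pear".split()
words_counts = Counter(words)

# Iterating over every token: "apple" contributes its count once per occurrence.
per_token = [words_counts[word] for word in words]

# Iterating over the Counter: each distinct word contributes its count once.
per_distinct = [words_counts[word] for word in words_counts]

print(sum(per_token))     # 10
print(sum(per_distinct))  # 4
```

The first loop effectively weights each word by the square of its count, which may or may not be what you intend.
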
Another optimization is to remove lookups from inside the loop. You keep looking up `weights[W]` and `rel_weight[W]` even though the value of `W` never changes. Let's cache those values outside the loop. Also, let's cache a reference to the `ratios.append` method.

    # Imports the function relies on
    from collections import Counter
    import numpy as np

    def match_share(string, W, weights, rel_weight):
        words = string.split()
        words_counts = Counter(words)
        ratios = []
        # Cache these values for speed in loop
        ratios_append = ratios.append
        weights_W = weights[W]
        rel_W = rel_weight[W]
        for word in words:
            if word in weights_W and word in rel_W:
                if weights_W[word] != 0:
                    ratios_append(words_counts[word] * rel_W[word] / weights_W[word])
            else:
                ratios_append(0)
        if words:
            ratios = np.divide(ratios, float(len(words)))
        ratio = np.sum(ratios)
        return ratio

Try it and see how it works. Please review the notes and questions above. There may be bugs, and there may be more ways to speed it up.
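
Whether caching the lookups pays off is worth measuring rather than assuming. Here is a minimal sketch using the standard `timeit` module; the functions `lookup_each_time` and `lookup_cached` and the sample `weights` data are made up for illustration, and the timings will vary by machine and Python version:

```python
import timeit

weights = {"W": {"apple": 2.0}}


def lookup_each_time(n=1000):
    # Looks up weights["W"] again on every iteration.
    total = 0.0
    for _ in range(n):
        total += weights["W"]["apple"]
    return total


def lookup_cached(n=1000):
    # Looks up weights["W"] once, outside the loop.
    weights_W = weights["W"]
    total = 0.0
    for _ in range(n):
        total += weights_W["apple"]
    return total


# Both variants compute the same value; only the lookup cost differs.
print("each time:", timeit.timeit(lookup_each_time, number=200))
print("cached:   ", timeit.timeit(lookup_cached, number=200))
```

If the two numbers are close on your data, the plainer version is probably the better choice.
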