python - Efficient loop for comparing and inserting into other lists

Question

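I'm developing a web application (Python/Django) that works with a large database, and I need to optimize this loop for better execution time.

I have a list of entries, and each entry has a yes_count attribute, a no_count attribute, and a tid attribute.

I need to build two new lists based on the ratio yes_count / (yes_count + no_count): the tids of entries whose ratio is above 0.75 go into one list, and those below 0.25 go into the other.

Is there a better way to do this using built-in functions (or some faster approach)?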

yes_entries = []
no_entries = []

for e in entries:
    if e.tid in tids:
        if e.yes_count > 0 or e.no_count > 0:
            ratio = e.yes_count / (e.yes_count + e.no_count)
            if ratio > 0.75:
                yes_entries.append(e.tid)
            elif ratio < 0.25:
                no_entries.append(e.tid)

score 1 · Accepted Answer

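I suggest turning tids into a set, which gives O(1) average lookup time (instead of O(N) for a list):

set_tids = set(tids)

before the for loop, and then

if e.tid in set_tids:

Otherwise, the rest of the code you provided looks quite well optimized.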
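A minimal runnable sketch of this suggestion, assuming Python 3 (where `/` is true division). The `Entry` namedtuple and the `split_by_ratio` wrapper are hypothetical stand-ins for the real Django objects and surrounding code:

```python
from collections import namedtuple

# Hypothetical stand-in for the question's entry objects (e.g. Django model
# instances); only the three attributes used by the loop are modelled.
Entry = namedtuple("Entry", ["tid", "yes_count", "no_count"])


def split_by_ratio(entries, tids):
    """Return (yes_entries, no_entries), using a set for membership tests."""
    set_tids = set(tids)  # built once: average O(1) lookups instead of O(N)
    yes_entries = []
    no_entries = []
    for e in entries:
        if e.tid in set_tids:
            if e.yes_count > 0 or e.no_count > 0:
                # True division on Python 3; on Python 2 wrap one side in float().
                ratio = e.yes_count / (e.yes_count + e.no_count)
                if ratio > 0.75:
                    yes_entries.append(e.tid)
                elif ratio < 0.25:
                    no_entries.append(e.tid)
    return yes_entries, no_entries


entries = [
    Entry(tid=1, yes_count=9, no_count=1),   # ratio 0.9   -> yes
    Entry(tid=2, yes_count=1, no_count=9),   # ratio 0.1   -> no
    Entry(tid=3, yes_count=5, no_count=5),   # ratio 0.5   -> neither
    Entry(tid=4, yes_count=0, no_count=0),   # no votes    -> skipped
    Entry(tid=5, yes_count=10, no_count=0),  # not in tids -> skipped
]
print(split_by_ratio(entries, [1, 2, 3, 4]))  # ([1], [2])
```

The set is built once per call, so its construction cost is paid only once rather than on every membership test.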

score 0 · Accepted Answer

You can also save some time by accessing e.tid, e.yes_count and e.no_count only once each and storing them in local variables:

for e in entries:
    tid = e.tid
    if tid in tids:
        yes_count = e.yes_count
        no_count = e.no_count
        if yes_count > 0 or no_count > 0:
            ratio = yes_count / (yes_count + no_count)
            if ratio > 0.75:
                yes_entries.append(tid)
            elif ratio < 0.25:
                no_entries.append(tid)

You can also save time by caching the bound methods no_entries.append and yes_entries.append:

yes_entries_append = yes_entries.append
no_entries_append = no_entries.append

for e in entries:
    tid = e.tid
    if tid in tids:
        yes_count = e.yes_count
        no_count = e.no_count
        if yes_count > 0 or no_count > 0:
            ratio = yes_count / (yes_count + no_count)
            if ratio > 0.75:
                yes_entries_append(tid)
            elif ratio < 0.25:
                no_entries_append(tid)

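But at that point, you may be starting to get silly.

Another, possibly even sillier, attempt is to check whether using filter is faster. In Python 2, filter returns a list, which means you would iterate twice, which isn't ideal. However, itertools can help here: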

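import itertools

def filterfunc(e):
    # Read the counts from e: yes_count/no_count are only assigned inside the loop.
    return (e.tid in tids) and (e.yes_count > 0 or e.no_count > 0)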

for e in itertools.ifilter(filterfunc, entries):
    tid = e.tid
    yes_count = e.yes_count
    no_count = e.no_count
    ratio = yes_count / (yes_count + no_count)
    if ratio > 0.75:
        yes_entries_append(tid)
    elif ratio < 0.25:
        no_entries_append(tid)
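The snippet above targets Python 2 (`itertools.ifilter` does not exist in Python 3). A Python 3 sketch of the same idea, using the built-in lazy `filter`; `Entry` and `split_with_filter` are hypothetical names for illustration. Note that the filter function has to read the counts from `e`, since `yes_count` and `no_count` are only assigned later inside the loop body:

```python
from collections import namedtuple

# Hypothetical stand-in for the real entry objects.
Entry = namedtuple("Entry", ["tid", "yes_count", "no_count"])


def split_with_filter(entries, tids):
    """Python 3 sketch: built-in filter() is lazy, so entries are walked once."""
    set_tids = set(tids)
    yes_entries, no_entries = [], []
    yes_entries_append = yes_entries.append  # cached bound methods
    no_entries_append = no_entries.append

    def filterfunc(e):
        # Keep entries whose tid is wanted and that have at least one vote.
        return e.tid in set_tids and (e.yes_count > 0 or e.no_count > 0)

    for e in filter(filterfunc, entries):
        tid = e.tid
        yes_count = e.yes_count
        no_count = e.no_count
        ratio = yes_count / (yes_count + no_count)  # never zero division here
        if ratio > 0.75:
            yes_entries_append(tid)
        elif ratio < 0.25:
            no_entries_append(tid)
    return yes_entries, no_entries


entries = [
    Entry(1, 9, 1), Entry(2, 1, 9), Entry(3, 5, 5), Entry(4, 0, 0), Entry(5, 10, 0),
]
print(split_with_filter(entries, [1, 2, 3, 4]))  # ([1], [2])
```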

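The next problem is that we are once again accessing the fields of e twice (once in filterfunc and once in the loop body). Let's fix that with some iterator magic: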

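import itertools
from operator import attrgetter

def filterfunc(t):
    tid, yes_count, no_count = t
    return (tid in tids) and (yes_count > 0 or no_count > 0)

# attrgetter takes the attribute names as separate arguments, not a list.
fields = itertools.imap(attrgetter("tid", "yes_count", "no_count"), entries)
for tid, yes_count, no_count in itertools.ifilter(filterfunc, fields):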
    ratio = yes_count / (yes_count + no_count)
    if ratio > 0.75:
        yes_entries_append(tid)
    elif ratio < 0.25:
        no_entries_append(tid)
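For Python 3, a sketch of the same pipeline with the built-in lazy `map` and `filter` in place of `itertools.imap` and `itertools.ifilter`. `Entry` and `split_with_attrgetter` are hypothetical names; `operator.attrgetter` called with several attribute names returns a function that produces a tuple of those attributes:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the real entry objects.
Entry = namedtuple("Entry", ["tid", "yes_count", "no_count"])


def split_with_attrgetter(entries, tids):
    set_tids = set(tids)
    yes_entries, no_entries = [], []
    # With several names, attrgetter returns a (tid, yes_count, no_count) tuple.
    get_fields = attrgetter("tid", "yes_count", "no_count")

    def filterfunc(t):
        tid, yes_count, no_count = t
        return tid in set_tids and (yes_count > 0 or no_count > 0)

    # map/filter are lazy in Python 3, so each entry's attributes are read once.
    for tid, yes_count, no_count in filter(filterfunc, map(get_fields, entries)):
        ratio = yes_count / (yes_count + no_count)
        if ratio > 0.75:
            yes_entries.append(tid)
        elif ratio < 0.25:
            no_entries.append(tid)
    return yes_entries, no_entries


entries = [
    Entry(1, 9, 1), Entry(2, 1, 9), Entry(3, 5, 5), Entry(4, 0, 0), Entry(5, 10, 0),
]
print(split_with_attrgetter(entries, [1, 2, 3, 4]))  # ([1], [2])
```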

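It's up to you and your profiler to determine which of all the options I've suggested works best.

Also, if you're using Python 3, use the built-in filter instead of itertools.ifilter (and map instead of itertools.imap): in Python 3 they return lazy iterators rather than lists as in Python 2, and the itertools versions no longer exist.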
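One way to do that comparison is a small `timeit` harness. This sketch assumes Python 3 and uses hypothetical synthetic data (the `Entry` namedtuple, sizes and random counts are made up); for meaningful numbers, substitute a realistic sample of your own entries and time each variant you are considering:

```python
import random
import timeit
from collections import namedtuple

# Hypothetical synthetic data; replace with a realistic sample of your entries.
Entry = namedtuple("Entry", ["tid", "yes_count", "no_count"])
random.seed(0)
entries = [Entry(i, random.randint(0, 20), random.randint(0, 20)) for i in range(2000)]
tids = list(range(0, 2000, 2))


def classify(entries, tids):
    """The question's loop; pass either a list or a set as tids."""
    yes_entries, no_entries = [], []
    for e in entries:
        if e.tid in tids:
            if e.yes_count > 0 or e.no_count > 0:
                ratio = e.yes_count / (e.yes_count + e.no_count)
                if ratio > 0.75:
                    yes_entries.append(e.tid)
                elif ratio < 0.25:
                    no_entries.append(e.tid)
    return yes_entries, no_entries


def time_variant(label, tids_container, number=3):
    """Time `number` runs of classify() with the given tids container."""
    seconds = timeit.timeit(lambda: classify(entries, tids_container), number=number)
    print("%-10s %.4f s for %d runs" % (label, seconds, number))
    return seconds


# Check that both variants agree before comparing their timings.
assert classify(entries, tids) == classify(entries, set(tids))
time_variant("list tids", tids)
time_variant("set tids", set(tids))
```

Verifying that the variants produce identical output first guards against "optimizations" that silently change the result.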

score 0 · Accepted Answer

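Note: the following is an attempt at a more compact solution, not necessarily a more efficient one. Some profiling may be in order.

I assume you're checking (e.yes_count > 0 or e.no_count > 0) so that you don't end up dividing by zero. Assuming that's a very rare case, I would simply wrap the ratio calculation in a function that handles the ZeroDivisionError exception, returning zero for that edge case. Note that a ratio of zero means an entry with no votes at all ends up in no_entries, whereas the original loop skips it.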

def get_ratio(y, n):
    try:
        return y / (y + n)
    except ZeroDivisionError:
        return 0

Next, we create a generator expression that iterates over the entries and yields the ratio and tid of each candidate.

tidset = set(tids)  # assuming tids is not yet a set()
ratios = ((get_ratio(e.yes_count, e.no_count), e.tid) 
            for e in entries if e.tid in tidset)

Finally, we iterate over the generator and append each tid to the appropriate list:

yes_entries, no_entries = [], []
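for ratio, tid in ratios:
    # Same thresholds as the question; ratios in [0.25, 0.75] go to neither list.
    if ratio > 0.75:
        yes_entries.append(tid)
    elif ratio < 0.25:
        no_entries.append(tid)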
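A self-contained Python 3 sketch of this compact approach, applying the question's 0.75/0.25 thresholds. `Entry` and `split_compact` are hypothetical names for illustration. Because `get_ratio` returns 0 for an entry with no votes, such an entry lands in `no_entries`, unlike the question's loop, which skips it:

```python
from collections import namedtuple

# Hypothetical stand-in for the real entry objects.
Entry = namedtuple("Entry", ["tid", "yes_count", "no_count"])


def get_ratio(y, n):
    try:
        return y / (y + n)
    except ZeroDivisionError:
        return 0  # no votes at all: treated as a ratio of zero


def split_compact(entries, tids):
    tidset = set(tids)
    # Lazy generator of (ratio, tid) pairs for the wanted entries only.
    ratios = ((get_ratio(e.yes_count, e.no_count), e.tid)
              for e in entries if e.tid in tidset)
    yes_entries, no_entries = [], []
    for ratio, tid in ratios:
        if ratio > 0.75:
            yes_entries.append(tid)
        elif ratio < 0.25:
            no_entries.append(tid)
    return yes_entries, no_entries


entries = [
    Entry(1, 9, 1), Entry(2, 1, 9), Entry(3, 5, 5), Entry(4, 0, 0), Entry(5, 10, 0),
]
print(split_compact(entries, [1, 2, 3, 4]))  # ([1], [2, 4])
```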
