python - Limiting the size of a set comprehension over input

Question

Is there a way to limit the size of a set comprehension when parsing input? Here is a simple example:

import sys

values = {x.strip() for x in open(sys.argv[1], 'r')}

print(values)

The size of values is unbounded, but is there a way to limit it? It can be done with the for loop below, but is there a simpler way?

import sys

values = set()
for x in open(sys.argv[1], 'r'):
    values.add(x.strip())

    if len(values) > 100:
        break

print(values)

score 2 · Accepted Answer

You can't access the set as it is being built in a comprehension. But you can bound the input using itertools.islice().

import sys
from itertools import islice

values = {x.strip() for x in islice(open(sys.argv[1], 'r'), 100)}

This does limit the size, but because sets discard duplicates, the resulting set may contain fewer than 100 elements even when the input has more than 100 distinct values. Your for loop, by contrast, stops only once the set holds exactly 101 elements.
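To see how far below the limit the result can fall, here is a small sketch that swaps the file for an in-memory io.StringIO (an assumption for illustration; any iterable of lines behaves the same) whose first 100 lines repeat only three values:

```python
from io import StringIO
from itertools import islice

# Hypothetical input standing in for open(sys.argv[1]): 150 lines that
# cycle through only 3 values, followed by 200 distinct values.
lines = StringIO("a\nb\nc\n" * 50 + "".join(f"v{i}\n" for i in range(200)))

# islice() stops after the first 100 lines, however many of them are unique.
values = {x.strip() for x in islice(lines, 100)}
print(sorted(values))  # ['a', 'b', 'c']
```

The input holds 203 distinct values, yet the set ends up with 3: islice() caps the number of lines read, not the number of unique values kept.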

That limits the number of input values read, but I want to limit the number of unique values stored.

Limiting the inputs does bound the size of the output. But if you want the exact behavior of your example in fewer lines, here you go.

import sys

values = set()
any(values.add(x.strip()) or len(values) > 100 for x in open(sys.argv[1]))

print(values)

The any() builtin pulls from the generator expression until it either exhausts it or finds a true value, whichever comes first. The expression values.add(x.strip()) always returns None, and None or cond evaluates to cond, so the generator yields a true value only once len(values) > 100.
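One way to check the short-circuit behavior is to run the one-liner against in-memory data. The sketch below assumes a hypothetical io.StringIO input of 500 distinct lines in place of the file, and also shows why the two parts must be joined with or rather than and: set.add() returns None, and None and ... is always falsy.

```python
from io import StringIO

# Hypothetical input standing in for open(sys.argv[1]): 500 distinct lines.
lines = StringIO("".join(f"{i}\n" for i in range(500)))

values = set()
# set.add() always returns None, so `None or cond` evaluates to cond and
# any() stops at the first line where the set has grown past 100 elements.
stopped_early = any(values.add(x.strip()) or len(values) > 100 for x in lines)
print(stopped_early, len(values))  # True 101

# Pitfall: `None and cond` short-circuits to None (falsy), so the same
# expression written with `and` never stops early and reads every line.
values_and = set()
stopped_and = any(values_and.add(x.strip()) and len(values_and) > 100
                  for x in StringIO("".join(f"{i}\n" for i in range(500))))
print(stopped_and, len(values_and))  # False 500
```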

While this does compress your entire for loop down into one line, it is arguably no simpler. The question then is, which do you find more readable?

Thanks for the additional example. It accomplishes the original goal of limiting the size of the set, but as you pointed out, it's really just another way to write the loop that I originally posted. I was hoping that there might be a way to do it inside of a set comprehension to get the benefits that come from that.

What benefits are those? You certainly could do this all in a comprehension, but there's really no point.

import sys

# Build the bounded set in an inner set, then copy it out.
values = {x for values in [set()]
          if any(values.add(x.strip())
                 or len(values) > 100
                 for x in open(sys.argv[1]))
          or True
          for x in values}

print(values)

Like I said, you can't access the set as it is being built in a comprehension, but nothing stops you from building a different set inside it. This version does, however, add the pointless extra step of copying that set.
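If the goal is still a set comprehension that stops after a fixed number of unique values, one option (not from the answer above) is to deduplicate lazily in a generator and bound that generator with islice(). This is a minimal sketch under stated assumptions: unique is a hypothetical helper, similar in spirit to the unique_everseen recipe in the itertools documentation, and the file is replaced by an in-memory io.StringIO.

```python
from io import StringIO
from itertools import islice


def unique(iterable):
    """Hypothetical helper: yield each item only the first time it is seen."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


# Hypothetical input with duplicates, standing in for open(sys.argv[1]).
lines = StringIO("a\nb\na\nc\nb\nd\ne\n")

# Stop reading as soon as 3 distinct stripped values have been collected.
values = {x for x in islice(unique(line.strip() for line in lines), 3)}
print(sorted(values))  # ['a', 'b', 'c']
```

This caps the set at exactly N unique values (pass 101 to match the question's loop), at the cost of the helper keeping its own seen set alongside the result.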
