You can't access the set as it is being built in a comprehension, but you can bound the input using itertools.islice():
import sys
from itertools import islice
# Read at most the first 100 lines; "with" closes the file afterwards.
with open(sys.argv[1]) as f:
    values = {x.strip() for x in islice(f, 100)}
This does limit the size, but because sets discard duplicates, the resulting set may hold fewer than 100 values even when the file contains more than 100 distinct values. Your for loop, by contrast, stops at a set size of exactly 101.
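To see the difference concretely, here is a small sketch that applies the same islice() approach to an in-memory list instead of a file. The helper name first_n_lines_as_set and the sample data are illustrative assumptions. When the first 100 lines contain duplicates, the set ends up smaller than 100 even though later lines hold other values.

```python
from itertools import islice


def first_n_lines_as_set(lines, n=100):
    """Hypothetical helper: strip and collect at most the first n lines."""
    # islice stops after n items, so at most n lines are ever read;
    # duplicates among them collapse into a single set entry.
    return {x.strip() for x in islice(lines, n)}


# Hypothetical input: 300 lines cycling through only 50 distinct values.
lines = [f"value{i % 50}\n" for i in range(300)]

result = first_n_lines_as_set(lines)
print(len(result))  # 50: only 100 lines were read, holding 50 unique values
```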
That will limit the number of values input, but I want to limit the number of unique values stored.
Limiting the inputs does bound the size of the output. But if you want the exact behavior of your example in fewer lines, here you go.
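import sys
values = set()
# set.add() returns None, so "or" moves on to the length check;
# any() stops reading as soon as the set holds more than 100 values.
with open(sys.argv[1]) as f:
    any(values.add(x.strip()) or len(values) > 100 for x in f)
print(values)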
The any() builtin pulls from the generator expression until it either exhausts it or finds a true value, whichever comes first. The expression values.add(x.strip()) always returns None, which is falsy, so each item the generator produces is true only when len(values) > 100.
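A quick way to confirm the short-circuiting is to feed the trick an iterator and check how much of it is left afterwards. This sketch wraps the one-liner in a hypothetical function, bounded_unique, with a limit parameter that is an assumption for illustration. Note that the add() call and the length check must be joined with `or`: since add() returns None, `and` would short-circuit on that None, never reach the length check, and any() would read the whole input.

```python
def bounded_unique(lines, limit=100):
    """Hypothetical helper: collect stripped lines until more than `limit` are stored."""
    values = set()
    # set.add() returns None (falsy), so `or` falls through to the length
    # check; any() stops pulling lines at the first True.
    any(values.add(x.strip()) or len(values) > limit for x in lines)
    return values


# Hypothetical input: 1000 distinct lines, consumed through a shared iterator.
source = iter(f"line{i}\n" for i in range(1000))
values = bounded_unique(source)
print(len(values))        # 101
print(len(list(source)))  # 899: the remaining lines were never read
```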
While this does compress your entire for loop down into one line, it is arguably no simpler. The question then is, which do you find more readable?
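For a side-by-side readability comparison, here is a plain loop with the stopping rule described above (stop once the set exceeds the limit, i.e. at 101 values for a limit of 100). The original loop from the question isn't reproduced in this thread, so this is a reconstruction under that assumption; collect_unique and its limit parameter are hypothetical names.

```python
def collect_unique(lines, limit=100):
    """Plain-loop version: add stripped lines until more than `limit` are stored."""
    values = set()
    for x in lines:
        values.add(x.strip())
        if len(values) > limit:
            break  # stops at exactly limit + 1 unique values
    return values


# Hypothetical input with a repeat: "a", "b", "a", "c", "d"
sample = ["a\n", "b\n", "a\n", "c\n", "d\n"]
print(sorted(collect_unique(sample, limit=2)))  # ['a', 'b', 'c']
```

To read from a file instead, pass the open file object, e.g. `with open(sys.argv[1]) as f: values = collect_unique(f)`.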
Thanks for the additional example. It accomplishes the original goal of limiting the size of the set, but as you pointed out, it's really just another way to write the loop that I originally posted. I was hoping that there might be a way to do it inside of a set comprehension to get the benefits that come from that.
What benefits are those? You certainly could do this all in a comprehension, but there's really no point.
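import sys
# "for values in [set()]" binds a fresh set; the "if" clause fills it as a
# side effect, and "for x in values" copies it into the result.
result = {x for values in [set()]
          if any(values.add(x.strip())
                 or len(values) > 100
                 for x in open(sys.argv[1]))
          or True
          for x in values}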
Like I said, you can't access the set as it is being built in a comprehension, but nothing stops you from filling a different set inside it. The cost is the pointless extra step of copying that set into the result.
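To check that the comprehension version stops at the same point as the loop, here it is wrapped in a hypothetical function over an in-memory list, with `or` joining the add() call and the length check. The function name and the limit parameter are illustrative assumptions. The returned set is a new object built from the throwaway one, which is the extra copy described above.

```python
def bounded_unique_comprehension(lines, limit=100):
    """Set-comprehension version of the any() trick (illustrative only)."""
    return {x for values in [set()]
            # Filling the throwaway set happens as a side effect of this `if`.
            if any(values.add(x.strip()) or len(values) > limit
                   for x in lines)
            or True
            # ...and the final clause copies that set into the result.
            for x in values}


# Hypothetical input: 500 lines cycling through 150 distinct values.
sample = [f"line{i % 150}\n" for i in range(500)]
print(len(bounded_unique_comprehension(sample)))  # 101
```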