python - 如何在 Python 中实现一个高效的无限质数生成器？

Question

这不是作业，我只是好奇。

INFINITE 是这里的关键词。

我希望将其用作for p in primes(). 我相信这是 Haskell 中的一个内置函数。

所以，答案不能像“做个筛子”那样天真。

首先，你不知道会消耗多少个连续素数。好吧，假设你一次可以炮制 100 个。您会使用相同的筛法以及素数频率公式吗？

我更喜欢非并发方法。

感谢您阅读（和写作；））！

score 81 · Accepted Answer

“If I have seen further…”</h1>
The `erat2` function from the cookbook can be further sped up (by about 20-25%):

erat2a

import itertools as it
def erat2a( ):
    D = {  }
    yield 2
    for q in it.islice(it.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            # old code here:
            # x = p + q
            # while x in D or not (x&1):
            #     x += p
            # changed into:
            x = q + 2*p
            while x in D:
                x += 2*p
            D[x] = p

The not (x&1) check verifies that x is odd. However, since both q and p are odd, by adding 2*p half of the steps are avoided along with the test for oddity.

erat3

If one doesn't mind a little extra fanciness, erat2 can be sped up by 35-40% with the following changes (NB: needs Python 2.7+ or Python 3+ because of the itertools.compress function):

import itertools as it
def erat3( ):
    D = { 9: 3, 25: 5 }
    yield 2
    yield 3
    yield 5
    MASK= 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0,
    MODULOS= frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )

    for q in it.compress(
            it.islice(it.count(7), 0, None, 2),
            it.cycle(MASK)):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D or (x%30) not in MODULOS:
                x += 2*p
            D[x] = p

The erat3 function takes advantage of the fact that all primes (except for 2, 3, 5) modulo 30 result to only eight numbers: the ones included in the MODULOS frozenset. Thus, after yielding the initial three primes, we start from 7 and work only with the candidates.
The candidate filtering uses the itertools.compress function; the “magic” is in the MASK sequence; MASK has 15 elements (there are 15 odd numbers in every 30 numbers, as chosen by the itertools.islice function) with a 1 for every possible candidate, starting from 7. The cycle repeats as specified by the itertools.cycle function.
The introduction of the candidate filtering needs another modification: the or (x%30) not in MODULOS check. The erat2 algorithm processed all odd numbers; now that the erat3 algorithm processes only r30 candidates, we need to make sure that all D.keys() can only be such —false— candidates.

Benchmarks

Results

On an Atom 330 Ubuntu 9.10 server, versions 2.6.4 and 3.1.1+:

$ testit
up to 8192
==== python2 erat2 ====
100 loops, best of 3: 18.6 msec per loop
==== python2 erat2a ====
100 loops, best of 3: 14.5 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
100 loops, best of 3: 19.2 msec per loop
==== python3 erat2a ====
100 loops, best of 3: 14.1 msec per loop
==== python3 erat3 ====
100 loops, best of 3: 11.7 msec per loop

On an AMD Geode LX Gentoo home server, Python 2.6.5 and 3.1.2:

$ testit
up to 8192
==== python2 erat2 ====
10 loops, best of 3: 104 msec per loop
==== python2 erat2a ====
10 loops, best of 3: 81 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
10 loops, best of 3: 116 msec per loop
==== python3 erat2a ====
10 loops, best of 3: 82 msec per loop
==== python3 erat3 ====
10 loops, best of 3: 66 msec per loop

Benchmark code

A primegen.py module contains the erat2, erat2a and erat3 functions. Here follows the testing script:

#!/bin/sh
max_num=${1:-8192}
echo up to $max_num
for python_version in python2 python3
do
    for function in erat2 erat2a erat3
    do
        echo "==== $python_version $function ===="
        $python_version -O -m timeit -c \
        -s  "import itertools as it, functools as ft, operator as op, primegen; cmp= ft.partial(op.ge, $max_num)" \
            "next(it.dropwhile(cmp, primegen.$function()))"
    done
done

score 78 · Accepted Answer

由于 OP 要求有效的实现，这是 David Eppstein/Alex Martelli 对活动状态 2002 代码的重大改进（在他的回答中看到）：不要在字典中记录素数的信息，直到它的正方形出现在候选人。对于产生的 n 个素数 ( π(sqrt(n log n)) ~ 2 sqrt(n log n) / log(n log n) ~将空间复杂度降低到O(sqrt(n))而不是O(n)以下2 sqrt（n / log n））。因此，时间复杂度也得到了提高，即它运行得更快。

创建一个“滑动筛”作为每个基本素数的当前倍数（即低于当前生产点的 sqrt）的字典，以及它们的步长值：

from itertools import count
                                         # ideone.com/aVndFM
def postponed_sieve():                   # postponed sieve, by Will Ness      
    yield 2; yield 3; yield 5; yield 7;  # original code David Eppstein, 
    sieve = {}                           #   Alex Martelli, ActiveState Recipe 2002
    ps = postponed_sieve()               # a separate base Primes Supply:
    p = next(ps) and next(ps)            # (3) a Prime to add to dict
    q = p*p                              # (9) its sQuare 
    for c in count(9,2):                 # the Candidate
        if c in sieve:               # c's a multiple of some base prime
            s = sieve.pop(c)         #     i.e. a composite ; or
        elif c < q:  
             yield c                 # a prime
             continue              
        else:   # (c==q):            # or the next base prime's square:
            s=count(q+2*p,2*p)       #    (9+6, by 6 : 15,21,27,33,...)
            p=next(ps)               #    (5)
            q=p*p                    #    (25)
        for m in s:                  # the next multiple 
            if m not in sieve:       # no duplicates
                break
        sieve[m] = s                 # original test entry: ideone.com/WFv4f

（此处较旧的原始代码已被编辑以合并更改，如下面的Tim Peters 的答案所示）。另请参阅此以获取相关讨论。

类似的基于2-3-5-7 轮子的代码运行速度快了 ~ 2.15 倍（这非常接近的理论改进3/2 * 5/4 * 7/6 = 2.1875）。

score 47 · Accepted Answer

对于后人来说，这是Will Ness为 Python 3 编写的漂亮算法的重写。需要进行一些更改（迭代器不再具有.next()方法，但有一个新的next()内置函数）。其他更改是为了好玩（使用 newyield from <iterable>替换原始语句中的四个yield语句。更多是为了可读性（我不喜欢过度使用 ;-) 1 字母变量名称）。

它比原来的要快得多，但不是出于算法原因。加速主要是由于删除了原始add()函数，而是内联。

def psieve():
    import itertools
    yield from (2, 3, 5, 7)
    D = {}
    ps = psieve()
    next(ps)
    p = next(ps)
    assert p == 3
    psq = p*p
    for i in itertools.count(9, 2):
        if i in D:      # composite
            step = D.pop(i)
        elif i < psq:   # prime
            yield i
            continue
        else:           # composite, = p*p
            assert i == psq
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step

score 8 · Accepted Answer

这最初不是我的代码，但是，值得发布。原文可以在这里找到：http ://code.activestate.com/recipes/117119/

def gen_primes():
  D = {}
  q = 2  # first integer to test for primality.

  while True:
    if q not in D:
      # not marked composite, must be prime  
      yield q 

      #first multiple of q not already marked
      D[q * q] = [q] 
    else:
      for p in D[q]:
        D.setdefault(p + q, []).append(p)
      # no longer need D[q], free memory
      del D[q]

    q += 1

它是一个生成器，所以像其他任何东西一样使用它。

primes = gen_primes()
for p in primes:
  print p

在我的桌面上生成并放入一组 100 万个素数需要 1.62 秒。

score 5 · Accepted Answer

Do a segmented sieve, where the size of a segment is determined by available memory or the maximal size of a bitset.

For each segment represent the numbers in some interval [n; n + segment_size) as a bit set and sieve with all prime numbers below the square root of the upper bound.

Using a bit set uses less memory than a hash table or tree data structure, because you are working with dense sets of numbers.

score 2 · Accepted Answer

另一个答案，比我erat3在这里的答案更节省内存：

import heapq

def heapprimegen():
    hp= []
    yield 2
    yield 3
    cn= 3
    nn, inc= 3, 6
    while 1:
        while cn < nn:
            yield cn
            heapq.heappush(hp, (3*cn, 2*cn))
            cn+= 2
        cn= nn+2
        nn, inc= heapq.heappushpop(hp, (nn+inc, inc))

它维护一个质数倍数的堆（列表）而不是字典。显然，它失去了一些速度。

score 2 · Accepted Answer

另一种方法：

import itertools
def primeseq():
    prime = [2]
    num = 0
    yield 2
    for i in itertools.count(3, 2):
        is_prime = True
        for num in prime:
            if i % num == 0:
                is_prime = False
                break
            elif num ** 2 > i: 
                break
        if is_prime:
            prime.append(i)
            yield i

score 2 · Accepted Answer

这是一个非常快的无限生成器，用 Python2 编写，但很容易适应 Python3。要使用它来将素数相加到 10**9，请使用以下命令：

from itertools import takewhile
from functools import partial
from operator import gt
print (sum(takewhile(partial(gt, 10**9), prime_gen_inf())))

它是一种分段筛，速度更快，但显然不如 Will Ness 的算法那么优雅。

from operator import mul
from functools import reduce
def prod(x): return reduce(mul, x, 1)


def build_sieve(wheel):
    w = prod(wheel)
    w_phi = prod([p-1 for p in wheel])
    rems = [a for a in range(w) if all(a % p for p in wheel)]
    assert len(rems) == w_phi
    inv = {a:pow(a, w_phi - 1, w) for a in rems}
    try:
        known_p = wheel + rems[1 : rems.index(rems[1]*rems[1])]
    except ValueError:
        known_p = wheel + rems[1:]
    return wheel, w, w_phi, rems, inv, known_p

#Adjust the chunk variable based on your computer's architecture.
#
#Adjust the line with #! if you don't need "true" infinite.  If you don't need
#primes larger than 1<<32, use array('H', []), if 1<<64 use 'L', if 1<<128 (in
#Python3) use 'Q', otherwise use empty list [].
#To save memory, comment out the lines with #*, and uncomment the commented-out
#lines 
import itertools
from itertools import islice, count, compress, izip
chain_f = itertools.chain.from_iterable
from array import array
def prime_gen_inf(chunk=250000, sieve_info = build_sieve([2,3,5,7])):
    """    Indefinitely yields primes    """
    wheel, w, w_phi, rems, inv, known_p = sieve_info
    for p in known_p: yield p
    new_n = 0;
    while True:
        size = min(chunk, (p * p - new_n) / w)
        sieve = bytearray([1]) * size * w_phi
        n, new_n = new_n, new_n + size * w
        if not n:
            zero = bytearray([0])
            seen = len(known_p) - len(wheel) + 1
            sieve[:seen:1] = zero * seen
            p_gen = islice(prime_gen_inf(), len(wheel), None)
            new_p = next(p_gen)
            ps = []                                         #! array('H', [])
            p_invs = bytearray([])                                         #*
        while new_p * new_p < new_n:
            ps.append(new_p)
            p_invs.append(inv[new_p % w])                                  #*
            new_p = next(p_gen)
        for p, p_inv, modp in izip(ps, p_invs, [-n % p for p in ps]):      #*
            s = [(modp + p * (p_inv * (r - modp) % w)) / w for r in rems]  #*
        #for p in ps:
        #    s = [(-n%p + p * (inv[p%w] * (r - -n%p) % w)) / w for r in rems]
            for i, start in enumerate(s):
                slice_size = ((size - start - 1) / p + 1)
                sieve[i + start * w_phi :: p * w_phi] = zero * slice_size
        for p in compress(chain_f(izip(*[count(n+r, w) for r in rems])), sieve):
            yield p

score 1 · Accepted Answer

这是一个复杂的基于堆的实现，它并不比其他基于堆的实现快多少（参见我的另一个答案中的速度比较），但它使用的内存要少得多。

此实现使用两个堆（tu 和 wv），它们包含相同的数字元素。每个元素都是一个 int 对。为了找到直到q**2（其中q是素数）的所有素数，每个堆将最多包含2*pi(q-1)元素，其中pi(x)正素数的数量不大于x。所以整数的总数最多为4*pi(floor(sqrt(n)))。（我们可以通过将一半的东西推送到堆中来获得 2 的内存因子，但这会使算法变慢。）

上面的其他基于 dict 和堆的方法（例如，erat2b 和 heap_prime_gen_squares 和 heapprimegen）存储大约 '2*pi(n)' 整数，因为它们每次找到素数时都会扩展它们的堆或 dict。作为比较：要找到 1_000_000 个素数，此实现存储少于 4141 个整数，其他实现存储多于 1_000_000 个整数。

import heapq

def heap_prime_gen_smallmem():
    yield 2
    yield 3
    f = 5
    fmar3 = 2
    q = 7
    q6 = 7 * 6
    qmar3 = 4
    tu = [(25, 30), (35, 30)]
    vw = [(25, 30), (35, 30)]
    while True:
        qmar3 += 2   
        if qmar3 == 6:  
            qb = q + 4
            q6b = q6 + 24
            qmar3 = 2
        else:
            qb = q + 2
            q6b = q6 + 12
        if q < tu[0][0]:
            d = q * q
            while f < d:
                a, b = vw[0]
                if f < a: 
                    yield f   
                else:
                    a, b = vw[0]
                    heapq.heapreplace(vw, (a + b, b))
                    a, b = vw[0]
                    while f >= a:
                        heapq.heapreplace(vw, (a + b, b))
                        a, b = vw[0]   
                fmar3 += 2
                if fmar3 == 6:
                    f += 4
                    fmar3 = 2
                else:
                    f += 2
            c = q * qb   
            heapq.heappush(tu, (d, q6))
            heapq.heappush(tu, (c, q6))
            heapq.heappush(vw, (d, q6))
            heapq.heappush(vw, (c, q6))
        else:
            a, b = tu[0]
            heapq.heapreplace(tu, (a + b, b))
            a, b = tu[0]  
            while q >= a:
                heapq.heapreplace(tu, (a + b, b))
                a, b = tu[0]
        q = qb
        q6 = q6b

score 1 · Accepted Answer

这是一个使用堆而不是字典的简单但不是非常慢的方法：

import heapq

def heap_prime_gen_squares(): 
    yield 2  
    yield 3  
    h = [(9, 6)]
    n = 5
    while True:
        a, b = h[0]
        while n < a:
            yield n
            heapq.heappush(h, (n * n, n << 1))
            n += 2
        heapq.heapreplace(h, (a + b, b))  # Replace h[0], which is still (a, b).

我对前 100 万个素数的用户时间的速度测量（数字越小越好）：

推迟筛子（基于字典）：8.553s
erat2b（基于字典）：9.513s
erat2a（基于字典）：10.313s
heap_prime_gen_smallmem（基于堆）：23.935s
heap_prime_gen_squares（基于堆）：27.302s
heapprimegen（基于字典）：145.029s

所以基于字典的方法似乎是最快的。

score 0 · Accepted Answer

这是一个更真实的生成器，它在 Haskell 中是如何完成的：过滤已知素数的组合，然后将剩余的素数添加到列表中。

def gen_primes():
    primes = []
    i = 2
    while True:
        prime = True
        for p in primes:
            if not (i % p):
                prime = False
                break
        if prime:
            yield i
            primes.append(i)
        i += 1

score 0 · Accepted Answer

我知道帖子很旧，但我遇到了这个问题......以下代码基于一个非常简单的想法：Eratosthenes 的增长筛。虽然这个解决方案比这里最好的解决方案慢，但它很容易掌握并且设计成可读...

我使用整数来存储筛子的结果。在二进制格式中，整数是0s 和1s的列表0，i如果它i不是素数，1如果它可能是素数。必要的无穷大是 Python 3 整数无界的结果。

def primes():
    container, size = 1 << 2, 3 # we start with 0b100 (from right to left: 0 and 1 are not primes, 2 is
    last_prime = 1
    while True:
        prime = next((j for j in range(last_prime+1, size) if container & 1 << j), None) # find the next prime
        while not prime:
            container, size = expand(container, size, 2**16) # add 65536 cells and sieve the container
            prime = next((j for j in range(last_prime+1, size) if container & 1 << j), None)
        yield prime
    last_prime = prime

如何扩展容器？1只需在容器左侧（二进制格式）添加一堆s 并筛选它们。这与标准筛相同，略有不同。在标准筛子中，如果我们找到一个素数i，我们开始在处穿过细胞i*i，步长为i。

在这里，这可能是针对容器的第一部分完成的。如果容器的新部分比i*i.

def expand(container, size, n):
    new_size = size + n
    container += (1 << (new_size + 1) - 1) - (1 << size) # add n 1's
    for i in range(2, new_size):
        if container & (1 << i): # i is a prime
            t = sum(1 << j for j in range(max(i, size // i)*i, new_size, i)) # set 1 for all mutiple
            container &= ~t # cross the cells

    return container, new_size

测试一百万个素数：

import itertools
assert 78498 == len(list(itertools.takewhile(lambda p: p<1000000, primes())))

python - 如何在 Python 中实现一个高效的无限质数生成器？

12 回答 12

“If I have seen further…”</h1> The erat2 function from the cookbook can be further sped up (by about 20-25%):