Even though this question already has 6 answers, I'm not satisfied with any of them, so I've decided to contribute my own solution.
First, note that many of the answers provide the combinations or permutations of letters, when the post actually wants the Cartesian product of the alphabet with itself (repeated N times, where N = 6). There are (at this point in time) two answers that do that, however they both write an excessive number of times, resulting in subpar performance, and they also concatenate their intermediate results in the hottest part of the loop (which also lowers performance).
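To make the distinction concrete, here is a minimal comparison on a tiny input (a 3-letter alphabet and length 2, chosen for brevity):

```python
from itertools import combinations, permutations, product

letters = "abc"

# combinations: unordered, no repeated letters -> 3 results: ab, ac, bc
print(len(list(combinations(letters, 2))))    # 3

# permutations: ordered, but still no repeated letters -> 6 results (never "aa")
print(len(list(permutations(letters, 2))))    # 6

# Cartesian product with repeat: every position independently takes any letter
# -> 3**2 = 9 results, which is what "all possible strings" actually requires
print(len(list(product(letters, repeat=2))))  # 9
```

Only `product(..., repeat=N)` enumerates strings like "aaaaaa", which the question clearly needs.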
To optimize this to the absolute maximum, I offer the following code:
```python
from string import digits, ascii_lowercase
from itertools import chain

ALPHABET = (digits + ascii_lowercase).encode("ascii")

def fast_brute_force():
    # Define some constants to make the following sections more readable
    base_size = 6
    suffix_size = 4
    prefix_size = base_size - suffix_size
    word_size = base_size + 1

    # define two containers
    #   word_blob - placeholder words, with hyphens in the unpopulated characters (followed by newline)
    #   sleds - a tuple of repeated bytes, used for substituting a bunch of characters in a batch
    word_blob = bytearray(b"-" * base_size + b"\n")
    sleds = tuple(bytes([char]) for char in ALPHABET)

    # iteratively extend word_blob and sleds, filling in unpopulated characters using the sleds
    # in doing so, we construct a single "blob" that contains concatenated suffixes of the desired
    # output with placeholders so we can quickly substitute in the prefix, write, repeat, in batches
    for offset in range(prefix_size, base_size)[::-1]:
        word_blob *= len(ALPHABET)
        word_blob[offset::word_size] = chain.from_iterable(sleds)
        sleds = tuple(sled * len(ALPHABET) for sled in sleds)

    with open("output.txt", "wb") as f:
        # I've expanded out the logic for substituting in the prefixes into explicit nested
        # for loops, both to avoid redundancy (reassigning the same value) and to avoid the
        # overhead associated with a recursive implementation
        # I assert this below, so any changes in suffix_size will fail loudly
        assert prefix_size == 2
        for sled1 in sleds:
            word_blob[0::word_size] = sled1
            for sled2 in sleds:
                word_blob[1::word_size] = sled2
                # we write to the raw FileIO since we know we don't need buffering or other fancy
                # bells and whistles, however in practice it doesn't seem that much faster
                f.raw.write(word_blob)
```
There's a lot of magic going on in that code block, but in short:

- I batch the writes, so that I'm writing 36**4, or 1,679,616, entries at a time, so there are fewer context switches.
- I update all 1,679,616 entries in each batch with the new prefixes at the same time, using bytearray slicing / assignment.
- I operate on bytes, write to the raw FileIO, expand out the loops for the prefix assignments, and other minor optimizations to avoid encoding / buffering / function-call overhead / other performance penalties.
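The slice-assignment trick in the second bullet can be shown in miniature. This sketch (with made-up 2-character words, not the answer's actual sizes) rewrites the first character of every newline-terminated record in the blob with a single assignment:

```python
# Three placeholder 2-character "words", each followed by a newline
blob = bytearray(b"-a\n-b\n-c\n")
word_size = 3  # 2 chars + newline

# One extended-slice assignment touches index 0 of every word at once:
# indices 0, 3, 6 all receive a new prefix byte in a single batch
blob[0::word_size] = b"xxx"
print(bytes(blob))  # b'xa\nxb\nxc\n'
```

Because the left-hand side is an extended slice of a bytearray, the update happens in C inside the interpreter, instead of a Python-level loop over 1,679,616 entries.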
Note that unless you have a very fast disk and a slowish CPU, you won't see much benefit from the minor optimizations, probably just from the write batching.
On my system, the product + write takes about 45 seconds for the 14,880,348 KB of output, and that's writing to my slowest disk. On my NVMe drive, it takes 6.868 seconds.
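As an aside, the blob construction can be sanity-checked at small scale against plain itertools.product. This is a hypothetical miniature (3-byte alphabet, word length 2, no fixed prefix), not the answer's actual parameters:

```python
from itertools import chain, product

# Miniature version of the "sled" construction: 3-byte alphabet, 2-char words
alphabet = b"abc"
base_size = 2
word_size = base_size + 1  # +1 for the trailing newline

blob = bytearray(b"-" * base_size + b"\n")
sleds = tuple(bytes([c]) for c in alphabet)
for offset in range(0, base_size)[::-1]:
    blob *= len(alphabet)
    blob[offset::word_size] = chain.from_iterable(sleds)
    sleds = tuple(sled * len(alphabet) for sled in sleds)

# The blob should equal the straightforward product-based output
expected = b"".join(bytes(w) + b"\n" for w in product(alphabet, repeat=base_size))
assert bytes(blob) == expected
print(bytes(blob))  # b'aa\nab\nac\nba\nbb\nbc\nca\ncb\ncc\n'
```

The same check, scaled up, is how you would convince yourself the full 36-character, 6-position version emits exactly the 36**6 lines in lexicographic order.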