
How do I manage a huge list of 100+ million strings? How can I even begin to work with a list this large?

Example of a large list:

cards = [
    "2s", "3s", "4s", "5s", "6s", "7s", "8s", "9s", "10s", "Js", "Qs", "Ks", "As",
    "2h", "3h", "4h", "5h", "6h", "7h", "8h", "9h", "10h", "Jh", "Qh", "Kh", "Ah",
    "2d", "3d", "4d", "5d", "6d", "7d", "8d", "9d", "10d", "Jd", "Qd", "Kd", "Ad",
    "2c", "3c", "4c", "5c", "6c", "7c", "8c", "9c", "10c", "Jc", "Qc", "Kc", "Ac",
]

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards, cardsInHand))

print(len(hands), "hand combinations in Texas Hold'em poker")

5 Answers


Lots and lots of memory. Python lists and strings are actually reasonably efficient, so as long as you have the memory, it shouldn't be a problem.
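To put a rough number on "lots of memory", here is a back-of-the-envelope sketch (not from the original answer). It assumes a 64-bit CPython build, where each list slot is an 8-byte pointer, and it counts only the 7-tuples plus the list slots, because the card strings are shared between all the tuples. Exact sizes vary by Python version, and list over-allocation is ignored.

```python
import math
import sys

# Number of 7-card hands from a 52-card deck (52 choose 7).
n_hands = math.comb(52, 7)

# Size of one hand tuple; the strings inside it are shared with the deck list,
# so they are not counted again for every hand.
sample_hand = ("2s", "3s", "4s", "5s", "6s", "7s", "8s")
bytes_per_tuple = sys.getsizeof(sample_hand)

# Assumption: 64-bit build, so each list slot holds an 8-byte pointer.
bytes_per_slot = 8

estimate = n_hands * (bytes_per_tuple + bytes_per_slot)
print(f"{n_hands} hands, roughly {estimate / 1e9:.1f} GB as a list of tuples")
```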

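That said, if what you're storing is specifically poker hands, you can definitely come up with a more compact representation. For example, you could use one byte to encode each card, which means a whole 7-card hand fits in a single 64-bit int. You could then store the hands in a NumPy array, which would be far more efficient than a Python list.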

For example:

>>> import itertools
>>> import numpy as np
>>> cards_to_bytes = {card: num for num, card in enumerate(cards)}
>>> hands = np.zeros((133784560, 7), dtype=np.int8)  # 133784560 == 52 choose 7
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]
...

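To speed up the last line, you could use hands[num] = list(map(cards_to_bytes.__getitem__, hand)) instead (in Python 3, map returns an iterator, hence the list()).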
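Filling the array one row per Python loop iteration is slow for 133 million rows. One possible alternative, offered as a sketch rather than a benchmarked recipe, pulls combinations from the iterator in chunks with itertools.islice and copies each chunk into the preallocated array with a single slice assignment. Because cards_to_bytes maps each card to its position in the deck, combinations of range(52) yield exactly the same byte codes, in the same order, as combinations of cards mapped through that dictionary. The names hands_array and chunk_size are hypothetical.

```python
import itertools
import math

import numpy as np


def hands_array(n_cards=52, hand_size=7, chunk_size=1_000_000):
    """Return an (n_combinations, hand_size) int8 array of card indices.

    Combinations are pulled from the iterator chunk_size at a time and copied
    into the preallocated array with one slice assignment per chunk.
    """
    total = math.comb(n_cards, hand_size)
    out = np.empty((total, hand_size), dtype=np.int8)
    combos = itertools.combinations(range(n_cards), hand_size)
    start = 0
    while True:
        chunk = list(itertools.islice(combos, chunk_size))
        if not chunk:
            break
        out[start:start + len(chunk)] = np.array(chunk, dtype=np.int8)
        start += len(chunk)
    return out
```

Calling hands_array() with the defaults allocates the same roughly 1 GB array as the loop above; a small call such as hands_array(10, 3) is handy for checking the layout first.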

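This would only take 7 * 133784560 bytes ≈ 0.94 GB (roughly 1 GB) of memory... and it could be less if you packed four cards into every three bytes, since each card needs only 6 bits (I don't know the syntax for doing that...).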
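A minimal sketch of such bit packing, assuming the card order of the question's list: a card index from 0 to 51 needs only 6 bits, so a 7-card hand fits in 42 bits of a single integer (small enough for a NumPy uint64), and four cards fit in 24 bits, that is, three bytes. pack_hand and unpack_hand are hypothetical helper names, not part of NumPy or any other library.

```python
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["s", "h", "d", "c"]

# Same order as the question's list: 2s..As, then hearts, diamonds, clubs.
CARDS = [rank + suit for suit in SUITS for rank in RANKS]
CARD_INDEX = {card: i for i, card in enumerate(CARDS)}

BITS_PER_CARD = 6                   # 52 distinct cards < 2**6 = 64
MASK = (1 << BITS_PER_CARD) - 1     # 0b111111


def pack_hand(hand):
    """Pack cards into one int, 6 bits each; the first card ends up in the highest bits."""
    packed = 0
    for card in hand:
        packed = (packed << BITS_PER_CARD) | CARD_INDEX[card]
    return packed


def unpack_hand(packed, n_cards=7):
    """Inverse of pack_hand: peel off 6 bits at a time, starting with the last card."""
    cards = []
    for _ in range(n_cards):
        cards.append(CARDS[packed & MASK])
        packed >>= BITS_PER_CARD
    return tuple(reversed(cards))


hand = ("As", "Ks", "Qs", "Js", "10s", "2h", "3c")
packed = pack_hand(hand)
assert packed < 2 ** 42                          # 7 cards x 6 bits
assert unpack_hand(packed) == hand

four = pack_hand(hand[:4]).to_bytes(3, "big")    # 4 cards x 6 bits = 24 bits = 3 bytes
assert unpack_hand(int.from_bytes(four, "big"), 4) == hand[:4]
```

The card count has to be passed to unpack_hand explicitly, because a leading "2s" (index 0) contributes no set bits and could not be detected from the integer alone.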

answered 2013-03-07T23:30:29.030

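If you just want to loop through all possible hands to count them or to find the ones with a particular property, there's no need to store them all in memory.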

You can just use the iterator without converting it to a list:

from itertools import combinations

cardsInHand = 7
hands = combinations(cards, cardsInHand)  # a lazy iterator: no list is built

n = 0
for h in hands:
    n += 1
    # or do some other stuff here

print(n, "hand combinations in Texas Hold'em poker.")

133784560 hand combinations in Texas Hold'em poker.
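If the count is all you need, you don't have to iterate at all: the number of k-card hands from an n-card deck is the binomial coefficient C(n, k), which math.comb computes directly (Python 3.8 and later). The sketch also cross-checks the formula against brute-force iteration on a small deck, so it runs instantly.

```python
import math
from itertools import combinations

# Number of 7-card hands from a 52-card deck: C(52, 7).
print(math.comb(52, 7), "hand combinations in Texas Hold'em poker.")

# Hands that contain one specific card: choose the other 6 from the remaining 51.
print(math.comb(51, 6), "of them contain a given card, such as the ace of clubs.")

# Sanity check on a small deck: brute-force counting agrees with the formula.
small_deck = range(12)
assert sum(1 for _ in combinations(small_deck, 5)) == math.comb(12, 5)
```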

answered 2013-03-07T23:45:45.510

Another option that uses almost no memory, and lets you create a stream of data to process, is to use generators. For example:

To print the total number of hands:

print(sum(1 for x in combinations(cards, 7)))

To print the number of hands that contain the ace of clubs:

print(sum(1 for x in combinations(cards, 7) if 'Ac' in x))
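Generators also chain into lazy pipelines: each stage pulls one hand at a time from the previous one, so you can filter on a property and stop early without ever holding more than a handful of hands. A minimal sketch, with has_flush as a hypothetical helper (five or more cards of one suit) and the deck rebuilt in the question's order so the snippet runs on its own:

```python
from collections import Counter
from itertools import combinations, islice

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
cards = [rank + suit for suit in "shdc" for rank in RANKS]   # same order as the question


def has_flush(hand):
    """True if at least five cards in the hand share a suit (the last character)."""
    suit_counts = Counter(card[-1] for card in hand)
    return max(suit_counts.values()) >= 5


# Each stage is lazy: nothing is computed until islice asks for the next item.
all_hands = combinations(cards, 7)
flush_hands = (hand for hand in all_hands if has_flush(hand))
first_three = list(islice(flush_hands, 3))

for hand in first_three:
    print(hand)
```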
answered 2013-03-14T21:54:43.480

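There's usually a trade-off between how long you spend writing the code and how long the code takes to run. If you just want to get something done quickly and don't expect to run it often, the approach you suggest is fine. Just make the list huge: if you don't have enough RAM, your system will thrash virtual memory, but you'll probably still get your answer faster than you would by learning to write a more sophisticated solution.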

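However, if this is a system you expect to use regularly, you should find an alternative to storing everything in RAM. An SQL database may be what you want. Databases can be quite complex, but because they're nearly ubiquitous, there are plenty of good tutorials.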
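As a minimal sketch of the database route, here is Python's built-in sqlite3 module used directly, without an ORM. The one-row-per-hand schema (cards stored as a space-separated string) and the helper names store_hands and count_hands_with are assumptions for illustration. A table of all 133,784,560 seven-card hands would still take gigabytes on disk, but queries no longer require the data to fit in RAM.

```python
import sqlite3
from itertools import combinations


def store_hands(conn, cards, hand_size):
    """Create a hands table and insert one row per combination, streamed from the iterator."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hands (id INTEGER PRIMARY KEY, cards TEXT NOT NULL)"
    )
    rows = ((" ".join(hand),) for hand in combinations(cards, hand_size))
    conn.executemany("INSERT INTO hands (cards) VALUES (?)", rows)
    conn.commit()


def count_hands_with(conn, card):
    """Count hands containing `card`; padding with spaces makes the match whole-token."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM hands WHERE ' ' || cards || ' ' LIKE ?",
        (f"% {card} %",),
    ).fetchone()
    return count


# Small demo in memory; pass a file path such as "hands.db" to persist to disk.
conn = sqlite3.connect(":memory:")
store_hands(conn, ["As", "Ks", "Qs", "Js", "10s"], 3)
print(count_hands_with(conn, "As"))  # 6 of the 10 three-card hands contain As
```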

You might also look for a well-documented framework such as Django, which simplifies database access through its ORM layer.

answered 2013-03-07T23:37:48.473

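My public-domain OneJoker library has some handy combinatorics functions. It has an Iterator class that can give you information about a set of combinations without storing them or even iterating over them. For example: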

import onejoker as oj

deck = oj.Sequence(52)
deck.fill()

hands = oj.Iterator(deck, 5)    # I want combinations of 5 cards out of that deck

t = hands.total                 # How many are there?
r = hands.rank("AcKsThAd3c")    # At what position will this hand appear?
h = hands.hand_at(1000)         # What will the 1000th hand be?

for h in hands.all():           # Do something with all of them
    dosomething(h)              # placeholder for your own processing

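You could use the Iterator.rank() function to reduce each hand to a single int, store those ints in a compact array, and then use Iterator.hand_at() to regenerate the hands on demand.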
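OneJoker's own implementation isn't shown here. As an independent sketch of the same idea, the combinatorial number system is one standard way to map every k-card combination of card indices 0 to n-1 to a unique integer from 0 to C(n, k) - 1 and back. Its colexicographic ordering need not match the order used by OneJoker's rank(). Since C(52, 7) = 133,784,560 is below 2**32, each 7-card hand fits in a 4-byte unsigned integer instead of 7 bytes. rank_combination and unrank_combination are hypothetical names.

```python
from itertools import combinations
from math import comb


def rank_combination(indices):
    """Map sorted distinct indices c_1 < ... < c_k to the sum of C(c_i, i) (colex rank)."""
    return sum(comb(c, i) for i, c in enumerate(sorted(indices), start=1))


def unrank_combination(rank, k):
    """Inverse of rank_combination: greedily pick the largest c_i with C(c_i, i) <= rank."""
    result = []
    for i in range(k, 0, -1):
        c = i - 1                       # C(i - 1, i) == 0, so this always qualifies
        while comb(c + 1, i) <= rank:
            c += 1
        result.append(c)
        rank -= comb(c, i)
    return tuple(reversed(result))      # built from the largest index down


# The last 7-card hand (indices 45..51) gets the largest rank.
print(rank_combination(range(45, 52)))   # 133784559 == comb(52, 7) - 1

hand = (3, 11, 17, 25, 38, 40, 51)
assert unrank_combination(rank_combination(hand), 7) == hand
```

The ranks could then live in something like a NumPy uint32 array or an array.array("I"), and any hand can be rebuilt from its rank when needed.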

answered 2013-03-26T06:03:48.217