4

如何为“Celestial Jukebox”实施洗牌?

更准确地说,在每个时间 t,返回一个介于 0..n(t) 之间的均匀随机数,这样整个序列中没有重复,n() 随着时间的推移而增加。

对于具体的例子,假设一个固定费率的音乐服务允许通过从 0 开始的索引号播放目录中的任何歌曲。每隔一段时间,就会添加新歌曲,从而增加索引号的范围。目标是每次播放一首新歌(假设目录中没有重复)。

一个理想的解决方案在现有硬件上是可行的——我如何在 8MB 的 DRAM 中硬塞一个包含 600 万首歌曲的列表?同样,高歌曲数会加剧 O(n) 选择时间。

-- 对于 LCG 发生器,给定 0..N 0上部分耗尽的 LCG ,是否可以将其转换为 0..N 1上的不同 LCG (其中 N1 > N0),不会重复耗尽的序列。
- 检查一首特定的歌曲是否已经播放过似乎很快就会失控,尽管这可能是唯一的方法?是否有有效的数据结构?

4

5 回答 5

3

我喜欢做这种非重复随机选择的方式是有一个列表,每次我在 之间随机选择一个项目时[0-N),我都会从该列表中删除它。在您的情况下,随着新项目被添加到目录中,它也会被添加到尚未选择的列表中。完成后,只需将所有歌曲重新加载回列表即可。

编辑:

如果您考虑 v3 的建议,这基本上可以在初始化步骤O(1)之后的时间内完成。O(N)它保证不重复的随机选择。

这是回顾:

  1. 将初始项目添加到列表中
  2. 随机选择索引i(从 集合中[0,N)
  3. 删除索引处的项目i
  4. 将洞替换为i 项目Nth(或 null if i == Nth)并递减N
  5. 对于新项目,只需附加到列表的末尾并N根据需要增加
  6. 如果您曾经播放过所有歌曲(我怀疑您是否有 6M 首歌曲),然后将所有歌曲添加回列表中,起泡、冲​​洗并重复。

Since you are trying to deal with rather large sets, I would recommend the use of a DB. A simple table with basically two fields: id and "pointer" (where "pointer" is what tells you the song to play which could be a GUID, FileName, etc, depending on how you want to do it). Have an index on id and you should get very decent performance with persistence between application runs.

EDIT for 8MB limit:

Umm, this does make it a bit harder... In 8 MB, you can store a maximum of ~2M entries using 32-bit keys.

So what I would recommend is to pre-select the next 2M entries. If the user plays through 2M songs in a lifetime, damn! To pre-select them, do a pre-init step using the above algorithm. The one change I would make is that as you add new songs, roll the dice and see if you want to randomly add that song to the mix. If yes, then pick a random index and replace it with the new song's index.

于 2009-05-13T20:05:20.583 回答
1

With a limit of 8MB for 6 million songs, there's plainly not room to store even a single 32 bit integer for each song. Unless you're prepared to store the list on disk (in which case, see below).

If you're prepared to drop the requirement that new items be immediately added to the shuffle, you can generate an LCG over the current set of songs, then when that is exhausted, generate a new LCG over only the songs that were added since you began. Rinse and repeat until you no longer have any new songs. You can also use this rather cool algorithm that generates an unguessable permutation over an arbitrary range without storing it.

If you're prepared to relax the requirement of 8MB ram for 6 million songs, or to go to disk (for example, by memory mapping), you could generate the sequence from 1..n at the beginning, shuffle it with fisher-yates, and whenever a new song is added, pick a random element from the so-far-unplayed section, insert the new ID there, and append the original ID to the end of the list.

If you don't care much about computational efficiency, you could store a bitmap of all songs, and repeatedly pick IDs uniformly at random until you find one you haven't played yet. This would take 6 million tries to find the last song (on average), which is still damn fast on a modern CPU.

于 2009-05-13T23:08:52.470 回答
0

While Erich's solution is probably better for your specific use case, checking if a song has already been played is very fast (amortized O(1)) with a hash-based structure, such as a set in Python or a hashset<int> in C++.

于 2009-05-13T20:12:30.930 回答
0

You could simply generate the sequence of numbers from 1 to n and then shuffle it using a Fisher-Yates shuffle. That way you can guarantee that the sequence won't repeat, regardless of n.

于 2009-05-13T20:23:24.190 回答
0

You could use a linked list inside an array: To build the initial playlist, use an array containing a something like this:

 struct playlistNode{
  songLocator* song;
  playlistNode  *next;
};
struct playlistNode arr[N];

Also keep a 'head' and 'freelist' pointer;

Populate it in 2 passes:
1. fill in arr with all the songs in the catalog in order 0..N.
2. randomly iterate through all the indexes, filling in the next pointer;

Deletion of songs played is O(1):

head=cur->next;
cur->song=NULL;
freelist->next = freelist;
cur->next=freelist;
freelist=cur;

Insertion of new songs is O(1) also: pick an array index at random, and patch a new node.

node = freelist;
freelist=freelist->next;
do {
i=rand(N);   
} while (!arr[i].song);  //make sure you didn't hit a played node
node->next = arr[i].next;
arr[i].next=node;
于 2009-05-13T21:10:55.797 回答