
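For example...

  1. I have an integer array initialized with random values between 1 and 1000
  2. The array has 1M elements (it could have many more, but this is just an example)
  3. The number of occurrences of each value must be between 10 and 1010

What is the fastest way to adjust the elements of this array so that it meets the criteria above?

My first solution is simply too slow when the maximum number of occurrences is close to array.Length (1M) / valuesSpan (1000).

I tried something like this (it only enforces the upper bound on occurrences; the solution for the lower bound is almost the same):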

Int64[] DistinctArrayElements = distinctArrayElements;
Dictionary<Int64, Int32> occurrences = new Dictionary<Int64, Int32>();

foreach (Int64 DistinctElement in DistinctArrayElements)
{
    occurrences.Add(DistinctElement, 0);
}

foreach (Int64 ArrayElement in Arr)
{
    occurrences[ArrayElement] += 1;
}
//I know this initialization can be done more nicely, so don't bother with this.

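for (int j = 0; j < Arr.Length; j++)
{
    if (occurrences[Arr[j]] > upperNoOfOccurrences)
    {
        for (int i = 0; i < Arr.Length; i++)
        {
            if (occurrences[Arr[i]] < upperNoOfOccurrences)
            {
                // Update the counts before overwriting Arr[j]; afterwards both
                // updates would hit the same key and cancel each other out.
                occurrences[Arr[j]] -= 1;
                occurrences[Arr[i]] += 1;
                Arr[j] = Arr[i];
                break; // one replacement per over-represented element
            }
        }
    }
}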

3 Answers


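I would sort your dictionary so that the numbers with fewer occurrences come first. That way you don't have to search for a suitable number every time; you simply replace with the number that currently has the fewest occurrences. Here is some pseudocode to explain this: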

struct entry {
    key, value      // key = array value, value = its number of occurrences
}

linkedList<entry> occurrences;

initialize occurrences
sort it (smallest values first)

// start from the one with the greatest number of occurrences
n = occurrences.last;

// keep going while n occurs more often than upperNoOfOccurrences
// (assumes the other entries have enough room in total to absorb the excess)
while n.value > upperNoOfOccurrences and n is not the first element
    do:
        // occurrences n has to give away to be within the limit
        required = n.value - upperNoOfOccurrences

        // maximum occurrences we can add to the first
        maxAllowed = upperNoOfOccurrences - occurrences.first.value

        // if the first entry can absorb all of them
        if required <= maxAllowed:
            occurrences.first.value += required
            n.value -= required
            repeat = false
        else:    // n.value is still greater than upperNoOfOccurrences
            occurrences.first.value += maxAllowed
            n.value -= maxAllowed
            repeat = true
        end if

        // keep occurrences sorted
        newPos = occurrences.first.next
        while newPos exists and occurrences.first.value > newPos.value:
            newPos = newPos.next

        move occurrences.first before newPos (or to the end if there is none)
    while repeat

    // n is within the limit now; continue with the next largest entry
    n = n.previous
end while

Now rebuild your array from occurrences. It will
be sorted, but that doesn't matter, does it? ;)
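
The idea in the pseudocode above could be sketched in C# roughly as follows. This is only an illustration under assumptions, not the answerer's implementation: it swaps the linked list and its re-sorting for a sorted key array with two indices, and the names `OccurrenceCapper`, `CapOccurrences`, and the `values` parameter (the list of all allowed values) are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OccurrenceCapper
{
    // Returns a new array in which no value occurs more than `upper` times.
    // `values` (hypothetical parameter) lists every value allowed to appear;
    // every element of `arr` is assumed to be one of them.
    public static long[] CapOccurrences(long[] arr, long[] values, int upper)
    {
        if ((long)values.Length * upper < arr.Length)
            throw new ArgumentException("Not enough distinct values to stay under the limit.");

        // Count the occurrences of every allowed value.
        var counts = values.ToDictionary(v => v, _ => 0);
        foreach (long x in arr) counts[x]++;

        // Sort the values by count, smallest first.
        long[] keys = counts.Keys.OrderBy(k => counts[k]).ToArray();

        int low = 0;                // least frequent value that may still have room
        int high = keys.Length - 1; // most frequent value not handled yet
        while (high >= 0 && counts[keys[high]] > upper)
        {
            int excess = counts[keys[high]] - upper;
            int room = upper - counts[keys[low]];
            int moved = Math.Min(excess, room);
            counts[keys[low]] += moved;
            counts[keys[high]] -= moved;
            if (counts[keys[low]] == upper) low++;    // receiver is full
            if (counts[keys[high]] == upper) high--;  // donor is within the limit
        }

        // Rebuild the array from the adjusted counts (grouped by value).
        var result = new List<long>(arr.Length);
        foreach (long k in keys)
            result.AddRange(Enumerable.Repeat(k, counts[k]));
        return result.ToArray();
    }
}
```

Because the counts are sorted only once, the adjustment itself is linear in the number of distinct values; the rebuilt array comes out grouped by value, so shuffle it afterwards if the order matters.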
answered 2011-10-25T16:07:32.083

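Here is a simple and exact way to uniformly sample a set of numbers that meets your criteria.

  1. Let M = number of distinct values; N = number of array elements; L = lower limit on the instance count of each value; U = upper limit on the count; D = U - L. For your example, M=1000, N=1000000, L=10, U=1010, and D=1000.
  2. Create array S of size M*D. Set the first N - M*L entries of S to 1 and the rest to zero. (Every value gets L instances up front in step 5, so only the remaining N - M*L instances need to be distributed.)
  3. Shuffle S with a Fisher-Yates shuffle. (See the link here.)
  4. Create array T of size M.
  5. For i from 0 to M-1, set T[i] = L + S[i*D] + S[i*D+1] + ... + S[i*D+D-1].
  6. Create array V of size N. Fill it with T[0] instances of the 0th value, and so on, i.e. T[i] instances of the i-th value for each i. Because S contains N - M*L ones and every T[i] starts at L, the T[i] sum to N, so V is filled completely and exactly.
  7. Shuffle V with a Fisher-Yates shuffle. Array V then satisfies the original criteria.

Note that steps 2-5 are O(M*D), while steps 6-7 are O(N+M); the latter is as good as it can get, and the former probably is too, since M*D is O(N) in your problem statement.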
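
As a rough illustration, the steps above might look like this in C#. It is a sketch under assumptions rather than a definitive implementation: the names `BoundedCountSampler`, `Sample`, and `Shuffle` are invented, the distinct values are taken to be 1..m, M*D is assumed to fit in an `int`, and S receives N - M*L ones so that the counts add up to exactly N.

```csharp
using System;

public static class BoundedCountSampler
{
    // In-place Fisher-Yates shuffle.
    static void Shuffle<T>(T[] a, Random rng)
    {
        for (int i = a.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            (a[i], a[j]) = (a[j], a[i]);
        }
    }

    // Builds an array of n elements drawn from 1..m in which every value
    // occurs at least `lower` and at most `upper` times.
    public static int[] Sample(int m, int n, int lower, int upper, Random rng)
    {
        int d = upper - lower;
        long ones = (long)n - (long)m * lower; // instances left after each value got `lower`
        if (ones < 0 || ones > (long)m * d)
            throw new ArgumentException("Need m*lower <= n <= m*upper.");

        // Steps 2-3: m*d slots, `ones` of them set, then shuffled.
        var s = new bool[m * d];
        for (int i = 0; i < ones; i++) s[i] = true;
        Shuffle(s, rng);

        // Steps 4-6: count the set slots per block and fill V accordingly.
        var v = new int[n];
        int pos = 0;
        for (int i = 0; i < m; i++)
        {
            int t = lower;
            for (int k = 0; k < d; k++)
                if (s[i * d + k]) t++;
            for (int c = 0; c < t; c++) v[pos++] = i + 1;
        }

        // Step 7: shuffle the result.
        Shuffle(v, rng);
        return v;
    }
}
```

For the numbers in the question, the call would be `BoundedCountSampler.Sample(1000, 1000000, 10, 1010, new Random())`.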

answered 2011-10-25T21:29:30.643

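I can't quite make sense of what you are trying to do. However, processing the array that many times seems like a waste. You can do it all in a single loop (with a little peeking ahead when you run out of "free" unique values). The code below is certainly not the best I have ever written, but I think it solves your problem.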

Queue<long> remaining = new Queue<long>(1000); // values seen so far that may still be below the limit; used when we bounce into the limit
Dictionary<long, int> frequencies = new Dictionary<long, int>(1000);
int lastPeekIndex = 0;
for (int i = 0; i < Arr.Length; i++) {
  if (!frequencies.ContainsKey(Arr[i])) {
    frequencies[Arr[i]] = 0;
    remaining.Enqueue(Arr[i]); // Queue<T> has Enqueue, not Add
  }

  if (frequencies[Arr[i]] >= upperLimit) {
    // Arr[i] already occurs often enough: find a replacement value.
    bool found = false;
    long next = 0;
    // Skip queued values that have reached the limit in the meantime.
    while (!found && remaining.Count > 0) {
      next = remaining.Dequeue();
      found = frequencies[next] < upperLimit;
    }
    // Arrr! we have not yet observed enough unique values: peek ahead for an unseen one.
    for (int j = Math.Max(i, lastPeekIndex) + 1; !found && j < Arr.Length; j++) {
      lastPeekIndex = j;
      if (!frequencies.ContainsKey(Arr[j])) {
        frequencies[Arr[j]] = 0;
        next = Arr[j];
        found = true;
      }
    }
    if (!found) throw new InvalidOperationException("No value below the upper limit is left.");
    Arr[i] = next;
    frequencies[next]++;
    if (frequencies[next] < upperLimit) remaining.Enqueue(next);
  } else frequencies[Arr[i]]++;
}

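Note that this does not check the lower bound, since you don't check it either. I think you have to take care of the values that do not occur often enough in a second pass. After the first pass you can put them into another queue and then iterate over the array again and again until the queue is empty (the second pass probably won't even need one complete iteration).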
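
A second pass along those lines might look like the sketch below. It is a guess at one way to do it, not tested against the original data: `LowerBoundFixer`, `EnforceLowerBound`, `values` (the list of all allowed values, including ones that never occur), and `lower` are hypothetical names, and every array element is assumed to be one of `values`.

```csharp
using System;
using System.Collections.Generic;

public static class LowerBoundFixer
{
    // Raises every value in `values` to at least `lower` occurrences by
    // overwriting elements whose value occurs more than `lower` times.
    public static void EnforceLowerBound(long[] arr, long[] values, int lower)
    {
        var frequencies = new Dictionary<long, int>();
        foreach (long v in values) frequencies[v] = 0;
        foreach (long x in arr) frequencies[x]++;

        // Queue of values that do not occur often enough yet.
        var needy = new Queue<long>();
        foreach (long v in values)
            if (frequencies[v] < lower) needy.Enqueue(v);

        for (int i = 0; i < arr.Length && needy.Count > 0; i++)
        {
            if (frequencies[arr[i]] <= lower) continue; // this value cannot spare an instance
            long target = needy.Peek();
            frequencies[arr[i]]--;
            arr[i] = target;
            if (++frequencies[target] == lower) needy.Dequeue();
        }

        if (needy.Count > 0)
            throw new InvalidOperationException("Array is too short to satisfy the lower bound.");
    }
}
```

Donor counts only go down and receivers stop at `lower`, so an upper bound that held before this pass still holds afterwards, and a single sweep over the array is enough.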

answered 2011-10-25T15:26:54.647