c# - How do I find the most common integer in a 2D array of integers?

Question

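OK, so I'm just starting to think about how to implement a new graphics plugin for Paint.NET, and I need to know how to find the most common integer in a 2D array of integers. Is there a built-in C# way to do this? Or does anyone have a slick way to do it?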

The array will look something like this:

300 300 300 300 300 300 300
  0 150 300 300 300 300 300
  0   0 150 300 300 300 300
  0   0   0   0 300 300 300
  0   0   0   0 150 300 300
  0   0   0   0   0 150 300
  0   0   0   0   0   0 300

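I need to know that 300 is the most common number in the array. If there is no "most common", just return the center number (the array dimensions are always odd x odd), which here is 0.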

I'll implement this with a "brute force" algorithm unless you experts can come up with something faster.

Any help would be greatly appreciated.

Thanks!

EDIT: More info...

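The values will almost always be very diverse (more diverse than in my example array). The values will be in the range 0-360. Depending on the speed of the algorithm, the array will be anywhere from 5x5 to about 17x17. The result will be calculated once for every pixel in a large image... so the faster the better. ;)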

score 6 · Accepted Answer

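Any way you slice it, it's going to be at least O(n*m): you have to look at each cell at least once. The place to economize is in how you accumulate the count of each value before looking for the most common one; if your integers vary over a relatively small range (say they are uint16), you may be able to simply use a flat array instead of a map.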

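I suppose you could also keep running counts x and y of the current top candidate and the runner-up for "most common", and exit early as soon as fewer than x - y cells remain to be examined, since at that point there's no way the runner-up could overtake the top candidate.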
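A minimal sketch of that early exit, assuming cell values lie in [0..360] and a row-major scan. The key point is that any value other than the leader has at most `runnerUpCount` hits, so once fewer than `leaderCount - runnerUpCount` cells remain, nobody can catch up. `EarlyExitMode` and `MostCommonOrCenter` are hypothetical names; the tie fallback follows the question's "return the center number" rule.

```csharp
// Early-exit mode finder (hypothetical helper), assuming values in [0..360].
public static class EarlyExitMode
{
    public static int MostCommonOrCenter(int[,] cells)
    {
        var counts = new int[361];
        int total = cells.Length;
        int processed = 0;
        int leader = -1, leaderCount = 0, runnerUpCount = 0;

        foreach (int v in cells) // foreach over int[,] visits cells in row-major order
        {
            int c = ++counts[v];
            processed++;
            if (v == leader)
            {
                leaderCount = c;
            }
            else if (c > leaderCount)
            {
                // v overtakes the leader; the old leader's count becomes the runner-up count.
                runnerUpCount = leaderCount;
                leader = v;
                leaderCount = c;
            }
            else if (c > runnerUpCount)
            {
                runnerUpCount = c;
            }

            // No other value can gain more than 'remaining' additional hits.
            int remaining = total - processed;
            if (remaining < leaderCount - runnerUpCount)
                return leader;
        }

        if (leaderCount != runnerUpCount)
            return leader;

        // Tie for first place: fall back to the center cell.
        return cells[cells.GetLength(0) / 2, cells.GetLength(1) / 2];
    }
}
```

Given the question's note that values are usually very diverse, the exit may rarely fire in practice; the check itself only costs a subtraction and a comparison per cell.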

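Integer operations like this are very fast; even for a megapixel image, the brute-force algorithm should only take a few milliseconds.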

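I notice you've edited your original question to say the pixel values run from 0..255. In that case, definitely use a simple flat array; it's small enough to fit easily into the L1 dcache, and lookups in a flat array are fast.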

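[edit]: Once you've built the histogram array, handling the "no most common number" case is very simple: walk through it to find the "most common" and "second most common" numbers; if they are equally frequent, then by definition there is no single most common number.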

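// Assumes each cell contains a number in [0..360] (the question says 0-360),
// and that inputData is the int[,] being examined.
const int numLevels = 361;
// Build the histogram: levelFrequencyCounts[i] = number of cells containing i.
int[] levelFrequencyCounts = new int[numLevels];
foreach (int value in inputData)
  levelFrequencyCounts[value]++;

// Find the most common level and the runner-up in a single pass.
int mostCommon = 0, runnerUp = -1;
for (int i = 1 ; i < numLevels ; ++i)
{
  if ( levelFrequencyCounts[i] > levelFrequencyCounts[mostCommon] )
  {
    runnerUp = mostCommon;
    mostCommon = i;
  }
  else if ( runnerUp < 0 || levelFrequencyCounts[i] > levelFrequencyCounts[runnerUp] )
  {
    runnerUp = i;
  }
}

if ( levelFrequencyCounts[mostCommon] != levelFrequencyCounts[runnerUp] )
{
   return mostCommon;
}
else
{
   // No unique most common value: return the center cell.
   return inputData[inputData.GetLength(0) / 2, inputData.GetLength(1) / 2];
}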

score 3 · Accepted Answer

How would I do something like this in C#?

Something like this:

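using System.Collections.Generic;

// matrix is the int[,] being examined.
Dictionary<int, int> d = new Dictionary<int, int>();
foreach (int value in matrix)
{
 if (!d.ContainsKey(value))
  d.Add(value, 1);
 else
  d[value] = d[value] + 1;
}
// KeyValuePair is a struct and cannot be null, so use a nullable wrapper.
KeyValuePair<int, int>? biggest = null;
foreach (KeyValuePair<int, int> found in d)
{
  if ((biggest == null) || (biggest.Value.Value < found.Value))
    biggest = found;
}
// biggest.Value.Key is the most common value; biggest.Value.Value is its count.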

score 1 · Accepted Answer

One option is LINQ; it's a bit inefficient, but fine for arrays that aren't huge:

    var max = (from cell in data.Cast<int>()
               group cell by cell into grp
               select new { Key = grp.Key, Count = grp.Count() } into agg
               orderby agg.Count descending
               select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);

Or with a jagged array:

    var max = (from row in data
              from cell in row
              group cell by cell into grp
              select new {Key = grp.Key, Count = grp.Count()} into agg
              orderby agg.Count descending
              select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);
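Neither query applies the question's tie rule (return the center when there is no single most common value). A hedged sketch that keeps only the two largest groups and compares them, assuming a non-empty `int[,]`; `LinqMode` and `MostCommonOrCenter` are hypothetical names:

```csharp
using System.Linq;

// Hypothetical helper: LINQ grouping plus the question's tie-to-center rule.
public static class LinqMode
{
    public static int MostCommonOrCenter(int[,] data)
    {
        // Count each value, then keep only the two largest groups.
        var top = data.Cast<int>()
                      .GroupBy(cell => cell)
                      .Select(grp => new { grp.Key, Count = grp.Count() })
                      .OrderByDescending(agg => agg.Count)
                      .Take(2)
                      .ToList();

        // A unique winner exists if there is only one group or the top two counts differ.
        if (top.Count == 1 || top[0].Count != top[1].Count)
            return top[0].Key;

        // Tie for first place: fall back to the center cell.
        return data[data.GetLength(0) / 2, data.GetLength(1) / 2];
    }
}
```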

Realistically, I'd probably use a dictionary of counts. This example is without LINQ, just "because":

    Dictionary<int, int> counts = new Dictionary<int, int>();
    foreach (int value in data)
    {
        int count;
        counts.TryGetValue(value, out count);
        counts[value] = count + 1;
    }
    int maxCount = -1, maxValue = 0;
    foreach (KeyValuePair<int, int> pair in counts)
    {
        if (pair.Value > maxCount)
        {
            maxCount = pair.Value;
            maxValue = pair.Key;
        }
    }
    Console.WriteLine(maxValue + ": " + maxCount);

score 1 · Accepted Answer

If speed is your primary concern, don't use a dictionary. Stick with a plain array of counts indexed by value. Try this:

// stores hit counts for the values 0-360
short[] hitCounts = new short[361];

// iterate through the jagged 2d array (int[][] toEvaluate) and increment hit counts
for (int i = 0; i < toEvaluate.Length; i++)
{
    for (int j = 0; j < toEvaluate[i].Length; j++)
        hitCounts[toEvaluate[i][j]]++;
}

int greatestHitCount = 0; // the hit count of the current greatest value
int greatest = -1; // the current greatest value (-1 = none found, or a tie)

// iterate through the values (0-360) and evaluate their hit counts
for (int i = 0; i < hitCounts.Length; i++)
{
    // hitCounts[i] is higher than the current greatest hit count
    if (hitCounts[i] > greatestHitCount)
    {
        greatestHitCount = hitCounts[i]; // store the new hit count
        greatest = i; // store the greatest value
    }
    // there is already a value with the same hit count (which is the greatest)
    else if (hitCounts[i] == greatestHitCount)
        greatest = -1; // more than one value shares the top count; unusable unless a higher count turns up
}

if (greatest >= 0) // a unique greatest value was found
    return greatest;

// figure out the middle x and y index (dimensions are odd, so this is the exact center)
int x = toEvaluate.Length / 2;
int y = toEvaluate[x].Length / 2;

// return the value at the center of the 2d array as the value
return toEvaluate[x][y];

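When speed takes priority over readability, you inevitably end up with somewhat ugly code. The above could certainly benefit from refactoring (hence the heavy commenting), but it should run fast. If it isn't fast enough, you can gain further optimization by moving it into unmanaged code.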

score 1 · Accepted Answer

Your image:

300+ 300+ 300+ 300 300 300 300
  0+ 150+ 300+ 300 300 300 300
  0+   0+ 150+ 300 300 300 300
  0    0    0    0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

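The numbers marked (+) are your window; w,h are your window dimensions. Apply a bucket sort (as others have suggested, since your value range is very limited). Don't cut your evaluation short halfway as Crashworks suggests. Don't throw your result away either. That is the first step.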

300- 300- 300- 300 300 300 300
  0. 150. 300. 300 300 300 300
  0.   0. 150. 300 300 300 300
  0+   0+   0+   0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

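Shift your window. Instead of adding everything again, subtract the buckets of the last row/column you passed and add the buckets of the new one. This way you examine each pixel 2(w+h) times, i.e. when it crosses the window boundary, instead of w*h times, i.e. while that pixel is inside the window, as in a naive implementation.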

In other words, you move the window like this:

|  ^->|  ^
|  |  |  |
|  |  |  |
V->|  V->|
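The bucket-and-slide scheme can be sketched as below. This is a hedged illustration, not the answerer's code: it assumes values in [0..360], a (2r+1) x (2r+1) window clipped at the image borders, the snake path from the diagram (even columns run down, odd columns run up), and the question's tie rule (fall back to the window's center pixel). `SlidingModeFilter`, `Apply`, `AddRow`, `AddCol` and `Mode` are hypothetical names.

```csharp
using System;

// Sliding-window mode filter following the snake path; all names are hypothetical.
public static class SlidingModeFilter
{
    const int Levels = 361; // assumed value range [0..360]

    public static int[,] Apply(int[,] img, int r)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        var result = new int[h, w];
        var hist = new int[Levels];

        // First step: fill the buckets for the window centered on (row 0, col 0).
        for (int y = 0; y <= Math.Min(h - 1, r); y++)
            AddRow(img, hist, y, 0, r, +1);

        for (int x = 0; x < w; x++)
        {
            bool down = x % 2 == 0; // even columns run down, odd columns run up
            for (int i = 0; i < h; i++)
            {
                int y = down ? i : h - 1 - i;
                result[y, x] = Mode(hist, img[y, x]);
                if (i == h - 1) break; // end of column: shift right below

                // Subtract the row leaving the window, add the row entering it.
                int leaving = down ? y - r : y + r;
                int entering = down ? y + 1 + r : y - 1 - r;
                if (leaving >= 0 && leaving < h) AddRow(img, hist, leaving, x, r, -1);
                if (entering >= 0 && entering < h) AddRow(img, hist, entering, x, r, +1);
            }
            if (x == w - 1) break;

            // Shift right at the end of the column: swap one column of buckets.
            int yEnd = down ? h - 1 : 0;
            if (x - r >= 0) AddCol(img, hist, x - r, yEnd, r, -1);
            if (x + 1 + r < w) AddCol(img, hist, x + 1 + r, yEnd, r, +1);
        }
        return result;
    }

    // Adds delta to the buckets of one image row, limited to the window's columns around x.
    static void AddRow(int[,] img, int[] hist, int row, int x, int r, int delta)
    {
        int w = img.GetLength(1);
        for (int c = Math.Max(0, x - r); c <= Math.Min(w - 1, x + r); c++)
            hist[img[row, c]] += delta;
    }

    // Adds delta to the buckets of one image column, limited to the window's rows around y.
    static void AddCol(int[,] img, int[] hist, int col, int y, int r, int delta)
    {
        int h = img.GetLength(0);
        for (int rr = Math.Max(0, y - r); rr <= Math.Min(h - 1, y + r); rr++)
            hist[img[rr, col]] += delta;
    }

    // Most frequent bucket, or the center pixel if two buckets share the top count.
    static int Mode(int[] hist, int center)
    {
        int best = 0;
        bool tie = false;
        for (int v = 1; v < Levels; v++)
        {
            if (hist[v] > hist[best]) { best = v; tie = false; }
            else if (hist[v] == hist[best]) tie = true;
        }
        return tie ? center : best;
    }
}
```

Reading the mode still scans all 361 buckets per output pixel, so for small windows that scan, rather than the bucket updates, may dominate the cost.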

I assume you are trying to implement a non-linear convolution filter.

Corrections welcome.

score 1 · Accepted Answer

Take a look at the LocalHistogramEffect code in Paint.NET, especially LocalHistogramEffect.RenderRect.

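It walks the input image, maintaining a histogram of intensities for the source pixels within "r" pixels of each destination pixel. As it traverses the output pixels, it adds the leading edge to the histogram and subtracts the trailing edge. It handles all the edge cases well, and it's very fast. It's the basis for the Median, Unfocus, Outline, and Remove Noise effects.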

Adapting it to work on Hue instead of RGB intensity would be fairly trivial.
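As a hedged illustration of that adaptation, each pixel first has to be mapped to an integer hue level so it can be counted in a 0-359 histogram. The sketch below uses the standard RGB-to-HSV hue formula on raw 8-bit channels; it does not use any Paint.NET type, `HueLevels` and `ToHueLevel` are hypothetical names, and mapping gray pixels (no hue) to 0 is an assumed convention.

```csharp
using System;

// Hypothetical helper: maps an 8-bit RGB pixel to an integer hue level in [0..359].
public static class HueLevels
{
    public static int ToHueLevel(byte r, byte g, byte b)
    {
        int max = Math.Max(r, Math.Max(g, b));
        int min = Math.Min(r, Math.Min(g, b));
        double delta = max - min;
        if (delta == 0) return 0; // achromatic pixel: hue is undefined, use 0 by convention

        double h;
        if (max == r) h = (g - b) / delta;          // between yellow and magenta
        else if (max == g) h = (b - r) / delta + 2; // between cyan and yellow
        else h = (r - g) / delta + 4;               // between magenta and cyan

        h *= 60;                 // convert the sector position to degrees
        if (h < 0) h += 360;     // wrap negative angles into [0, 360)
        return (int)h % 360;     // truncate to a whole-degree bucket
    }
}
```

The resulting levels can then be fed to any of the histogram approaches on this page.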

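Performance is very good; for your purposes it runs in O(r^2 + wr + nw), where r is the radius, w is the width of the image, and n is the number of levels in the histogram.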

-tjackson

score 0 · Accepted Answer

Michael beat me to it, but I'd do the same thing, like this:

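    // Returns the most frequent value in matrix (values assumed to be 0-360).
    int MaxValueIn2dArray(int[,] matrix)
    {
        var d = new int[361]; // one counter per value 0-360
        int mostCommon = 0;
        for (int x = 0; x <= matrix.GetUpperBound(0); x++)
        {
            for (int y = 0; y <= matrix.GetUpperBound(1); y++)
            {
                d[matrix[x, y]]++;
            }
        }
        // Return the value with the highest count, not the count itself.
        for (int value = 1; value < d.Length; value++)
        {
            if (d[value] > d[mostCommon]) mostCommon = value;
        }
        return mostCommon;
    }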

It would need to be tuned for your specific needs.

score 0 · Accepted Answer

All I'd add is that any algorithm that checks every cell (which is pretty much what you'll have to do) should do two extra things:

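1.) Make sure you exit the routine as soon as the count of the current most common value exceeds (M x N / 2). If something covers more than 50% of your grid, it is the most common value and there's no need to continue. If your routine only needs to be right most of the time, you could lower the percentage and treat it as a heuristic. You could even run some analysis that yields something like "if coverage is > 37.6%, it will be the most common value 99.9% of the time" and then use that percentage.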

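2.) If there is any way to determine on which side, in which corner, or in what general location (outer edges, middle, etc.) the most common value is likely to be, you can scan in that order, which, combined with optimization 1 above, could cut out a lot of your scanning. For example, in your sample array the top right is heavy with the common value. If that could be determined by some heuristic, you could scan from the top right to the bottom left in some fashion. If the required scan pattern is complex, pre-generate it.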
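Both suggestions can be combined in a short sketch: a pre-generated scan order plus an exit as soon as one value covers more than half the grid. The anti-diagonal order starting at the top-right corner is only an example of a "pre-generated pattern", values are assumed to lie in [0..360], and `HeuristicScan`, `TopRightFirst` and `MostCommonOrCenter` are hypothetical names. Unlike a lowered-percentage heuristic, the strict-majority exit is always correct regardless of scan order.

```csharp
using System.Collections.Generic;

// Hypothetical helpers: pre-generated scan order plus majority early exit.
public static class HeuristicScan
{
    // Pre-generate an order that starts at the top-right corner and moves toward the bottom-left.
    public static (int Row, int Col)[] TopRightFirst(int rows, int cols)
    {
        var order = new List<(int, int)>(rows * cols);
        for (int d = 0; d < rows + cols - 1; d++) // d = distance from the top-right corner
            for (int row = 0; row < rows; row++)
            {
                int col = cols - 1 - (d - row);
                if (col >= 0 && col < cols) order.Add((row, col));
            }
        return order.ToArray();
    }

    public static int MostCommonOrCenter(int[,] cells, (int Row, int Col)[] order)
    {
        var counts = new int[361]; // assumes values in [0..360]
        int half = cells.Length / 2;
        foreach (var (row, col) in order)
        {
            int v = cells[row, col];
            // Suggestion 1: more than half the grid means it must be the most common value.
            if (++counts[v] > half) return v;
        }

        // No majority: compare the finished counts, with the tie-to-center rule.
        int best = 0;
        bool tie = false;
        for (int v = 1; v < counts.Length; v++)
        {
            if (counts[v] > counts[best]) { best = v; tie = false; }
            else if (counts[v] == counts[best]) tie = true;
        }
        return tie ? cells[cells.GetLength(0) / 2, cells.GetLength(1) / 2] : best;
    }
}
```

For a fixed window size the order only needs to be generated once and can be reused for every pixel.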
