arrays - 在二维数组中创建相似元素集

Question

我正在尝试解决基于二维数组的问题。该数组包含不同种类的元素（共有 3 种可能的种类）。让我们假设类型为 X、Y、Z。

该数组似乎是这样的。请注意，它总是会被完全填满。该图用于说明。

7 | | | | | | |
6 | | | | | | |
5 | | | | | | |
4 | |X|Z|Y|X| |
3 | |Y|X|Y|Y|X|
2 |Y|Y|X|Z|Z|X|
1 |X|X|Y| |X|X|
0 | | | |Z| | |
   0 1 2 3 4 5

我正在尝试创建彼此相邻放置的元素集。例如，set1 可能包含位于以下位置的 X 类型元素：(0,1)、(1,1)、(2,2)、(2,3)、(1,4)。类似地，set2 可能包含位于 (3,4)、(3,3)、4,3) 的 Y 类型元素。

问题：给定数组中的任何点，它必须能够将所有元素添加到适当的集合中，并确保没有两个集合包含相同的元素。请注意，仅当遇到超过 2 个相同类型的相邻元素时才会创建一个集合。

此外，如果删除了某个元素子集，则会添加更多元素来替换删除的元素。然后必须重新迭代该数组以创建新集合或修改现有集合。

解决方案：我实现了一个递归解决方案，它会遍历所有相邻元素，例如元素 X (0,1)。然后，在迭代 8 个可能的相邻元素时，它会在出现类型 X 时递归调用自身。

这种解决方案过于暴力且效率低下，尤其是在某些元素被可能不同类型的新元素替换的情况下。在这种情况下，几乎整个数组都必须重新迭代以制作/修改集合，并确保在多个集合中不存在相同的元素。

有没有什么算法可以有效地处理这类问题？我需要一些想法/建议或伪代码的帮助。

score 7 · Accepted Answer

[编辑 5/8/2013：固定时间复杂度。（O（a（n））本质上是恒定的时间！）]

在下文中，“连接组件”是指通过仅允许在具有相同类型元素的相邻位置之间水平、垂直或对角线移动的路径彼此可到达的所有位置的集合。例如，您的示例{(0,1), (1,1), (2,2), (2,3), (1,4)}是示例输入中的连接组件。每个位置恰好属于一个连通分量。

我们将构建一个联合/查找数据结构，该结构将用于给每个位置 (x, y) 一个数字“标签”，该标签具有以下属性：当且仅当任何两个位置 (x, y) 和 (x', y' ) 属于同一组件，则它们具有相同的标签。特别是这个数据结构支持三种操作：

set(x, y, i)将位置 (x, y) 的标签设置为 i。
find(x, y)将返回分配给位置 (x, y) 的标签。
union(Z)，对于某些标签集 Z，将把 Z 中的所有标签组合成一个标签 k，从某种意义上说，未来find(x, y)对先前在 Z 中具有标签的任何位置 (x, y) 的调用现在将返回 k。（通常 k 将是 Z 中已经存在的标签之一，尽管这实际上并不重要。） union(Z)还返回新的“主”标签 k。

如果总共有 n = 宽度 * 高度位置，这可以在 O(n*a(n)) 时间内完成，其中 a() 是极其缓慢增长的反阿克曼函数。 对于所有实际输入大小，这与 O(n) 相同。

请注意，只要两个顶点彼此相邻，就有四种可能的情况：

一个在另一个之上（由垂直边缘连接）
一个在另一个的左侧（由水平边缘连接）
一个在另一个的上方和左侧（由\对角线连接）
一个在另一个的上方和右侧（由/对角线连接）

我们可以使用以下 pass 来确定每个位置 (x, y) 的标签：

将 nextLabel 设置为 0。
对于每一行 y 以递增顺序：
- 对于按升序排列的每一列 x：
  - 检查 (x, y) 的 W、NW、N 和 NE 邻居。令 Z 是这 4 个与 (x, y) 同类的邻居的子集。
  - 如果 Z 是空集，那么我们假设 (x, y) 开始一个全新的组件，因此调用 set(x, y, nextLabel) 并递增 nextLabel。
  - 否则，对 Z 的每个元素调用 find(Z[i]) 以找到它们的标签，并在这组标签上调用 union() 将它们组合在一起。将新标签（此 union() 调用的结果）分配给 k，然后还调用 set(x, y, k) 将 (x, y) 添加到此组件。

在此之后，调用find(x, y)任何位置 (x, y) 可以有效地告诉您它属于哪个组件。如果您希望能够快速回答“哪些位置属于包含位置 (x, y) 的连通分量？”形式的查询？然后创建一个列表哈希表posInComp并对输入数组进行第二次传递，将每个 (x, y) 附加到列表中posInComp[find(x, y)]。这一切都可以在线性时间和空间内完成。现在要回答某个给定位置 (x, y) 的查询，只需调用lab = find(x, y)查找该位置的标签，然后在中列出位置posInComp[lab]。

要处理“太小”的组件，只需查看posInComp[lab]. 如果是 1 或 2，则 (x, y) 不属于任何“足够大”的组件。

最后，所有这些工作实际上都需要线性时间，所以除非你的输入数组很大，否则它会快如闪电。所以在修改输入数组后从头开始重新计算是完全合理的。

score 1 · Accepted Answer

You may want to check out region growing algorithms, which are used for image segmentation. These algorithms start from a seed pixel and grow a contiguous region where all the pixels in the region have some property.

In your case adjacent 'pixels' are in the same image segment if they have the same label (ie, kind of element X, Y or Z)

score 1 · Accepted Answer

在您的情况下，我至少会依赖两个不同的数组：

Array1 (sets) -> all the sets and the associated list of points. Main indices: set names.
Array2 (setsDef) -> type of each set ("X", "Y" or "Z"). Main indices: type names.

可能可以创建更多支持数组，例如，一个包含每个集合的最小/最大 X/Y 值的数组，以加快分析速度（尽管无论如何它会非常快，如下所示）。

您没有提到任何编程语言，但我包含了一个示例（C#）代码，因为它是解释这一点的最佳方式。请不要将其理解为对最佳方法的建议（就我个人而言，我不太喜欢Dictionaries/ Lists；尽管认为这确实提供了一种很好的图形方式来显示算法，即使对于没有经验的 C# 用户也是如此）。此代码仅旨在显示数据存储/检索方法；实现最佳性能的最佳方式取决于目标语言和其他问题（例如，数据集大小），这是您必须注意的事情。

Dictionary<string, List<Point>> sets = new Dictionary<string, List<Point>>(); //All sets and the associated list of points
Dictionary<string, List<string>> setsDef = new Dictionary<string, List<string>>(); //Array indicating the type of information stored in each set (X or Y)

List<Point> temp0 = new List<Point>();
temp0.Add(new Point(0, 0));
temp0.Add(new Point(0, 1));
sets.Add("Set1", temp0);
List<String> tempX = new List<string>();
tempX.Add("Set1");

temp0 = new List<Point>();
temp0.Add(new Point(0, 2));
temp0.Add(new Point(1, 2));
sets.Add("Set2", temp0);
List<String> tempY = new List<string>();
tempY.Add("Set2");

setsDef.Add("X", tempX);
setsDef.Add("Y", tempY);


//-------- TEST
//I have a new Y value which is 2,2
Point targetPoint = new Point(2, 2);
string targetSet = "Y";

//I go through all the Y sets
List<string> targetSets = setsDef[targetSet];

bool alreadyThere = false;
Point candidatePoint;
string foundSet = "";
foreach (string set in targetSets) //Going through all the set names stored in setsDef for targetSet
{
    List<Point> curPoints = sets[set];
    foreach (Point point in curPoints) //Going through all the points in the given set
    {
        if (point == targetPoint)
        {
            //Already-stored point and thus the analysis will be stopped
            alreadyThere = true;
            break;
        }
        else if (isSurroundingPoint(point, targetPoint))
        {
            //A close point was found and thus the set where the targetPoint has to be stored
            candidatePoint = point;
            foundSet = set;
            break;
        }
    }
    if (alreadyThere || foundSet != "")
    {
        break;
    }
}

if (!alreadyThere)
{
    if (foundSet != "")
    {
        //Point added to an existing set
        List<Point> curPoints = sets[foundSet];
        curPoints.Add(targetPoint);
        sets[foundSet] = curPoints;
    }
    else
    {
        //A new set has to be created
        string newName = "New Set";
        temp0 = new List<Point>();
        temp0.Add(targetPoint);
        sets.Add(newName, temp0);

        targetSets.Add(newName);
        setsDef[targetSet] = targetSets;
    }
}

isSurroundingPoint检查两个点是否彼此靠近的函数在哪里：

private bool isSurroundingPoint(Point point1, Point point2)
{
    bool isSurrounding = false;
    if (point1.X == point2.X || point1.X == point2.X + 1 || point1.X == point2.X - 1)
    {
        if (point1.Y == point2.Y || point1.Y == point2.Y + 1 || point1.Y == point2.Y - 1)
        {
            isSurrounding = true;
        }
    }
    return isSurrounding;
}

score 0 · Accepted Answer

我写了一些东西来为另一个 SO 问题找到一种类型的对象。下面的示例添加了另外两种类型。任何重复都会再次检查整个列表。这个想法是分别处理每种类型的点列表。该函数solve对所有连接点进行分组，并在枚举下一组之前将它们从列表中删除。areConnected检查点坐标之间的关系，因为我们只测试一种类型的点。在这个通用版本中，类型 ( a b c) 可以是任何东西（字符串、数字、元组等），只要它们匹配即可。

顺便说一句 - 这是 j_random_hacker 出色算法的 JavaScript 示例的链接：http: //jsfiddle.net/groovy/fP5kP/

哈斯克尔代码：

import Data.List (elemIndices, delete) 

example = ["xxyyyz"
          ,"xyyzzz"
          ,"yxxzzy"
          ,"yyxzxy"
          ,"xyzxyy"
          ,"xzxxzz"
          ,"xyzyyz"
          ,"xyzxyy"]

objects a b c ws = [("X",solve xs []),("Y",solve ys []),("Z",solve zs [])] where
  mapIndexes s = 
    concatMap (\(y,xs)-> map (\x->(y,x)) xs) $ zip [0..] (map (elemIndices s) ws)
  [xs,ys,zs] = map mapIndexes [a,b,c]
  areConnected (y,x) (y',x') = abs (x-x') < 2 && abs (y-y') < 2
  solve []     r = r
  solve (x:xs) r =
    let r' = solve' xs [x]
    in solve (foldr delete xs r') (if null (drop 2 r') then r else r':r)
  solve' vs r =
    let ys = filter (\y -> any (areConnected y) r) vs
    in if null ys then r else solve' (foldr delete vs ys) (ys ++ r)

样本输出：

*Main> objects 'x' 'y' 'z' example
[("X",[[(7,0),(6,0),(5,0),(4,0)]
      ,[(3,4),(5,2),(5,3),(4,3),(2,2),(3,2),(2,1),(0,1),(1,0),(0,0)]])
,("Y",[[(7,5),(6,4),(7,4),(6,3)],[(4,4),(4,5),(3,5),(2,5)]
      ,[(4,1),(3,0),(3,1),(0,4),(2,0),(0,3),(1,1),(1,2),(0,2)]])
,("Z",[[(5,5),(6,5),(5,4)]
      ,[(7,2),(6,2),(5,1),(4,2),(3,3),(1,3),(2,3),(2,4),(1,4),(1,5),(0,5)]])]
(0.02 secs, 1560072 bytes)

arrays - 在二维数组中创建相似元素集

4 回答 4

Related

Reference