c++ - 用于选择机器组的数据结构

Question

我有这个旧的批处理系统。调度程序将所有计算节点存储在一个大数组中。现在大部分都可以了，因为大多数查询都可以通过过滤满足查询的节点来解决。

我现在遇到的问题是，除了一些基本属性（cpu 数量、内存、操作系统）之外，还有这些奇怪的分组属性（城市、infiniband、网络暂存）。

现在这些问题是，当用户请求具有 infiniband 的节点时，我不能只给他任何节点，但我必须给他连接到一个 infiniband 交换机的节点，以便节点实际上可以使用 infiniband 进行通信。

这仍然可以，当用户只请求一个这样的属性时（我可以为每个属性分区数组，然后尝试分别选择每个分区中的节点）。

问题来自于组合多个这样的属性，因为那样我将不得不生成子集（主数组的分区）的所有组合。

好消息是大多数属性都处于子集或等价关系中（对于一个 infiniband 交换机上的机器在一个城市中是有意义的）。但不幸的是，这并不完全正确。

是否有一些好的数据结构来存储这种半分层的大部分树状的东西？

编辑：示例

node1 : city=city1, infiniband=switch03, networkfs=server01
node2 : city=city1, infiniband=switch03, networkfs=server01
node3 : city=city1, infiniband=switch03
node4 : city=city1, infiniband=switch03
node5 : city=city2, infiniband=switch03, networkfs=server02
node6 : city=city2, infiniband=switch03, networkfs=server02
node7 : city=city2, infiniband=switch04, networkfs=server02
node8 : city=city2, infiniband=switch04, networkfs=server02

用户要求：

2x node with infiniband and networkfs

所需的输出将是：(node1, node2)或(node5,node6)或(node7,node8)。

在一个好的情况下，这个例子不会发生，但在某些情况下，我们实际上有这些奇怪的跨站点连接。如果 in 中的节点city2都在 infiniband 上switch04，那将很容易。不幸的是，现在我必须生成具有相同 infiniband 交换机和相同网络文件系统的节点组。

实际上问题要复杂得多，因为用户不需要请求整个节点，而且属性很多。

编辑：为查询添加了所需的输出。

score 3 · Accepted Answer

假设您有 p 个分组属性和 n 个机器，基于存储桶的解决方案是最容易设置的，并且提供 O(2 ^p ·log(n)) 访问和更新。

您为每组属性创建一个桶堆（因此您将有一个用于“infiniband”的桶堆，一个用于“networkfs”的桶堆和一个用于“infiniband × networkfs”的桶堆）——这意味着 2 ^p桶堆。
每个桶堆包含一个用于每个值组合的桶（因此“infiniband”桶将包含一个用于键“switch04”的桶和一个用于键“switch03”的桶） - 这意味着总共最多 n·2 ^p个桶拆分跨越所有桶堆。
每个存储桶都是服务器列表（可能分为可用和不可用）。bucket-heap 是一个标准堆（请参阅参考资料std::make_heap），其中每个桶的值是该桶中可用服务器的数量。
每个服务器都存储对包含它的所有存储桶的引用。
当您查找与某个属性组匹配的服务器时，您只需在该属性组的相应存储桶中查找，然后沿着堆向下查找一个足够大的存储桶以容纳所请求的服务器数量。这需要 O(log(p)·log(n))。
当服务器被标记为可用或不可用时，您必须更新包含这些服务器的所有存储桶，然后更新包含这些存储桶的存储桶堆。这是一个 O(2 ^p ·log(n)) 操作。

如果您发现自己拥有太多属性（并且 2 ^p变得失控），该算法允许从其他桶堆按需构建一些桶堆：如果用户请求“infiniband × networkfs”但您只有一个可用于“infiniband”或“networkfs”的桶堆，您可以将“infiniband”桶堆中的每个桶单独变成一个桶堆（使用惰性算法，因此您不必处理如果第一个有效，则所有存储桶）并使用惰性堆合并算法找到合适的存储桶。然后，您可以使用 LRU 缓存来决定存储哪些属性组以及按需构建哪些属性组。

score 0 · Accepted Answer

我的猜测是，不会有一个“简单、高效”的算法和数据结构来解决这个问题，因为你所做的类似于求解一组联立方程。假设总共有 10 个类别（如和） city，并且用户为其中的 3 个指定了所需的值。假设用户要求 5 个节点。然后，您的任务是推断剩余 7 个类别的值，以便至少存在 5 个记录，其所有 10 个类别字段都等于这些值（指定的 3 个和推断的 7 个）。可能有多种解决方案。infinibandnetwork

尽管如此，只要没有太多不同的类别，并且每个类别中没有太多不同的可能性，您可以进行简单的暴力递归搜索以找到可能的解决方案，在每个递归级别您考虑一个特定类别，并且“尝试”每一种可能性。假设用户要求记录，并且可以通过、等k选择规定任意数量的要求：required_cityrequired_infiniband

either(x, y) := if defined(x) then [x] else y

For each city c in either(required_city, [city1, city2]):
  For each infiniband i in either(required_infiniband, [switch03, switch04]):
    For each networkfs nfs in either(required_nfs, [undefined, server01, server02]):
      Do at least k records of type [c, i, nfs] exist?  If so, return them.

该either()函数只是一种将搜索限制在包含用户为其提供约束的点的子空间的方法。

基于此，您将需要一种方法来快速查找任何给定[c, i, nfs]组合的点（行）数——嵌套哈希表可以很好地解决这个问题。

score 0 · Accepted Answer

第 1 步：为每个属性创建一个索引。例如，对于每个属性+值对，创建具有该属性的节点的排序列表。将每个这样的列表放入某种关联数组中 - 这类似于 stl 映射，每个属性一个，由值索引。这样，当您完成后，您将拥有一个近乎恒定的时间函数，该函数可以向您返回与单个属性+值对匹配的节点列表。该列表仅按节点编号排序。

第 2 步：给定一个查询，对于所需的每个属性+值对，检索节点列表。

第 3 步：从最短的列表开始，将其称为列表 0，将其与其他每个列表进行比较，依次从列表 0 中删除不在其他列表中的元素。

您现在应该只拥有具有请求的所有属性的节点。

您的另一个选择是使用数据库，它已经设置为支持这样的查询。它可以通过带有 SQL 扩展的 BerkeleyDB 之类的东西在内存中完成。

score -1 · Accepted Answer

如果按查询中提到的每个标准对列表进行排序是可行的（或者让列表按每个相关标准预先排序），那么这非常有效。

“相对标准”是指不是“x 必须为 5”形式的标准，过滤起来很简单，但“x 必须与结果集中的每个项目相同”形式的标准。如果还有“x 必须为 5”形式的条件，则首先过滤这些条件，然后执行以下操作。

它依赖于对多个列使用稳定的排序来快速找到匹配的组（无需尝试组合）。

复杂性是节点数 * 查询中的条件数（对于算法本身）+ 节点数 * log（节点数）* 条件数（对于排序，如果不是预排序）。所以节点*日志（节点）*标准。

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace bleh
{
    class Program
    {
        static void Main(string[] args)
        {
            List<Node> list = new List<Node>();

            // create a random input list
            Random r = new Random();
            for (int i = 1; i <= 10000; i++)
            {
                Node node = new Node();
                for (char c = 'a'; c <= 'z'; c++) node.Properties[c.ToString()] = (r.Next() % 10 + 1).ToString();
                list.Add(node);
            }

            // if you have any absolute criteria, filter the list first according to it, which is very easy
            // i am sure you know how to do that

            // only look at relative criteria after removing nodes which are eliminated by absolute criteria
            // example 
            List<string> criteria = new List<string> {"c", "h", "r", "x" };
            criteria = criteria.OrderBy(x => x).ToList();

            // order the list by each relative criteria, using a ***STABLE*** sort
            foreach (string s in criteria)
                list = list.OrderBy(x => x.Properties[s]).ToList();

            // size of sought group
            int n = 4;

            // this is the algorithm
            int sectionstart = 0;
            int sectionend = 0;
            for (int i = 1; i < list.Count; i++)
            {
                bool same = true;
                foreach (string s in criteria) if (list[i].Properties[s] != list[sectionstart].Properties[s]) same = false;
                if (same == true) sectionend = i;
                else sectionstart = i;
                if (sectionend - sectionstart == n - 1) break;
            }

            // print the results
            Console.WriteLine("\r\nResult:");
            for (int i = sectionstart; i <= sectionend; i++)
            {
                Console.Write("[" + i.ToString() + "]" + "\t");
                foreach (string s in criteria) Console.Write(list[i].Properties[s] + "\t");
                Console.WriteLine();
            }
            Console.ReadLine();
        }
    }
}

score -1 · Accepted Answer

我会做这样的事情（显然你应该将它们映射到int，而不是字符串，并使用int作为代码）

struct structNode
{
    std::set<std::string> sMachines;
    std::map<std::string, int> mCodeToIndex;    
    std::vector<structNode> vChilds;        
};

void Fill(std::string strIdMachine, int iIndex, structNode* pNode, std::vector<std::string> &vCodes)
{
    if(iIndex < vCodes.size())
    {           
        // Add "Empty" if Needed
        if(pNode->vChilds.size() == 0)
        {
            pNode->mCodeToIndex.insert(pNode->mCodeToIndex.begin(), make_pair("empty", 0));
            pNode->vChilds.push_back(structNode());
        }

        // Add for "Empty"
        pNode->vChilds[0].sMachines.insert(strIdMachine);
        Fill(strIdMachine, (iIndex + 1), &pNode->vChilds[0], vCodes );

        if(vCodes[iIndex] == "empty")
            return;


        // Add for "Any"        
        std::map<std::string, int>::iterator mIte = pNode->mCodeToIndex.find("any");
        if(mIte == pNode->mCodeToIndex.end())
        {
            mIte = pNode->mCodeToIndex.insert(pNode->mCodeToIndex.begin(), make_pair("any", pNode->vChilds.size()));
            pNode->vChilds.push_back(structNode());     
        }

        pNode->vChilds[mIte->second].sMachines.insert(strIdMachine);
        Fill(strIdMachine, (iIndex + 1), &pNode->vChilds[mIte->second], vCodes );


        // Add for "Segment"
        mIte = pNode->mCodeToIndex.find(vCodes[iIndex]);
        if(mIte == pNode->mCodeToIndex.end())
        {
            mIte = pNode->mCodeToIndex.insert(pNode->mCodeToIndex.begin(), make_pair(vCodes[iIndex], pNode->vChilds.size()));
            pNode->vChilds.push_back(structNode());                 
        }

        pNode->vChilds[mIte->second].sMachines.insert(strIdMachine);
        Fill(strIdMachine, (iIndex + 1), &pNode->vChilds[mIte->second], vCodes );

    }
}

//////////////////////////////////////////////////////////////////////
// Get
//
// NULL on empty group
//////////////////////////////////////////////////////////////////////
set<std::string>* Get(structNode* pNode, int iIndex, vector<std::string> vCodes, int iMinValue)
{       
    if(iIndex < vCodes.size())
    {       
        std::map<std::string, int>::iterator mIte = pNode->mCodeToIndex.find(vCodes[iIndex]);       
        if(mIte != pNode->mCodeToIndex.end())
        {
            if(pNode->vChilds[mIte->second].sMachines.size() < iMinValue)
                return NULL;
            else
                return Get(&pNode->vChilds[mIte->second], (iIndex + 1), vCodes, iMinValue);
        }
        else
            return NULL;        
    }

    return &pNode->sMachines;   
}

用您的样本填充树

structNode stRoot;

    const char* dummy[] = { "city1", "switch03", "server01" };  
    const char* dummy2[] = { "city1", "switch03", "empty" };
    const char* dummy3[] = { "city2", "switch03", "server02" };
    const char* dummy4[] = { "city2", "switch04", "server02" };

    // Fill the tree with the sample    
    Fill("node1", 0, &stRoot, vector<std::string>(dummy, dummy + 3));
    Fill("node2", 0, &stRoot, vector<std::string>(dummy, dummy + 3));
    Fill("node3", 0, &stRoot, vector<std::string>(dummy2, dummy2 + 3));
    Fill("node4", 0, &stRoot, vector<std::string>(dummy2, dummy2 + 3));
    Fill("node5", 0, &stRoot, vector<std::string>(dummy3, dummy3 + 3));
    Fill("node6", 0, &stRoot, vector<std::string>(dummy3, dummy3 + 3));
    Fill("node7", 0, &stRoot, vector<std::string>(dummy4, dummy4 + 3));
    Fill("node8", 0, &stRoot, vector<std::string>(dummy4, dummy4 + 3));

现在您可以轻松获得所需的所有组合，例如您查询的内容如下：

vector<std::string> vCodes;
    vCodes.push_back("empty"); // Discard first property (cities)
    vCodes.push_back("any");   // Any value for infiniband
    vCodes.push_back("any");   // Any value for networkfs (except empty)

    set<std::string>* pMachines = Get(&stRoot, 0, vCodes, 2);

例如，只有交换机 03 上的 City02 网络不为空

vector<std::string> vCodes;
    vCodes.push_back("city2"); // Only city2
    vCodes.push_back("switch03");   // Only switch03
    vCodes.push_back("any");   // Any value for networkfs (except empy)

    set<std::string>* pMachines = Get(&stRoot, 0, vCodes, 2);

c++ - 用于选择机器组的数据结构

5 回答 5

Related

Reference