c++ - 在数据流处理程序中获取段错误

Question

我正在编写一个程序，用于处理对新节点和边的图的批量更新。我最近合并了一个滑动窗口方案，用于检查图形中已经存在的边是否在窗口中，如果没有则删除它们。我正在使用 Edge 和 Node 类，如下所示：

class Edge
{
public:
 uint64_t source;
 uint64_t target;
 unsigned type;
 std::string label;
 uint64_t timestamp;
 bool directed;
 bool extracted;
 Edge(){}
 Edge(Edge *e);
 Edge(uint64_t, uint64_t, unsigned, std::string, time_t, bool);
 bool operator ==(const Edge *other)
  {
  return((this->source==other->source)&&(this->target==other->target)&& \
          (this->type==other->type));
  }
};   

class Node
{
  public:
  uint64_t id;
  unsigned type;
  std::string label;
  uint64_t timestamp;
  std::vector<Edge *> adjacent_edges;
  Node(){}
  Node(Node *);
  bool push_edge(Edge *e)
  {
    try
    {
     adjacent_edges.push_back(e);
    }
    catch(std::bad_alloc&)
    {
     std::cout<<"Error pushing edge"<<std::endl;
     return false;
    }
    return true;    
   }

   std::vector<Edge *>::iterator pop_edge(std::vector<Edge *>::iterator e_it)
   {
    return adjacent_edges.erase(e_it);
   }

  bool operator ==(const Node *other)
   {
   return (this->id == other->id);
   }
};

在使用一个数据集时，在尝试使用边缘迭代器访问边缘时，我在处理 69 个滑动窗口大小为 5 的批处理文件后遇到了段错误。在使用另一个数据集时，我在尝试删除邻接列表中的非空 Edge 指针（尝试释放内存）时，在 69 个批处理文件后出现段错误。我束手无策，试图弄清楚出了什么问题。该程序的非滑动窗口版本运行良好。我也知道使用 STL deque 数据结构对于滑动窗口会更好。但是，我正在使用相当大的代码，我希望能够在不使用双端队列的情况下解决这个问题。提前致谢。编辑：它发生在两条不同的线路上：

  for (int i = 0; i < node_list.size(); i++)
  {
  vector<Edge *>::iterator adj_it;

  for (adj_it = (node_list[i])->adjacent_edges.begin(); adj_it != (node_list[i])->adjacent_edges.end(); ++adj_it )
  {


      if ((max_batch_num - (*adj_it)->timestamp) > time_window)
      {


          deleteEdge(adj_it);
          num_edges_deleted++;
          --adj_it;

      }

  }

}

它发生在线：

if ((max_batch_num - (*adj_it)->timestamp) > time_window)

关于使用第一个数据集。这里的问题是，即使向量不是空的，向量中的指针指向的内存也不是应用程序的一部分。当我使用 gdb 尝试打印时：

print (*adj_it)->timestamp

它给出：尝试获取不在内存中的值的地址

尽管边被添加到邻接列表中，但这不应该发生。在使用第二个数据集时，我使用时会发生错误：

delete (*adj_it);

其中 adj_it 是 adjacency_list 向量的迭代器。

同样奇怪的是，如果我通过说“n”来增加滑动窗口，同样的问题会在“n”个批次之后发生。

添加 deleteEdge 函数：

vector<FSM::Edge *>::iterator FSM::Graph::deleteEdge(vector<Edge *>::iterator e_it)
{
 //cout<<"Deleting edge: e "<<e->source<<" -> "<<e->target<<endl;//DEBUG
 FSM::Node *s = getNode((*e_it)->source);
 FSM::Edge *e_tmp = (*e_it);
 e_it = s->pop_edge(e_it);
 if (e_tmp != NULL)
 {
     delete e_tmp;
 }
 else
 {
     std::cerr<<"Trying to delete an Edge pointer which is NULL"<<endl;
     exit(1);
 }
 return e_it;

}

我以前也只使用过索引，在@Julius 的回答之后我又试了一次。这是我的新删除循环。

for (int j = 0; j<(node_list[i])->adjacent_edges.size();j++)
   {
       if ((max_batch_num - ((node_list[i])->adjacent_edges[j])->timestamp) > time_window)
               {                     
                   (node_list[i])->adjacent_edges.erase((node_list[i])->adjacent_edges.begin() + j);
                  --j;
                  num_edges_deleted++;

              }

  }

但是，无论如何，我都会遇到相同的错误。

顺便提一句。我真的很感谢到目前为止的所有评论。感谢您的时间。

编辑：使用 valgrind 在代码的不同部分发现内存泄漏。摆脱那个代码（它对算法来说并不是真正必要的）摆脱了它。我接受@Julius 的回答，因为根据我原来的陈述它可以解决问题。还要感谢@RetiredNinja、@Beta 和@Golazo 的出色评论。

score 0 · Accepted Answer

for (int i = 0; i < node_list.size(); i++)
  {
  vector<Edge *>::iterator adj_it;

  for (adj_it = (node_list[i])->adjacent_edges.begin(); adj_it != (node_list[i])->adjacent_edges.end(); ++adj_it )
  {


      if ((max_batch_num - (*adj_it)->timestamp) > time_window)
      {


          deleteEdge(adj_it);
          num_edges_deleted++;
          --adj_it;

      }

  }

}

您正在删除边缘，然后使用--adj_it 向后移动，然后在您刚刚使用 deleteEdge 删除的边缘上迭代，因为 for 循环执行 ++adj_it。然后，您尝试检查已删除（无效）Edge 对象的时间戳对象，从而导致段错误。

要么，要么您从 Edge* 向量中删除对象，然后使您的迭代器无效。

重要的部分是迭代器不是索引。您不能只擦除一个元素然后执行 --adj_it。这是使用索引更容易的情况，因为您可以删除边缘对象，从向量中删除边缘指针，然后在执行 --adj_it 之后继续循环。迭代器只是比向量上的索引慢。

c++ - 在数据流处理程序中获取段错误

1 回答 1

Related

Reference