3

Consider the following (simplified) scenario:

class edgeOne {
  private:
    ...
  public:
    int startNode();
    int endNode();
};

class containerOne {
  private:
    std::vector<edgeOne> _edges;
  public:
    std::vector<edgeOne>::const_iterator edgesBegin(){
      return _edges.begin();
    };
    std::vector<edgeOne>::const_iterator edgesEnd(){
      return _edges.end();
    };
};

class edgeTwo {
  private:
    ...
  public:
    int startNode();
    int endNode();
};

class containerTwo {
  private:
    std::vector<edgeTwo> _edges;
  public:
    std::vector<edgeTwo>::const_iterator edgesBegin(){
      return _edges.begin();
    };
    std::vector<edgeTwo>::const_iterator edgesEnd(){
      return _edges.end();
    };
};

I.e., I have two mostly identical edge types and two mostly identical container types. I can iterate over each kind individually. So far, so fine.

But now my use case is the following: Based on some criteria, I get either a containerOne or a containerTwo object. I need to iterate over the edges. But because the types are different, I cannot easily do so without code duplication.

So my idea is the following: I want to have an iterator with the following properties: - Regarding its traversal behavior, it behaves either like a std::vector<edgeOne>::const_iterator or a std::vector<edgeTwo>::const_iterator, depending on how it was initialized. - Instead of returning a const edgeOne & or const edgeTwo &, operator* should return a std::pair<int,int>, i.e., apply a conversion.

I found the Boost.Iterator Library, in particular:

  • iterator_facade, which helps to build a standard-conforming iterator and
  • transform_iterator, which could be used to transform edgeOne and edgeTwo to std::pair<int,int>, but I am not completely sure how the complete solution should look like. If I build nearly the entire iterator myself, is there any benefit to use transform_iterator, or will it just make the solution more heavy-weight?

I guess the iterator only needs to store the following data:

  • A flag (bool would be sufficient for the moment, but probably an enum value is easier to extend if necessary) indicating whether the value type is edgeOne or edgeTwo.
  • A union with entries for both iterator types (where only the one that matches the flag will ever be accessed).

Anything else can be computed on-the-fly.

I wonder if there is an existing solution for this polymorphic behavior, i.e. an iterator implementation combining two (or more) underlying iterator implementation with the same value type. If such a thing exists, I could use it to just combine two transform_iterators.

Dispatching (i.e., deciding whether a containerOne or a containerTwo object needs to be accessed) could easily be done by a freestanding function ...

Any thoughts or suggestions regarding this issue?

4

3 回答 3

3

How about making your edgeOne & edgeTwo polymorphic ? and using pointers in containers ?

class edge
class edgeOne : public edge
class edgeTwo : public edge

std::vector<edge*>
于 2013-07-18T11:06:40.797 回答
2

Based on some criteria, I get either a containerOne or a containerTwo object. I need to iterate over the edges. But because the types are different, I cannot easily do so without code duplication.

By "code duplication", do you mean source code or object code? Is there some reason to go past a simple template? If you're concerned about the two template instantiations constituting "code duplicaton" you can move most of the processing to an out-of-line non-templated do_whatever_with(int, int) support function....

template <typename Container>
void iterator_and_process_edges(const Container& c)
{
    for (auto i = c.edgesBegin(); i != c.edgesEnd(); ++i)
        do_whatever_with(i->startNode(), i->endNode());
}

if (criteria)
    iterate_and_process_edges(getContainerOne());
else
    iterate_and_process_edges(getContainerTwo());

My original aim was to hide the dispatching functionality from the code that just needs to access the start and end node. The dispatching is generic, whereas what happens inside the loop is not, so IMO this is a good reason to separate both.

Not sure I follow you there, but I'm trying. So, "code that just needs to access the start and end node". It's not clear whether by access you mean to get startNode and endNode for a container element, or to use those values. I'd already factored out a do_whatever_with function that used them, so by elimination your request appears to want to isolate the code extracting the nodes from an Edge - done below in a functor:

template <typename Edge>
struct Processor
{
    void operator()(Edge& edge) const
    {
        do_whatever_with(edge.startNode(), edge.endNode());
    }
};

That functor can then be applied to each node in the Container:

template <typename Container, class Processor>
void visit(const Container& c, const Processor& processor)
{
    for (auto i = c.edgesBegin(); i != c.edgesEnd(); ++i)
        processor(*i);
}

"hide the dispatching functionality from the code that just needs to access the start and end node" - seems to me there are various levels of dispatching - on the basis of criteria, then on the basis of iteration (every layer of function call is a "dispatch" in one sense) but again by elimination I assume it's the isolation of the iteration as above that you're after.

if (criteria)
    visit(getContainerOne(), Process<EdgeOne>());
else
    visit(getContainerTwo(), Process<EdgeTwo>());

The dispatching is generic, whereas what happens inside the loop is not, so IMO this is a good reason to separate both.

Can't say I agree with you, but then it depends whether you can see any maintenance issue with my first approach (looks dirt simple to me - a layer less than this latest incarnation and considerably less fiddly), or some potential for reuse. The visit implementation above is intended to be reusable, but reusing a single for-loop (that would simplify further if you have C++11) isn't useful IMHO.

Are you more comfortable with this modularisation, or am I misunderstanding your needs entirely somehow?

于 2013-07-18T11:15:19.720 回答
1
template<typename T1, typename T2>
boost::variant<T1*, T2*> get_nth( boost::variant< std::vector<T1>::iterator, std::vector<T2>::iterator > begin, std::size_t n ) {
  // get either a T1 or a T2 from whichever vector you actually have a pointer to
}

// implement this using boost::iterator utilities, I think fascade might be the right one
// This takes a function that takes an index, and returns the nth element.  It compares for
// equality based on index, and moves position based on index:
template<typename Lambda>
struct generator_iterator {
  std::size_t index;
  Lambda f;
};
template<typename Lambda>
generator_iterator< typename std::decay<Lambda>::type > make_generator_iterator( Lambda&&, std::size_t index=0 );

boost::variant< it1, it2 > begin; // set this to either one of the begins
auto double_begin = make_generator_iterator( [begin](std::size_t n){return get_nth( begin, n );} );
auto double_end = double_begin + number_of_elements; // where number_of_elements is how big the container we are type-erasing

Now we have an iterator that can iterate over one, or the other, and returns a boost::variant<T1*, T2*>.

We can then write a helper function that uses a visitor to extract the two fields you want from the returned variant, and treat it like an ADT. If you dislike ADTs, you can instead write up a class that wraps the variant and provides methods, or even change the get_nth to be less generic and instead return a struct with your data already produced.

There is going to be the equivalent of a branch on each access, but there is no virtual function overhead in this plan. It does currently requires an auto typed variable, but you can write an explicit functor to replace the lambda [begin](std::size_t n){return get_nth( begin, n );} and even that issue goes away.

Easier solutions:

Write a for_each function that iterates over each of the containers, and passes in the processed data to the passed in function.

struct simple_data { int x,y; };
std::function<std::function<void(simple_data)>> for_each() const {
  auto begin = _edges.begin();
  auto end = _edges.end();
  return [begin,end](std::function<void(simple_data)> f) {
    for(auto it = begin; it != end; ++it) {
      simple_data d;
      d.x = it->getX();
      d.y = it->getY();
      f(d);
    }
  };
};

and similar in the other. We now can iterate over the contents without caring about the details by calling foo->foreach()( [&](simple_data d) { /*code*/ } );, and because I stuffed the foreach into a std::function return value instead of doing it directly, we can pass the concept of looping around to another function.

As mentioned in comments, other solutions can include using boost's type-erased iterator wrappers, or writing a python-style generator mutable generator that returns either a simple_data. You could also directly use the boost iterator-creation functions to create an iterator over boost::variant<T1, T2>.

于 2013-07-18T12:52:57.277 回答