c++ - How to find duplicates in a std::vector and return an alphabetically sorted std::list with no duplicates in the resulting list

Question

I have a class called Wordd that has a member word_, which is a std::vector<std::string>.

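I am trying to find the duplicates in word_ and return an alphabetically sorted list that contains no duplicates itself. So far my code compiles and links, but it times out, possibly because of some internal memory leak or similar.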

class FindDuplicatesFunctor
{
public:
    std::list<std::string> list;
    std::vector<std::string> word_;
    FindDuplicatesFunctor(std::vector<std::string> words): list(0), word_(words){};
    void operator()(std::string const& str)
    {

        if(std::count(words_.begin(), words_.end(), str) > 1 && std::count(list.begin(), list.end(), str) == 0)
        {
            list.push_back(str);
        }
        list.sort();

    }
};
std::list<string> Wordd::FindDuplicates() const
{
    FindDuplicatesFunctor cf(word_);
    return std::for_each(words_.begin(), words_.end(), cf).list;
}

Any ideas why it doesn't do its job?

Thanks in advance for your help!

score 5 · Accepted Answer

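Edit in response to a comment:

"The remove-duplicates function name is misleading: it is actually trying to return a list of the words that are repeated in the sequence, but with only one copy of each duplicate in that resulting list." – user2624236 10 hours ago

I hinted at std::sort + std::adjacent_find(..., std::equal_to<>). Here it is implemented: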

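#include <algorithm>
#include <functional>
#include <iterator>
#include <list>

// Takes the container by value, so the caller's data is not reordered.
template <typename C, typename T = typename C::value_type>
std::list<T> adjacent_search(C input)
{
    using std::begin; using std::end;
    std::sort(begin(input), end(input));

    const auto eq  = std::equal_to<T>{};
    // std::not2 is deprecated in C++17 and removed in C++20; a lambda does the same job.
    const auto neq = [](T const& a, T const& b) { return !(a == b); };

    std::list<T> dupes;

    // dupe_at: first element of a run of equal values
    // end_streak: last element of the previous run (where the next search starts)
    auto dupe_at = end(input);
    for (auto end_streak = begin(input);
         (dupe_at = std::adjacent_find(end_streak, end(input), eq)) != end(input);
         end_streak = std::adjacent_find(dupe_at, end(input), neq))
    {
        dupes.push_back(*dupe_at);
    }

    return dupes;
}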

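This implementation has a few nice properties, such as a linear scan after the sort and reasonable worst-case behavior (for example, if the input contains 1000 copies of a single value, it does not perform 1001 useless searches).

However, the following (using a set) is probably a lot simpler: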
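To make the run-skipping idea concrete, here is a plain-loop sketch of the same sort-then-scan approach. The name `duplicates_sorted_scan` is hypothetical, and it uses `std::find_if` to find the end of each run instead of the `std::adjacent_find`/`neq` pair above; after sorting, each element is visited once, so a run of 1000 copies is crossed in a single pass.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: returns each value that occurs more than once,
// exactly once, in ascending order.
template <typename T>
std::list<T> duplicates_sorted_scan(std::vector<T> v)   // by value: sorting a copy
{
    std::sort(v.begin(), v.end());
    std::list<T> dupes;
    for (auto it = v.begin(); it != v.end(); )
    {
        // Find the first element that differs from *it: the end of the current run.
        auto run_end = std::find_if(it, v.end(), [&](T const& x) { return !(x == *it); });
        if (std::distance(it, run_end) > 1)
            dupes.push_back(*it);   // run of length > 1 means a duplicate
        it = run_end;               // jump past the whole run
    }
    return dupes;
}
```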

#include <algorithm>
#include <iterator>
#include <list>
#include <set>

// simple, but horrific performance
template <typename C, typename T = typename C::value_type>
std::list<T> simple(C const& input)
{
    using std::begin; using std::end;
    std::set<T> dupes; // optimization: dupes.find(x) is O(log n)
    for (auto it = begin(input); it != end(input); ++it)
    {
        // Check the set first, so the O(n) std::count only runs for values not yet recorded.
        if (dupes.find(*it) == dupes.end()
            && std::count(it, end(input), *it) > 1)
        {
            dupes.insert(*it);
        }
    }

    // std::set iterates in sorted order, so the resulting list is sorted.
    return {begin(dupes), end(dupes)};
}

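This will almost certainly perform better on smaller collections, because there is less copying (except for the result). Its worst case is poor, however, because of the linear search in std::count: with all-distinct input it is O(n²).

I suggest returning the std::set<T> directly instead of copying it into a list.

Here is a test running live on Coliru that shows both versions.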
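A minimal sketch of that suggestion, assuming the caller can work with a `std::set<T>`: the set is already sorted and holds each duplicate once, so no copy into a list is needed. The name `simple_set` is hypothetical, and the body is the same count-based loop as `simple()`, so it keeps the same quadratic worst case.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical variant of simple() that returns the std::set itself.
template <typename C, typename T = typename C::value_type>
std::set<T> simple_set(C const& input)
{
    using std::begin; using std::end;
    std::set<T> dupes;
    for (auto it = begin(input); it != end(input); ++it)
    {
        // Skip values already recorded; otherwise count the remaining occurrences.
        if (dupes.find(*it) == dupes.end() && std::count(it, end(input), *it) > 1)
            dupes.insert(*it);
    }
    return dupes;   // sorted, one copy per duplicate
}
```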

Original answer

This is now obsolete, because it does not do what the OP wants:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> input = { "unsorted", "containing", "optional", "unsorted", "duplicate", "duplicate", "values" };

    std::sort(begin(input), end(input));

    std::unique_copy(begin(input), end(input), std::ostream_iterator<std::string>(std::cout, " "));

    std::cout << "\n";
}

Output:

containing duplicate optional unsorted values

See it live: http://coliru.stacked-crooked.com/view?id=f8cc78dbcce62ad276691b6541629a70-542192d2d8aca3c820c7acc656fa0c68

score 1 · Accepted Answer

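The function FindDuplicates() refers to both word_ and words_. It seems these two names should be the same, but which one is intended cannot be determined from the code snippet.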
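Assuming `word_` is the intended name and is a `std::vector<std::string>`, a sketch of the question's code with the names made consistent might look like this. The minimal `Wordd` class is a hypothetical reconstruction, since the question does not show it. The functor holds a reference to the vector instead of a copy, and the list is sorted once at the end rather than after every element; the per-element `std::count` is unchanged, so this is still O(n * n).

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal version of the question's class.
class Wordd
{
public:
    explicit Wordd(std::vector<std::string> words) : word_(std::move(words)) {}
    std::list<std::string> FindDuplicates() const;
private:
    std::vector<std::string> word_;
};

class FindDuplicatesFunctor
{
public:
    std::list<std::string> list;
    std::vector<std::string> const& word_;   // one consistent name; no copy of the data

    explicit FindDuplicatesFunctor(std::vector<std::string> const& words) : word_(words) {}

    void operator()(std::string const& str)
    {
        // Record str once if it occurs more than once in word_.
        if (std::count(word_.begin(), word_.end(), str) > 1 &&
            std::find(list.begin(), list.end(), str) == list.end())
        {
            list.push_back(str);
        }
    }
};

std::list<std::string> Wordd::FindDuplicates() const
{
    // std::for_each returns the functor after it has visited every element.
    std::list<std::string> result =
        std::for_each(word_.begin(), word_.end(), FindDuplicatesFunctor(word_)).list;
    result.sort();   // sort once, at the end
    return result;
}
```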

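However, the algorithm used is very slow: it takes O(n * n) time and may use many list operations, which are slower than vector operations. You definitely want to use something similar to what sehe posted (std::sort() followed by std::unique_copy()). If your set of values is really large, you might want to consider passing over the sequence just once and using a std::set<std::string> (or std::unordered_set<std::string>), or a variant holding std::string const*, to determine whether a value has already been seen.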
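A single-pass sketch of the set-based idea, assuming that copying each string into the sets is acceptable (the `std::string const*` variant mentioned above would avoid those copies); `find_duplicates_one_pass` is a hypothetical name. `std::unordered_set::insert` returns a pair whose `.second` is `false` when the value was already present, which is what flags a repeat; an ordered `std::set` then keeps one sorted copy of each duplicate.

```cpp
#include <list>
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: one pass over the input, average O(1) "seen" checks.
std::list<std::string> find_duplicates_one_pass(std::vector<std::string> const& words)
{
    std::unordered_set<std::string> seen;   // every word encountered so far
    std::set<std::string> dupes;            // repeated words, sorted, stored once
    for (std::string const& w : words)
    {
        if (!seen.insert(w).second)         // insert failed: w was seen before
            dupes.insert(w);
    }
    return std::list<std::string>(dupes.begin(), dupes.end());
}
```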

score 1 · Accepted Answer

Sort, unique, erase:

#include <algorithm>
#include <iterator>
#include <utility>

template<typename Container>
Container&& sort_unique_erase( Container&& c ) {
  using std::begin; using std::end;
  std::sort( begin(c), end(c) );
  c.erase( std::unique( begin(c), end(c) ), end(c) );
  return std::forward<Container>(c);
}

This works for any random-access container whose ranges you can erase from (vector and deque in namespace std).

Then append:
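Note that std::sort requires random-access iterators, so sort_unique_erase cannot be applied to a std::list. If the data already lives in a list, a sketch using the list's own sort() and unique() member functions does the same job; the name `sorted_unique` is hypothetical.

```cpp
#include <list>
#include <string>

// Hypothetical helper: sorted copy of the list with consecutive repeats removed.
std::list<std::string> sorted_unique(std::list<std::string> words)
{
    words.sort();    // ascending order (alphabetical for std::string)
    words.unique();  // removes consecutive equal elements, which sorting made adjacent
    return words;
}
```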

#include <iterator>
#include <utility>

// Rvalue source: move the elements out of c2.
template<typename C1, typename C2>
C1&& append( C1&& c1, C2&& c2 ) {
  using std::begin; using std::end;
  c1.insert( end(c1), std::make_move_iterator( begin(c2) ), std::make_move_iterator( end(c2) ) );
  return std::forward<C1>(c1);
}
// Lvalue source: copy the elements (partial ordering prefers this overload for lvalues).
template<typename C1, typename C2>
C1&& append( C1&& c1, C2& c2 ) {
  using std::begin; using std::end;
  c1.insert( end(c1), begin(c2), end(c2) );
  return std::forward<C1>(c1);
}

And tie them together:

#include <iostream>
#include <list>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> words = {"hello", "world", "my", "name", "is", "hello"};
  std::list<std::string> retval;
  append( retval, sort_unique_erase( std::move(words) ) );
  for( auto& str : retval ) {
    std::cout << str << "\n";
  }
}

However, std::list is not recommended: there is rarely a reason to use it over std::vector, or, in rare cases, std::deque.