我想知道std::variant
性能。我什么时候不应该使用它?看起来虚函数仍然比使用std::visit
这让我感到惊讶的要好得多!
在“A Tour of C++”中,Bjarne Stroustruppattern checking
在解释std::holds_alternatives
和overloaded
方法后这样说:
这基本上相当于一个虚函数调用,但可能更快。与所有性能声明一样,当性能至关重要时,应该通过测量来验证这种“可能更快”。对于大多数用途,性能差异是微不足道的。
我已经对我想到的一些方法进行了基准测试,结果如下:
http://quick-bench.com/N35RRw_IFO74ZihFbtMu4BIKCJg
如果你开启优化,你会得到不同的结果:
http://quick-bench.com/p6KIUtRxZdHJeiFiGI8gjbOumoc
这是我用于基准测试的代码;我确信有更好的方法来实现和使用变体来使用它们而不是虚拟关键字(继承与 std::variant):
删除旧代码;看看更新
谁能解释一下实现这个用例的最佳方法是什么std::variant
让我进行测试和基准测试:
我目前正在实现RFC 3986,它是“URI”,对于我的用例,这个类将更多地用作 const,可能不会有太多改变,用户更有可能使用这个类来查找每个特定的URI 的一部分,而不是制作一个 URI;std::string_view
因此使用而不是单独分隔 URI 的每个段是有意义的std::string
。问题是我需要为它实现两个类;当我只需要一个 const 版本时;另一个用于当用户想要创建 URI 而不是提供一个并通过它进行搜索时。
所以我用 atemplate
来修复它自己的问题;但后来我意识到我可以使用std::variant<std::string, std::string_view>
(或者也许std::variant<CustomStructHoldingAllThePieces, std::string_view>
);所以我开始研究看看它是否真的有助于使用变体。从这些结果来看,这似乎是使用继承,virtual
如果我不想实现两个不同的const_uri
和uri
类,这是我最好的选择。
你觉得我应该怎么做?
更新 (2)
感谢@gan_ 在我的基准代码中提到并修复了提升问题。 http://quick-bench.com/Mcclomh03nu8nDCgT3T302xKnXY
我对 try-catch hell 的结果感到惊讶,但感谢这个现在有意义的评论。
更新 (3)
我删除了该try-catch
方法,因为它非常糟糕;并且还随机更改了选定的值,从外观上看,我看到了更现实的基准。virtual
毕竟
这似乎不是正确的答案。http://quick-bench.com/o92Yrt0tmqTdcvufmIpu_fIfHt0
http://quick-bench.com/FFbe3bsIpdFsmgKfm94xGNFKVKs(没有内存泄漏哈哈)
更新 (4)
我消除了生成随机数的开销(我在上次更新中已经这样做了,但似乎我为基准获取了错误的 URL)并添加了一个 EmptyRandom 以了解生成随机数的基线。并且还在 Virtual 中做了一些小的改动,但我认为它不会影响任何东西。 http://quick-bench.com/EmhM-S-xoA0LABYK6yrMyBb8UeI
http://quick-bench.com/5hBZprSRIRGuDaBZ_wj0cOwnNhw(删除了虚拟,以便您可以更好地比较其余部分)
更新 (5)
正如 Jorge Bellon在评论中所说,我没有考虑分配成本;所以我将每个基准转换为使用指针。这种间接性当然会对性能产生影响,但现在更公平了。所以现在循环中没有分配。
这是代码:
删除旧代码;看看更新
到目前为止,我运行了一些基准测试。似乎 g++ 在优化代码方面做得更好:
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
EmptyRandom 0.756 ns 0.748 ns 746067433
TradeSpaceForPerformance 2.87 ns 2.86 ns 243756914
Virtual 12.5 ns 12.4 ns 60757698
Index 7.85 ns 7.81 ns 99243512
GetIf 8.20 ns 8.18 ns 92393200
HoldsAlternative 7.08 ns 7.07 ns 96959764
ConstexprVisitor 11.3 ns 11.2 ns 60152725
StructVisitor 10.7 ns 10.6 ns 60254088
Overload 10.3 ns 10.3 ns 58591608
对于铿锵声:
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
EmptyRandom 1.99 ns 1.99 ns 310094223
TradeSpaceForPerformance 8.82 ns 8.79 ns 87695977
Virtual 12.9 ns 12.8 ns 51913962
Index 13.9 ns 13.8 ns 52987698
GetIf 15.1 ns 15.0 ns 48578587
HoldsAlternative 13.1 ns 13.1 ns 51711783
ConstexprVisitor 13.8 ns 13.8 ns 49120024
StructVisitor 14.5 ns 14.5 ns 52679532
Overload 17.1 ns 17.1 ns 42553366
现在,对于 clang,最好使用虚拟继承,但对于 g++,最好使用holds_alternative
,或者get_if
但总的来说,std::visit
到目前为止,对于我的几乎所有基准测试来说,这似乎都不是一个好的选择。
我认为如果将模式匹配(能够检查除整数文字之外的更多内容的 switch 语句)添加到 c++ 中将是一个好主意,我们将编写更清洁和更可维护的代码。
我想知道package.index()
结果。不应该更快吗?它有什么作用?
Clang 版本:http: //quick-bench.com/cl0HFmUes2GCSE1w04qt4Rqj6aI
使用One one
而不是auto one = new One
基于Maxim Egorushkin 的评论的版本:http : //quick-bench.com/KAeT00__i2zbmpmUHDutAfiD6-Q(没有太大改变结果)
更新 (6)
我做了一些更改,结果现在从编译器到编译器有很大不同。但这似乎是最好的解决方案std::get_if
。由于未知原因,现在似乎在 clang 中效果最好。这真的让我感到惊讶,因为我记得在 gcc 中做得更好。而且完全没有竞争力;在最后一个基准测试中,它甚至比 vtable 查找还要糟糕。std::holds_alternatives
virtual
virtual
std::visit
这是基准(使用 GCC/Clang 以及 libstdc++ 和 libc++ 运行它):
http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y
#include <benchmark/benchmark.h>
#include <array>
#include <variant>
#include <random>
#include <functional>
#include <algorithm>
using namespace std;
struct One {
auto get () const { return 1; }
};
struct Two {
auto get() const { return 2; }
};
struct Three {
auto get() const { return 3; }
};
struct Four {
auto get() const { return 4; }
};
template<class... Ts> struct overload : Ts... { using Ts::operator()...; };
template<class... Ts> overload(Ts...) -> overload<Ts...>;
std::random_device dev;
std::mt19937 rng(dev());
std::uniform_int_distribution<std::mt19937::result_type> random_pick(0,3); // distribution in range [1, 6]
template <std::size_t N>
std::array<int, N> get_random_array() {
std::array<int, N> item;
for (int i = 0 ; i < N; i++)
item[i] = random_pick(rng);
return item;
}
template <typename T, std::size_t N>
std::array<T, N> get_random_objects(std::function<T(decltype(random_pick(rng)))> func) {
std::array<T, N> a;
std::generate(a.begin(), a.end(), [&] {
return func(random_pick(rng));
});
return a;
}
static void TradeSpaceForPerformance(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
int index = 0;
auto ran_arr = get_random_array<50>();
int r = 0;
auto pick_randomly = [&] () {
index = ran_arr[r++ % ran_arr.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
switch (index) {
case 0:
res = one.get();
break;
case 1:
res = two.get();
break;
case 2:
res = three.get();
break;
case 3:
res = four.get();
break;
}
benchmark::DoNotOptimize(index);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
// Register the function as a benchmark
BENCHMARK(TradeSpaceForPerformance);
static void Virtual(benchmark::State& state) {
struct Base {
virtual int get() const noexcept = 0;
virtual ~Base() {}
};
struct A final: public Base {
int get() const noexcept override { return 1; }
};
struct B final : public Base {
int get() const noexcept override { return 2; }
};
struct C final : public Base {
int get() const noexcept override { return 3; }
};
struct D final : public Base {
int get() const noexcept override { return 4; }
};
Base* package = nullptr;
int r = 0;
auto packages = get_random_objects<Base*, 50>([&] (auto r) -> Base* {
switch(r) {
case 0: return new A;
case 1: return new B;
case 3: return new C;
case 4: return new D;
default: return new C;
}
});
auto pick_randomly = [&] () {
package = packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res = package->get();
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
for (auto &i : packages)
delete i;
}
BENCHMARK(Virtual);
static void FunctionPointerList(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::function<int()>;
std::size_t index;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return std::bind(&One::get, one);
case 1: return std::bind(&Two::get, two);
case 2: return std::bind(&Three::get, three);
case 3: return std::bind(&Four::get, four);
default: return std::bind(&Three::get, three);
}
});
int r = 0;
auto pick_randomly = [&] () {
index = r++ % packages.size();
};
pick_randomly();
for (auto _ : state) {
int res = packages[index]();
benchmark::DoNotOptimize(index);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(FunctionPointerList);
static void Index(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
switch (package->index()) {
case 0:
res = std::get<One>(*package).get();
break;
case 1:
res = std::get<Two>(*package).get();
break;
case 2:
res = std::get<Three>(*package).get();
break;
case 3:
res = std::get<Four>(*package).get();
break;
}
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(Index);
static void GetIf(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
if (auto item = std::get_if<One>(package)) {
res = item->get();
} else if (auto item = std::get_if<Two>(package)) {
res = item->get();
} else if (auto item = std::get_if<Three>(package)) {
res = item->get();
} else if (auto item = std::get_if<Four>(package)) {
res = item->get();
}
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(GetIf);
static void HoldsAlternative(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
if (std::holds_alternative<One>(*package)) {
res = std::get<One>(*package).get();
} else if (std::holds_alternative<Two>(*package)) {
res = std::get<Two>(*package).get();
} else if (std::holds_alternative<Three>(*package)) {
res = std::get<Three>(*package).get();
} else if (std::holds_alternative<Four>(*package)) {
res = std::get<Four>(*package).get();
}
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(HoldsAlternative);
static void ConstexprVisitor(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
auto func = [] (auto const& ref) {
using type = std::decay_t<decltype(ref)>;
if constexpr (std::is_same<type, One>::value) {
return ref.get();
} else if constexpr (std::is_same<type, Two>::value) {
return ref.get();
} else if constexpr (std::is_same<type, Three>::value) {
return ref.get();
} else if constexpr (std::is_same<type, Four>::value) {
return ref.get();
} else {
return 0;
}
};
for (auto _ : state) {
auto res = std::visit(func, *package);
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(ConstexprVisitor);
static void StructVisitor(benchmark::State& state) {
struct VisitPackage
{
auto operator()(One const& r) { return r.get(); }
auto operator()(Two const& r) { return r.get(); }
auto operator()(Three const& r) { return r.get(); }
auto operator()(Four const& r) { return r.get(); }
};
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
auto vs = VisitPackage();
for (auto _ : state) {
auto res = std::visit(vs, *package);
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(StructVisitor);
static void Overload(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
auto ov = overload {
[] (One const& r) { return r.get(); },
[] (Two const& r) { return r.get(); },
[] (Three const& r) { return r.get(); },
[] (Four const& r) { return r.get(); }
};
for (auto _ : state) {
auto res = std::visit(ov, *package);
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(Overload);
// BENCHMARK_MAIN();
GCC 编译器的结果:
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
TradeSpaceForPerformance 3.71 ns 3.61 ns 170515835
Virtual 12.20 ns 12.10 ns 55911685
FunctionPointerList 13.00 ns 12.90 ns 50763964
Index 7.40 ns 7.38 ns 136228156
GetIf 4.04 ns 4.02 ns 205214632
HoldsAlternative 3.74 ns 3.73 ns 200278724
ConstexprVisitor 12.50 ns 12.40 ns 56373704
StructVisitor 12.00 ns 12.00 ns 60866510
Overload 13.20 ns 13.20 ns 56128558
clang 编译器的结果(我对此感到惊讶):
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
TradeSpaceForPerformance 8.07 ns 7.99 ns 77530258
Virtual 7.80 ns 7.77 ns 77301370
FunctionPointerList 12.1 ns 12.1 ns 56363372
Index 11.1 ns 11.1 ns 69582297
GetIf 10.4 ns 10.4 ns 80923874
HoldsAlternative 9.98 ns 9.96 ns 71313572
ConstexprVisitor 11.4 ns 11.3 ns 63267967
StructVisitor 10.8 ns 10.7 ns 65477522
Overload 11.4 ns 11.4 ns 64880956
迄今为止的最佳基准(将更新):http: //quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y(也可以查看 GCC)