
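Imagine I want to test serialization/deserialization routines against three different datasets. That gives 2x3=6 benchmarks.

Ideally, I would like to achieve the following:

  • Avoid code duplication
  • Call each dataset generator function only once per executable invocation, and only if its benchmarks are not excluded by --benchmark_filter=... (the generator functions are expensive)
  • Meaningful benchmark names (e.g. "Serialize/DatasetAlpha")

None of the features mentioned in the guide seems to fit the purpose exactly. The closest solution I have found so far is to use vararg-parameterized Serialize()/Deserialize() functions together with generator functions that return the generated data as singletons.

Is there a better way?

This is what I want to avoid: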

#include <benchmark/benchmark.h>

#include <string>

/* library */
std::string serialize(const std::string& data) {
  return data;
}
std::string deserialize(const std::string& data) {
  return data;
}

/* helpers */
void SerializeHelper(benchmark::State& state, const std::string& data) {
  for (auto _ : state) {
    std::string bytes = serialize(data);
    benchmark::DoNotOptimize(bytes);
  }
}

void DeserializeHelper(benchmark::State& state, const std::string& data) {
  // Serialize once outside the timed loop so only deserialization is measured.
  std::string bytes = serialize(data);
  for (auto _ : state) {
    std::string data_out = deserialize(bytes);
    benchmark::DoNotOptimize(data_out);
  }
}

std::string GenerateDatasetAlpha() {
  return "";
}
std::string GenerateDatasetBeta() {
  return "";
}
std::string GenerateDatasetGamma() {
  return "";
}


/* oh, my... */
void SerializeAlpha(benchmark::State& state) {
  SerializeHelper(state, GenerateDatasetAlpha());
}
void DeserializeAlpha(benchmark::State& state) {
  DeserializeHelper(state, GenerateDatasetAlpha());
}
void SerializeBeta(benchmark::State& state) {
  SerializeHelper(state, GenerateDatasetBeta());
}
void DeserializeBeta(benchmark::State& state) {
  DeserializeHelper(state, GenerateDatasetBeta());
}
void SerializeGamma(benchmark::State& state) {
  SerializeHelper(state, GenerateDatasetGamma());
}
void DeserializeGamma(benchmark::State& state) {
  DeserializeHelper(state, GenerateDatasetGamma());
}

BENCHMARK(SerializeAlpha);
BENCHMARK(DeserializeAlpha);
BENCHMARK(SerializeBeta);
BENCHMARK(DeserializeBeta);
BENCHMARK(SerializeGamma);
BENCHMARK(DeserializeGamma);

BENCHMARK_MAIN();

//g++ wtf.cc -o wtf -I benchmark/include/ -lbenchmark -L benchmark/build/src -lpthread -O3

1 Answer


The closest solution I have found so far is to use templated benchmarks with one generator class per dataset:

#include <benchmark/benchmark.h>

#include <string>

/* library */
std::string serialize(const std::string& data) {
  return data;
}
std::string deserialize(const std::string& data) {
  return data;
}

/* benchmark routines */
template<typename Dataset>
void SerializeBenchmark(benchmark::State& state) {
  std::string data = Dataset()();
  for (auto _ : state) {
    std::string bytes = serialize(data);
    benchmark::DoNotOptimize(bytes);
  }
}

template<typename Dataset>
void DeserializeBenchmark(benchmark::State& state) {
  std::string data = Dataset()();
  // Serialize once outside the timed loop so only deserialization is measured.
  std::string bytes = serialize(data);
  for (auto _ : state) {
    std::string data_out = deserialize(bytes);
    benchmark::DoNotOptimize(data_out);
  }
}

/* dataset generators and benchmark registration */

struct Dataset1 {
  std::string operator()() {
    return ""; // load from file, generate random data, etc
  }
};
BENCHMARK_TEMPLATE(SerializeBenchmark, Dataset1);
BENCHMARK_TEMPLATE(DeserializeBenchmark, Dataset1);

struct Dataset2 {
  std::string operator()() { return ""; }
};
BENCHMARK_TEMPLATE(SerializeBenchmark, Dataset2);
BENCHMARK_TEMPLATE(DeserializeBenchmark, Dataset2);

struct Dataset3 {
  std::string operator()() { return ""; }
};
BENCHMARK_TEMPLATE(SerializeBenchmark, Dataset3);
BENCHMARK_TEMPLATE(DeserializeBenchmark, Dataset3);

BENCHMARK_MAIN();

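This keeps code bloat fairly low. The benchmark names are decent too, e.g. SerializeBenchmark<Dataset2>. The dataset generators are still called multiple times, though (once each time a benchmark function runs), so if you want to avoid that, you have to store the data in a lazily loaded singleton.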
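One way to get such a lazily loaded singleton is a function-local static inside a small template. The following is only a minimal sketch, not part of the code above: CachedDataset, ExpensiveDataset and g_generator_runs are hypothetical names introduced for illustration, and the counter exists purely to make the number of generator runs visible.

```cpp
#include <string>

// Hypothetical counter, present only so the number of generator runs
// can be observed. Not needed in real benchmark code.
int g_generator_runs = 0;

// Hypothetical dataset generator with the same shape as Dataset1..Dataset3.
struct ExpensiveDataset {
  std::string operator()() const {
    ++g_generator_runs;
    return std::string(1024, 'x');  // stand-in for loading or generating data
  }
};

// Returns a reference to a value cached per Dataset type.
// The function-local static is initialized on the first call only
// (and that initialization is thread-safe since C++11), so the
// generator runs at most once per process for each Dataset.
template <typename Dataset>
const std::string& CachedDataset() {
  static const std::string data = Dataset()();
  return data;
}
```

With a helper like this, the line `std::string data = Dataset()();` in the benchmark templates could become `const std::string& data = CachedDataset<Dataset>();`. Because the static is only initialized on first use, a dataset whose benchmarks are all excluded by --benchmark_filter=... would never be generated.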

Answered 2019-08-12T08:10:13.717