c++ - Optimally packing a recursively templated struct without loss of alignment

Question

I have a struct of 4 fields of types that come from template parameters:

template <typename T1, typename T2, typename T3, typename T4>
struct __attribute__((aligned(8))) four_tuple {
  typedef struct {
    T1 t1;
    T2 t2;
    T3 t3;
    T4 t4;
  } payload;
  payload p;
};

Each type T1, T2, T3, and T4, is guaranteed to be a primitive type or a four_tuple<...>::payload type. The guarantees are recursive - you can think of the struct as encoding a quadtree whose leaf nodes are primitive types.

My goal is for the struct to have minimum possible sizeof, subject to the condition that all of the leaf nodes are properly aligned. The tools allowed for the optimization are class template specializations using:

reordering of fields t1, t2, t3, t4
addition of filler fields
gcc attribute packed on payload
maybe others?

I feel like there is a clever solution to this problem using enable_if and SFINAE. Can anyone find it?

To illustrate the problem: if we use the above implementation as-is with using Foo = four_tuple<char,double,char,double>, the payload and the overall struct both have a size of 32. If we simply declare the payload packed, the doubles will not be well-aligned. A template specialization that reorders the fields in decreasing order of size (here, double, double, char, char) gives a payload and overall size of 24. But the 6 bytes of tail padding it carries are wasteful, as can be seen by considering using Bar = four_tuple<Foo::payload,int,int,int>. With optimal packing Bar could fit in 32 bytes, but with this scheme it would require 40. Bluntly applying field reordering together with packed will result in misaligned ints in Bar, so some filler is needed.
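The sizes quoted above can be checked with a few plain structs. This is only a minimal sketch: the struct names are made up for illustration, and the concrete numbers assume a typical x86-64 GCC or Clang target where alignof(double) is 8 and alignof(int) is 4.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the payloads discussed above.
struct NaturalFoo   { char a; double b; char c; double d; };  // declaration order
struct ReorderedFoo { double b; double d; char a; char c; };  // largest fields first

// Nesting the reordered payload drags its tail padding along.
struct NestedBar { ReorderedFoo foo; int x; int y; int z; };

// Bytes actually occupied by leaf fields, ignoring all padding.
constexpr std::size_t foo_leaf_bytes = 2 * sizeof(double) + 2 * sizeof(char);
constexpr std::size_t bar_leaf_bytes = foo_leaf_bytes + 3 * sizeof(int);

// Reordering does not make Foo larger, but neither layout is padding-free.
static_assert(sizeof(ReorderedFoo) <= sizeof(NaturalFoo), "reordering should not grow Foo");
static_assert(sizeof(ReorderedFoo) >= foo_leaf_bytes, "a layout cannot be smaller than its leaves");
static_assert(sizeof(NestedBar) >= bar_leaf_bytes, "a layout cannot be smaller than its leaves");
```

On the assumed target, sizeof gives 32, 24 and 40 for the three structs, against 18 and 30 bytes of actual leaf data. The gap between 40 and the 32 bytes that alignment alone requires is what the question is trying to recover.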

I know that in general restructuring the memory layout of a struct's fields can have performance implications due to cache considerations, and that in general those implications will be at least as significant as any potential gains from better packing. I'd like to explore the tradeoffs, though, and I can't really do that properly in my context without solving this problem.

score 1 · Accepted Answer

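The big problem in your nested tuple case is that you want a field of type four_tuple<char,double,char,double>::payload to be aligned as if it were a four_tuple<char,double,char,double>, but without requiring the containing type to inherit that alignment. This is complicated. It is possible, but it makes your code highly non-portable to anything other than GCC. I guess that is okay, since you are already suggesting GCC extensions in your question. The basic idea is that bit-fields can be used to insert padding to ensure alignment: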

struct __attribute__((packed)) S {
  char c; // at offset 0
  int i; // at offset 1, not aligned
  int : 0;
  int j; // at offset 8, aligned
  int : 0;
  int k; // at offset 12, no extra padding between j and k
};
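The offsets in those comments can be checked at compile time with offsetof. The following is a small sketch that assumes GCC on a target with a 4-byte, 4-byte-aligned int; the struct name is hypothetical, and other compilers may lay out zero-width bit-fields in packed structs differently.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Same field sequence as the struct above; packed is a GCC extension.
struct __attribute__((packed)) PackedWithBarriers {
  char c;
  int i;    // packed: placed directly after c
  int : 0;  // zero-width bit-field: pads to the next int boundary
  int j;
  int : 0;  // already on an int boundary, so nothing is added
  int k;
};

constexpr std::size_t off_i = offsetof(PackedWithBarriers, i);
constexpr std::size_t off_j = offsetof(PackedWithBarriers, j);
constexpr std::size_t off_k = offsetof(PackedWithBarriers, k);

static_assert(off_i == 1, "i follows c with no padding");
static_assert(off_j % alignof(int) == 0, "j is int-aligned again");
static_assert(off_k == off_j + sizeof(int), "no padding between j and k");
```

Only j and k are realigned. The member i stays misaligned because nothing precedes it that would force a boundary.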

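int is of course a very specific type with a very specific alignment, whereas you need an alignment that is determined from the template arguments. Luckily, GCC allows bit-fields of type char (which would normally only force byte alignment) to be combined with alignas, which ensures arbitrary alignment.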

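With that done, you can check all 24 possible field orderings and pick the payload with the smallest total size. I made the payload a global type and gave it an additional template parameter to indicate the field order. This allows tuple4<T1, T2, T3, T4> to check tuple4_payload<T1, T2, T3, T4, 1234>, tuple4_payload<T1, T2, T3, T4, 1243>, and so on in order, and pick the best one.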

#include <type_traits>  // std::conditional_t

template <typename...> struct smallest;
template <typename...T> using smallest_t = typename smallest<T...>::type;

template <typename T> struct smallest<T> { using type = T; };
template <typename T, typename...Ts> struct smallest<T, Ts...> { using type = std::conditional_t<sizeof(T) <= sizeof(smallest_t<Ts...>), T, smallest_t<Ts...>>; };

template <typename T1, typename T2, typename T3, typename T4> struct tuple4;
template <typename T1, typename T2, typename T3, typename T4, int fieldOrder> struct tuple4_payload;
template <typename T1, typename T2, typename T3, typename T4> struct tuple4_simple { T1 t1; T2 t2; T3 t3; T4 t4; };

template <typename T> struct extract_payload { using type = T; };
template <typename...T> struct extract_payload<tuple4<T...>> { using type = typename tuple4<T...>::payload; };
template <typename T> using extract_payload_t = typename extract_payload<T>::type;

#define PERMS \
  PERM(1,2,3,4) PERM(1,2,4,3) PERM(1,3,2,4) PERM(1,3,4,2) PERM(1,4,2,3) PERM(1,4,3,2) \
  PERM(2,1,3,4) PERM(2,1,4,3) PERM(2,3,1,4) PERM(2,3,4,1) PERM(2,4,1,3) PERM(2,4,3,1) \
  PERM(3,1,2,4) PERM(3,1,4,2) PERM(3,2,1,4) PERM(3,2,4,1) PERM(3,4,1,2) PERM(3,4,2,1) \
  PERM(4,1,2,3) PERM(4,1,3,2) PERM(4,2,1,3) PERM(4,2,3,1) PERM(4,3,1,2) PERM(4,3,2,1)

#define PERM(a,b,c,d) \
  template <typename T1, typename T2, typename T3, typename T4> \
  struct __attribute__((packed)) tuple4_payload<T1, T2, T3, T4, a##b##c##d> { \
    char : 0 alignas(T##a); extract_payload_t<T##a> t##a; \
    char : 0 alignas(T##b); extract_payload_t<T##b> t##b; \
    char : 0 alignas(T##c); extract_payload_t<T##c> t##c; \
    char : 0 alignas(T##d); extract_payload_t<T##d> t##d; \
  };
PERMS
#undef PERM

#define PERM(a,b,c,d) , tuple4_payload<T1, T2, T3, T4, a##b##c##d>
template <typename, typename...T> using tuple4_smallest_payload_t = smallest_t<T...>;
template <typename T1, typename T2, typename T3, typename T4>
struct alignas(tuple4_simple<T1, T2, T3, T4>) tuple4 : tuple4_smallest_payload_t<void PERMS> {
  using payload = tuple4_smallest_payload_t<void PERMS>;
};
#undef PERM
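The selection step relies on smallest_t, whose behavior on ties decides which ordering wins when several are equally small. Here is a standalone sketch of that helper (definition copied from the code above, candidate structs are hypothetical placeholders for differently ordered payloads):

```cpp
#include <type_traits>

// Copied from the answer: pick the type with the smallest sizeof.
template <typename...> struct smallest;
template <typename... T> using smallest_t = typename smallest<T...>::type;

template <typename T> struct smallest<T> { using type = T; };
template <typename T, typename... Ts>
struct smallest<T, Ts...> {
  using type = std::conditional_t<sizeof(T) <= sizeof(smallest_t<Ts...>),
                                  T, smallest_t<Ts...>>;
};

// Hypothetical candidates: one large, two equally small.
struct CandidateA { char bytes[24]; };
struct CandidateB { char bytes[16]; };
struct CandidateC { char bytes[16]; };

// The comparison uses <=, so the earliest of the smallest candidates is chosen.
static_assert(std::is_same<smallest_t<CandidateA, CandidateB, CandidateC>,
                           CandidateB>::value,
              "first of the equally small candidates wins");
```

Because PERMS lists 1234 first, this means the declaration order is kept whenever no reordering is strictly smaller.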

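In your case, you would use this as tuple4<int, tuple4<char, double, char, double>, int, int>. Note that even though the payload type is not explicitly mentioned here, it will still be used for the t2 member.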
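To see what the search should find for this example without depending on any compiler extension, the same brute force can be written at the value level. This is a rough sketch, not the answer's method: Field, align_up, layout_size and best_layout_size are hypothetical names, fields are described only by size and alignment, and it needs C++14 or later for the constexpr loops.

```cpp
#include <cstddef>

// Hypothetical description of one field: its size and required alignment.
struct Field { std::size_t size; std::size_t align; };

// Round offset up to the next multiple of align (align must be non-zero).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
  return (offset + align - 1) / align * align;
}

// Size of a layout placing the four fields in the order a, b, c, d, with only
// the filler needed to align each field. No tail padding is added.
constexpr std::size_t layout_size(const Field (&f)[4], int a, int b, int c, int d) {
  std::size_t end = 0;
  end = align_up(end, f[a].align) + f[a].size;
  end = align_up(end, f[b].align) + f[b].size;
  end = align_up(end, f[c].align) + f[c].size;
  end = align_up(end, f[d].align) + f[d].size;
  return end;
}

// Try all 24 orderings and return the smallest layout size.
constexpr std::size_t best_layout_size(const Field (&f)[4]) {
  std::size_t best = static_cast<std::size_t>(-1);
  for (int a = 0; a < 4; ++a)
    for (int b = 0; b < 4; ++b)
      for (int c = 0; c < 4; ++c)
        for (int d = 0; d < 4; ++d) {
          // Skip index tuples that are not permutations.
          if (a == b || a == c || a == d || b == c || b == d || c == d) continue;
          const std::size_t s = layout_size(f, a, b, c, d);
          if (s < best) best = s;
        }
  return best;
}

// char, double, char, double, assuming sizes 1/8 and alignments 1/8.
constexpr Field foo_fields[4] = {{1, 1}, {8, 8}, {1, 1}, {8, 8}};
static_assert(best_layout_size(foo_fields) == 18, "two doubles, then two chars");

// int, an 18-byte payload aligned to 8, int, int, assuming a 4-byte int.
constexpr Field bar_fields[4] = {{4, 4}, {18, 8}, {4, 4}, {4, 4}};
static_assert(best_layout_size(bar_fields) == 32, "30 leaf bytes plus 2 bytes of filler");
```

The 32-byte result matches the figure given in the question for the optimally packed Bar.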
