c++ - 高效的索引绑定检查和双重到整数转换

Question

考虑以下代码片段

double *x, *id;
int i, n; // = vector size

// allocate and zero x
// set id to 0:n-1

for(i=0; i<n; i++) {  
  long iid = (long)id[i];
  if(iid>=0 && iid<n && (double)iid==id[i]){
    x[iid] = 1;
  } else break;
}

该代码使用向量id类型中的值作为向量的double索引x。为了使索引有效，我验证它们大于或等于 0，小于向量大小 n，并且存储的双精度id数实际上是整数。在此示例中，存储了从 1 到 n 的整数，因此所有向量都是线性访问的，并且语句id的分支预测应该始终有效。if

因为n=1e8代码在我的电脑上需要 0.21 秒。由于在我看来它是一个计算量轻的循环，我希望它是内存带宽有限的。根据基准内存带宽，我预计它可以在 0.15 秒内运行。我将内存占用计算为每个值 8 个字节id，每个值 16 个字节x（它需要同时写入和从内存中读取，因为我假设不使用 SSE 流）。所以每个向量条目总共有 24 个字节。

问题：

我错说这段代码应该是内存带宽限制的，并且可以改进吗？
如果没有，您知道我可以提高性能以使其与内存速度一起工作的方法吗？
或者也许一切都很好，除了并行运行它之外，我无法轻易改进它？

改变的类型id不是一种选择——它必须是double。此外，在一般情况下id，x具有不同的大小并且必须作为单独的数组保存 - 它们来自程序的不同部分。简而言之，我想知道是否可以以更有效的方式编写绑定检查和类型转换/整数验证。

为方便起见，整个代码：

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

static struct timeval tb, te;

void tic()
{
  gettimeofday(&tb, NULL);
}

void toc(const char *idtxt)
{
  long s,u;
  gettimeofday(&te, NULL);
  s=te.tv_sec-tb.tv_sec;
  u=te.tv_usec-tb.tv_usec;
  printf("%-30s%10li.%.6li\n", idtxt, 
     (s*1000000+u)/1000000, (s*1000000+u)%1000000);
}

int main(int argc, char *argv[])
{
  double *x  = NULL;
  double *id = NULL;
  int i, n;

  // vector size is a command line parameter
  n = atoi(argv[1]);
  printf("x size %i\n", n);

  // not included in timing in MATLAB
  x = calloc(sizeof(double),n);
  memset(x, 0, sizeof(double)*n);

  // create index vector
  tic();
  id  = malloc(sizeof(double)*n);
  for(i=0; i<n; i++) id[i] = i;
  toc("id = 1:n");

  // use id to index x and set all entries to 4
  tic();
  for(i=0; i<n; i++) {  
    long iid = (long)id[i];
    if(iid>=0 && iid<n && (double)iid==id[i]){
      x[iid] = 1;
    } else break;
  }
  toc("x(id) = 1");
}

score 1 · Accepted Answer

编辑：忽略如果你不能拆分数组！

我认为可以通过利用通用缓存概念来改进它。您可以在时间或位置上关闭数据访问。使用紧密的 for 循环，您可以通过塑造您的数据结构（如 for 循环）来获得更好的数据命中率。在这种情况下，您访问两个不同的数组，通常每个数组中的索引相同。您的机器通过该循环每次迭代都加载两个数组的块。为了增加每个负载的使用，创建一个结构来保存每个数组的一个元素，并使用该结构创建一个数组：

struct my_arrays
{
    double x;
    int id;
};

struct my_arrays* arr = malloc(sizeof(my_arrays)*n);

现在，每次将数据加载到缓存中时，都会命中加载的所有内容，因为数组靠得很近。

编辑：由于您的意图是检查整数值，并且您明确假设这些值足够小，可以精确地以双精度表示而不会损失精度，那么我认为您的比较很好。

我之前的回答提到要小心在隐式转换后比较大双精度数，我引用了这个：浮点和双精度比较最有效的方法是什么？

score 0 · Accepted Answer

可能值得考虑检查double类型表示。

例如，以下代码显示了如何将double大于 1 的数字与 999 进行比较：

bool check(double x)
{
    union
    {
        double d;
        uint32_t y[2];
    };
    d = x;
    bool answer;
    uint32_t exp = (y[1] >> 20) & 0x3ff;
    uint32_t fraction1 = y[1] << (13 + exp); // upper bits of fractiona part
    uint32_t fraction2 = y[0]; // lower 32 bits of fractional part
    if (fraction2 != 0 || fraction1 != 0)
        answer = false;
    else if (exp > 8)
        answer = false;
    else if (exp == 8)
        answer = (y[1] < 0x408f3800); // this is the representation of 999
    else
        answer = true;
    return answer;
}

这看起来像很多代码，但它可能很容易矢量化（使用例如 SSE），如果你的界限是 2 的幂，它可能会进一步简化代码。

c++ - 高效的索引绑定检查和双重到整数转换

2 回答 2

Related

Reference