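I have some proprietary image processing code. It walks over an image and computes some statistics about it. An example of the kind of code I'm talking about can be seen below, though this is not the algorithm that needs optimizing.
My question is: what tools can I use to profile tight loops like this and determine where the time is going? Sleepy and Windows Performance Analyzer are both much more focused on identifying which methods/functions are slow. I already know which function is slow; I just need to figure out how to optimize it.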
void BGR2YUV(IplImage* bgrImg, IplImage* yuvImg)
{
    const int height = bgrImg->height;
    const int width = bgrImg->width;
    const int step = bgrImg->widthStep;
    const int channels = bgrImg->nChannels;
    assert(channels == 3);
    assert(bgrImg->height == yuvImg->height);
    assert(bgrImg->width == yuvImg->width);
    // for reasons that are not clear to me, these are not the same.
    // Code below has been modified to reflect this fact, but if they
    // could be the same, the code below gets sped up a bit.
    // assert(bgrImg->widthStep == yuvImg->widthStep);
    assert(bgrImg->nChannels == yuvImg->nChannels);
    const uchar* bgr = (uchar*) bgrImg->imageData;
    uchar* yuv = (uchar*) yuvImg->imageData;
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            const int ixBGR = i*step+j*channels;
            const int b = (int) bgr[ixBGR+0];
            const int g = (int) bgr[ixBGR+1];
            const int r = (int) bgr[ixBGR+2];
            const int y = (int) (0.299 * r + 0.587 * g + 0.114 * b);
            const double di = 0.596 * r - 0.274 * g - 0.322 * b;
            const double dq = 0.211 * r - 0.523 * g + 0.312 * b;
            // Do some shifting and trimming to get i & q to fit into uchars.
            const int iv = (int) (128 + max(-128.0, min(127.0, di)));
            const int q = (int) (128 + max(-128.0, min(127.0, dq)));
            const int ixYUV = i*yuvImg->widthStep + j*channels;
            yuv[ixYUV+0] = (uchar)y;
            yuv[ixYUV+1] = (uchar)iv;
            yuv[ixYUV+2] = (uchar)q;
        }
    }
}
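To make the question concrete: one way to study a loop like this outside the full application is to pull the per-pixel arithmetic into a standalone function and time it directly. The following is only a minimal sketch under some assumptions. It swaps IplImage for raw interleaved 3-channel buffers so it builds without OpenCV, and the names `convertRows` and `meanMillis` are hypothetical helpers, not part of the code above or of any library. It reports a whole-loop average, not a per-line breakdown.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: same per-pixel arithmetic and indexing as BGR2YUV,
// but on raw interleaved 3-channel buffers (assumption: no IplImage).
void convertRows(const unsigned char* bgr, unsigned char* out,
                 int width, int height, int srcStep, int dstStep)
{
    const int channels = 3;
    assert(srcStep >= channels * width);
    assert(dstStep >= channels * width);
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            const std::size_t ixIn = static_cast<std::size_t>(i) * srcStep + j * channels;
            const std::size_t ixOut = static_cast<std::size_t>(i) * dstStep + j * channels;
            const int b = bgr[ixIn + 0];
            const int g = bgr[ixIn + 1];
            const int r = bgr[ixIn + 2];
            const int y = (int) (0.299 * r + 0.587 * g + 0.114 * b);
            const double di = 0.596 * r - 0.274 * g - 0.322 * b;
            const double dq = 0.211 * r - 0.523 * g + 0.312 * b;
            // Clamp to [-128, 127], then shift into the uchar range.
            const int iv = (int) (128 + std::max(-128.0, std::min(127.0, di)));
            const int q = (int) (128 + std::max(-128.0, std::min(127.0, dq)));
            out[ixOut + 0] = (unsigned char) y;
            out[ixOut + 1] = (unsigned char) iv;
            out[ixOut + 2] = (unsigned char) q;
        }
    }
}

// Hypothetical timing helper: runs the conversion `reps` times on a
// synthetic image and returns the mean wall-clock milliseconds per call.
double meanMillis(int width, int height, int reps)
{
    assert(width > 0 && height > 0 && reps > 0);
    const int step = 3 * width;  // assumption: rows are tightly packed
    std::vector<unsigned char> src(static_cast<std::size_t>(step) * height, 200);
    std::vector<unsigned char> dst(src.size());
    const auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < reps; k++)
        convertRows(src.data(), dst.data(), width, height, step, step);
    const auto t1 = std::chrono::steady_clock::now();
    // Read one result through a volatile so the output counts as used.
    volatile unsigned char sink = dst[0];
    (void) sink;
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / reps;
}
```

Calling something like `meanMillis(1920, 1080, 50)` before and after a change would show whether the change helped, but it still does not say which statement inside the loop is the expensive one, which is what I am really after.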