performance - Benchmark problems for testing concurrency

Question

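For one of the projects I'm working on right now, I need to look at the performance (among other things) of different concurrency-enabled programming languages.

At the moment I'm comparing stackless python and C++ PThreads, so the focus is on these two languages, but other languages will probably be tested later. Of course the comparison must be as representative and accurate as possible, so my first thought was to look for some standard concurrent/multi-threaded benchmark problems. Alas, I couldn't find any decent or standard tests/problems/benchmarks.

So my question is as follows: do you have a suggestion for a good, easy or quick problem to test the performance of a programming language (and to expose its strong and weak points in the process)?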

score 3 · Accepted Answer

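Surely you should be testing hardware and compilers rather than a language for concurrency performance?

I would look at a language from the point of view of how easy and productive it is in terms of concurrency, and how much it "insulates" the programmer from making locking mistakes.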
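
To make the "insulating the programmer from locking mistakes" point concrete, here is a minimal C++11 sketch. It is not taken from any of the benchmarks discussed here; the names SafeCounter and count_concurrently are made up for illustration. The idea is that std::lock_guard releases the mutex automatically when it goes out of scope, so a forgotten unlock is not possible.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example type: a counter whose lock cannot be "forgotten".
class SafeCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> guard(mutex_); // unlocks when guard leaves scope
        ++value_;
    }
    int value() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;
    int value_ = 0;
};

// Starts `threads` threads that each increment a shared counter `perThread` times
// and returns the final count.
int count_concurrently(int threads, int perThread) {
    SafeCounter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i) counter.increment();
        });
    }
    for (std::thread& th : pool) th.join();
    return counter.value();
}
```

With raw pthread_mutex_lock / pthread_mutex_unlock the same counter works too, but every early return or exception path has to remember the unlock by hand, which is the kind of difference this criterion is about.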

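EDIT: from past experience as a researcher designing parallel algorithms, I think you will find that in most cases the concurrent performance depends largely on how an algorithm is parallelised and how well it targets the underlying hardware.

Also, benchmarks are notoriously unequal, and this is even more so in a parallel environment. For instance, a benchmark that "crunches" very large matrices would suit a vector pipeline processor, whereas a parallel sort might be better suited to more general-purpose multi-core CPUs.

These might be useful:

Parallel Benchmarks

NAS Parallel Benchmarks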

score 1 · Accepted Answer

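Well, there are a few classics, but different tests emphasize different features. Some distributed systems may be more robust, have more efficient message passing, and so on. Higher message overhead can hurt scalability, since the normal way to scale up to more machines is to send a larger number of small messages. Some classic problems you can try are a distributed Sieve of Eratosthenes or a poorly implemented Fibonacci sequence calculator (i.e. to calculate the 8th number in the series, spin off one machine for the 7th and another for the 6th). Pretty much any divide-and-conquer algorithm can be done concurrently. You could also do a concurrent implementation of Conway's Game of Life or of heat transfer.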
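
As a rough illustration of the Game of Life suggestion, one generation can be computed by giving each thread a disjoint band of rows. This is only a sketch under my own assumptions: the names Grid, live_neighbours and life_step are invented, cells outside the grid are treated as dead, and C++11 std::thread is used for brevity.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

typedef std::vector<std::vector<int> > Grid; // 1 = alive, 0 = dead

// Counts the live neighbours of (r, c); cells outside the grid count as dead.
int live_neighbours(const Grid& g, int r, int c) {
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    int n = 0;
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;
            const int rr = r + dr, cc = c + dc;
            if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) n += g[rr][cc];
        }
    }
    return n;
}

// Computes one generation; each thread fills its own band of rows of `next`.
Grid life_step(const Grid& cur, int threadCount) {
    const int rows = static_cast<int>(cur.size());
    if (rows == 0) return cur;
    const int cols = static_cast<int>(cur[0].size());
    Grid next(rows, std::vector<int>(cols, 0));
    threadCount = std::max(1, std::min(threadCount, rows));
    std::vector<std::thread> pool;
    for (int t = 0; t < threadCount; ++t) {
        const int begin = rows * t / threadCount;
        const int end = rows * (t + 1) / threadCount;
        pool.emplace_back([&cur, &next, begin, end, cols] {
            for (int r = begin; r < end; ++r) {
                for (int c = 0; c < cols; ++c) {
                    const int n = live_neighbours(cur, r, c);
                    next[r][c] = (n == 3 || (n == 2 && cur[r][c] == 1)) ? 1 : 0;
                }
            }
        });
    }
    for (std::thread& th : pool) th.join(); // the join acts as the barrier between generations
    return next;
}
```

All threads only read `cur` and write separate rows of `next`, so no locking is needed inside a generation; the synchronisation cost shows up in creating and joining the threads once per step.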

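I'd say the easiest one to implement quickly is the poorly implemented Fibonacci calculator, though it places too much emphasis on creating threads and too little on communication between those threads.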
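
A minimal sketch of such a deliberately naive Fibonacci calculator, using C++11 std::async rather than separate machines. The function name fib_threads and the spawnDepth cut-off are my own assumptions; the cut-off is only there so the number of threads stays bounded.

```cpp
#include <future>

// Deliberately naive: fib(n) starts a new thread for fib(n-1) and computes
// fib(n-2) on the current thread. `spawnDepth` limits how many levels of the
// recursion may create threads; below that, plain recursion is used.
long fib_threads(int n, int spawnDepth) {
    if (n < 2) return n;
    if (spawnDepth <= 0) {
        return fib_threads(n - 1, 0) + fib_threads(n - 2, 0);
    }
    std::future<long> left =
        std::async(std::launch::async, fib_threads, n - 1, spawnDepth - 1);
    const long right = fib_threads(n - 2, spawnDepth - 1);
    return left.get() + right; // wait for the spawned half and combine
}
```

Raising spawnDepth makes thread creation dominate the run time, which is exactly the bias described above: the threads never exchange anything except their final result.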

score 0 · Accepted Answer

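"Surely you should be testing hardware and compilers rather than a language for concurrency performance?"

No, hardware and compilers are irrelevant for my testing purposes. I'm just looking for some good problems that can test how well code written in one language competes against code written in another. I'm really testing the constructs available in each specific language for concurrent programming. One of the criteria is performance (measured in time).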
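
For the "measured in time" part, a small wall-clock helper is usually enough for a project of this size. The following is a sketch, not a full benchmarking harness; time_ms and best_of_ms are hypothetical names.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Runs `work` `repeats` times (at least once) and returns the fastest run,
// which reduces noise from other processes on the machine.
template <typename Work>
double best_of_ms(int repeats, Work work) {
    double best = time_ms(work);
    for (int i = 1; i < repeats; ++i) {
        const double t = time_ms(work);
        if (t < best) best = t;
    }
    return best;
}
```

Usage would look like `double ms = best_of_ms(5, [] { /* run the benchmark problem */ });`. Wall-clock time is the right measure here because CPU time summed over threads would hide any speed-up from running them in parallel.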

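Some of the other test criteria I'm looking at are:

how easy it is to write correct code, because as we all know, concurrent programming is harder than writing single-threaded programs
what technique is used for concurrent programming: event-driven, actor-based, message passing, ...
how much code must be written by the programmer himself and how much is done automatically for him: this can also be tested with the given benchmark problems
what the level of abstraction is and how much overhead is involved when it is translated back to machine code

So actually, I'm not looking at performance as the only and best parameter (which would indeed send me to the hardware and the compilers instead of the language itself). I'm looking from a programmer's point of view, to check which language is best suited to which kind of problem, what its weaknesses and strengths are, and so on.

Bear in mind that this is just a small project, so the tests should be kept small as well. (Rigorous testing of everything is therefore not feasible.)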

score 0 · Accepted Answer

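I have decided to use the Mandelbrot set (the escape time algorithm, to be more precise) to benchmark the different languages.
It suits me quite well, as the original algorithm can easily be implemented and creating the multi-threaded variant from it is not that much work.

Below is the code I currently have. It is still a single-threaded variant, but I'll update it as soon as I'm satisfied with the result.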

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>


int DoThread(const double x, const double y, int maxiter) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++) {
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;
}

void SingleThreaded(int horizPixels, int vertPixels, int maxiter, std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels; x > 0; x--) {
        for(int y = vertPixels; y > 0; y--) {
            //3.0 -> so we always have -1.5 -> 1.5 as the window; (x - (horizPixels / 2)) will go from -horizPixels/2 to +horizPixels/2
            result[x-1][y-1] = DoThread((3.0 / horizPixels) * (x - (horizPixels / 2)),(3.0 / vertPixels) * (y - (vertPixels / 2)),maxiter);
        }
    }
}

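int main(int argc, char* argv[]) {
    //all four arguments are required; calling atoi on a missing one would read past argv
    if (argc < 5) {
        std::cerr << "usage: " << argv[0] << " width height maxiter threads" << std::endl;
        return 1;
    }

    //first arg = length along horizontal axis
    int horizPixels = atoi(argv[1]);

    //second arg = length along vertical axis
    int vertPixels = atoi(argv[2]);

    //third arg = iterations
    int maxiter = atoi(argv[3]);

    //fourth arg = threads (parsed but not used yet in this single-threaded variant)
    int threadCount = atoi(argv[4]);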

    std::vector<std::vector<int> > result(horizPixels, std::vector<int>(vertPixels,0)); //create and init 2-dimensional vector
    SingleThreaded(horizPixels, vertPixels, maxiter, result);

    //TODO: remove these lines
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
            std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
        }
        std::cout << std::endl;
    }
}

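I have tested it with gcc under Linux, but I'm sure it works with other compilers/operating systems as well. To get it working you have to enter some command line arguments like so:

mandelbrot 106 500 255 1

the first argument is the width (x-axis)
the second argument is the height (y-axis)
the third argument is the maximum number of iterations (the number of colors)
the last one is the number of threads (but that one is currently not used)

At my screen resolution, the above example gives me a nice ASCII-art representation of a Mandelbrot set. But try it for yourself with different arguments (the first one will be the most important, as that will be the width).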
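
For readers who have not seen the escape time algorithm before, the hand-expanded arithmetic in DoThread corresponds to iterating z <- z*z + c on complex numbers. Here is a compact sketch using std::complex; the name escape_time is mine. Note that it starts from z = 0 and tests the current value, so its counts can be slightly higher than those of the code above, which starts with one step already applied and tests the squares from the previous step.

```cpp
#include <complex>

// Escape-time count for the point c: iterate z <- z*z + c starting at z = 0
// and return how many iterations were applied before |z| exceeded 2,
// capped at maxiter (points that never escape return maxiter).
int escape_time(std::complex<double> c, int maxiter) {
    std::complex<double> z(0.0, 0.0);
    int i = 0;
    while (i < maxiter && std::norm(z) <= 4.0) { // std::norm(z) is |z| squared
        z = z * z + c;
        ++i;
    }
    return i;
}
```

The explicit curX/curY version is what you would normally benchmark, since it avoids any overhead of the complex type; this form is mainly useful for cross-checking results.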

score 0 · Accepted Answer

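Below you can find the code I hacked together to test the multi-threaded performance of pthreads. I haven't cleaned it up and no optimizations have been made, so the code is a bit raw.

The code to save the calculated Mandelbrot set as a bitmap is not mine; you can find it here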

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>

#include "bitmap_Image.h" //for saving the mandelbrot as a bmp

#include <pthread.h>

pthread_mutex_t mutexCounter = PTHREAD_MUTEX_INITIALIZER; //initialize statically; the mutex was never initialized before use
int sharedCounter(0);
int percent(0);

int horizPixels(0);
int vertPixels(0);
int maxiter(0);

//doesn't need to be locked
std::vector<std::vector<int> > result; //create 2 dimensional vector

void *DoThread(void *null) {
    double curX,curY,xSquare,ySquare,x,y;
    int i, intx, inty, counter;
    counter = 0;

    do {
        counter++;
        pthread_mutex_lock (&mutexCounter); //lock
            intx = sharedCounter / vertPixels; //integer division; adding 0.5 and truncating again had no effect
            inty = sharedCounter % vertPixels;
            sharedCounter++;
        pthread_mutex_unlock (&mutexCounter); //unlock

        //exit thread when finished
        if (intx >= horizPixels) {
            std::cout << "exited thread - I did " << counter << " calculations" << std::endl;
            pthread_exit((void*) 0);
        }

        //set x and y to the correct value now -> in the range like singlethread
        x = (3.0 / horizPixels) * (intx - (horizPixels / 1.5));
        y = (3.0 / vertPixels) * (inty - (vertPixels / 2));

        curX = x + x*x - y*y;
        curY = y + x*y + x*y;
        ySquare = curY*curY;
        xSquare = curX*curX;

        for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
          ySquare = curY*curY;
          xSquare = curX*curX;
          curY = y + curX*curY + curX*curY;
          curX = x - ySquare + xSquare;
        }
        result[intx][inty] = i;
     } while (true);
}

int DoSingleThread(const double x, const double y) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;

}

void SingleThreaded(std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels - 1; x != -1; x--) {
        for(int y = vertPixels - 1; y != -1; y--) {
            //3.0 -> window width; x is shifted by horizPixels / 1.5, so the horizontal window is -2.0 -> 1.0; the vertical window is -1.5 -> 1.5
            result[x][y] = DoSingleThread((3.0 / horizPixels) * (x - (horizPixels / 1.5)),(3.0 / vertPixels) * (y - (vertPixels / 2)));
        }
    }
}

void MultiThreaded(int threadCount, std::vector<std::vector<int> >&  result) {
    /* Explicitly create the threads as joinable so that pthread_join can wait for them */
    std::vector<pthread_t> thread(threadCount); //a variable-length array is not standard C++
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);


    for (int i = 0; i < threadCount - 1; i++) {
        pthread_create(&thread[i], &attr, DoThread, NULL);
    }
    std::cout << "all threads created" << std::endl;

    for(int i = 0; i < threadCount - 1; i++) {
        pthread_join(thread[i], NULL);
    }
    std::cout << "all threads joined" << std::endl;
}

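int main(int argc, char* argv[]) {
    //all four arguments are required; calling atoi on a missing one would read past argv
    if (argc < 5) {
        std::cerr << "usage: " << argv[0] << " width height maxiter threads" << std::endl;
        return 1;
    }

    //first arg = length along horizontal axis
    horizPixels = atoi(argv[1]);

    //second arg = length along vertical axis
    vertPixels = atoi(argv[2]);

    //third arg = iterations
    maxiter = atoi(argv[3]);

    //fourth arg = threads (the main thread counts as one, so N means N - 1 workers)
    int threadCount = atoi(argv[4]);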

    result = std::vector<std::vector<int> >(horizPixels, std::vector<int>(vertPixels,21)); // init 2-dimensional vector
    if (threadCount <= 1) {
        SingleThreaded(result);
    } else {
        MultiThreaded(threadCount, result);
    }


    //TODO: remove these lines
    bitmapImage image(horizPixels, vertPixels);
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
            image.setPixelRGB(x,y,16777216*result[x][y]/maxiter % 256, 65536*result[x][y]/maxiter % 256, 256*result[x][y]/maxiter % 256);
            //std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
        }
        std::cout << std::endl;
    }

    //"~" is expanded by the shell, not by file APIs, so write to the working directory instead
    image.saveToBitmapFile("test.bmp",32);
}

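Good results can be obtained by running the program with the following arguments:

mandelbrot 5120 3840 256 3

That way you will get an image that is 5 * 1024 wide and 5 * 768 high, with 256 colors (alas, you will only get 1 or 2) and 3 threads (1 main thread that doesn't do any work except creating the worker threads, and 2 worker threads).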
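
The code above hands out pixels through a mutex-protected shared counter. As a possible variation (not what the posted code does), the same work distribution can be sketched with a C++11 std::atomic counter instead of a mutex. The function name fill_with_atomic_counter and the flat index layout are assumptions of this sketch, and `compute` must be safe to call from several threads at once.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Fills `out` by letting `workers` threads claim indices from a shared atomic
// counter; `compute` maps an index (for example a flattened pixel position)
// to the value stored at that index.
template <typename Compute>
void fill_with_atomic_counter(std::vector<int>& out, int workers, Compute compute) {
    const int total = static_cast<int>(out.size());
    std::atomic<int> next(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < workers; ++t) {
        pool.emplace_back([&out, &next, &compute, total] {
            for (;;) {
                const int index = next.fetch_add(1); // claim one index without a mutex
                if (index >= total) break;           // nothing left to do
                out[index] = compute(index);         // each index is written by exactly one thread
            }
        });
    }
    for (std::thread& th : pool) th.join();
}
```

For the Mandelbrot case, index / vertPixels and index % vertPixels would recover the x and y pixel, as in DoThread. Claiming one pixel at a time keeps the load balanced but touches the shared counter very often; claiming a whole row per fetch_add is a common compromise.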

score -1 · Accepted Answer

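Since the benchmarks game moved to a quad-core machine in September 2008, many programs in different programming languages have been rewritten to exploit quad-core, for example the first 10 mandelbrot programs.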
