2

我在 java 中编写了以下测试类来重现“错误共享”引入的性能损失。

基本上,您可以将数组的“大小”从 4 调整为更大的值(例如 10000),以打开或关闭“错误共享现象”。具体来说,当 size = 4 时,不同的线程更有可能更新同一缓存行中的值,从而导致更频繁的缓存未命中。理论上,当 size = 10000 时,测试程序应该比 size = 4 运行得快得多。

我在两台不同的机器上多次运行相同的测试:

机器 A: Lenovo X230 笔记本电脑 w/ Intel® Core™ i5-3210M 处理器(2 核,4 线程)Windows 7 64 位

大小 = 4 => 5.5 秒

大小 = 10000 => 5.4 秒

机器 B: Dell OptiPlex 780 w/ Intel® Core™2 Duo 处理器 E8400(2 核)Windows XP 32 位

大小 = 4 => 14.5 秒

大小 = 10000 => 7.2 秒

后来我在其他几台机器上进行了测试,很明显,错误共享只在某些机器上变得明显,我无法弄清楚造成这种差异的决定性因素。

任何人都可以看看这个问题并解释为什么这个测试类中引入的错误共享只在某些机器上变得明显吗?

public class FalseSharing {

interface Oper {
    int eval(int value);
}

//try tweak the size
static int size = 4;

//try tweak the op
static Oper op = new Oper() {
    @Override
    public int eval(int value) {
        return value + 2;
    }
};

static int[] array = new int[10000 + size];

static final int interval = (size / 4);

public static void main(String args[]) throws InterruptedException {

    long start = System.currentTimeMillis();
    Thread t1 = new Thread(new Runnable() {
        @Override
        public void run() {

            System.out.println("Array index:" + 5000);

            for (int j = 0; j < 30; j++) {
                for (int i = 0; i < 1000000000; i++) {
                    array[5000] = op.eval(array[5000]);
                }
            }
        }
    });
    Thread t2 = new Thread(new Runnable() {
        @Override
        public void run() {

            System.out.println("Array index:" + (5000 + interval));

            for (int j = 0; j < 30; j++) {
                for (int i = 0; i < 1000000000; i++) {
                    array[5000 + interval] = op.eval(array[5000 + interval]);
                }
            }
        }
    });
    Thread t3 = new Thread(new Runnable() {
        @Override
        public void run() {

            System.out.println("Array index:" + (5000 + interval * 2));

            for (int j = 0; j < 30; j++) {
                for (int i = 0; i < 1000000000; i++) {
                    array[5000 + interval * 2] = op.eval(array[5000 + interval * 2]);
                }
            }
        }
    });
    Thread t4 = new Thread(new Runnable() {
        @Override
        public void run() {

            System.out.println("Array index:" + (5000 + interval * 3));

            for (int j = 0; j < 30; j++) {
                for (int i = 0; i < 1000000000; i++) {
                    array[5000 + interval * 3] = op.eval(array[5000 + interval * 3]);
                }
            }
        }
    });
    t1.start();
    t2.start();
    t3.start();
    t4.start();
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    System.out.println("Finished!" + (System.currentTimeMillis() - start));
}

}

4

2 回答 2

0

虚假共享仅发生在 64 字节的块中。您需要在所有四个线程中访问相同的 64 字节块。我建议您创建一个对象或数组,long[8]并在所有四个线程中更新该数组的不同单元格,并与访问独立数组的四个线程进行比较。

于 2013-09-13T13:55:20.990 回答
0

您的代码可能没问题,这是一个更简单的版本,结果如​​下:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;


public class TestFalseSharing {
    static long T0 = System.currentTimeMillis();

    static void p(Object msg) {
        System.out.format("%09.3f %-10s %s%n", new Double(0.001*(System.currentTimeMillis()-T0)), Thread.currentThread().getName(), " : "+msg);
    }

    public static void main(String args[]) throws InterruptedException {
        int NT = Runtime.getRuntime().availableProcessors();
        p("Available processors: "+NT);

        int MAXSPAN = 0x1000; //4kB
        final byte[] array = new byte[NT*MAXSPAN];

        for(int i=1; i<=MAXSPAN; i<<=1) {
            testFalseSharing(NT, i, array);
        }
    }

    static void testFalseSharing(final int NT, final int span, final byte[] array) throws InterruptedException {
        final int L1 = 10;
        final int L2 = 10_000_000;

        final CountDownLatch cl = new CountDownLatch(NT*L1);

        long t0 = System.nanoTime();

        for(int i=0 ; i<4; i++) {
            final int startOffset = i*span;

            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    //p("Offset:" + startOffset);
                    for (int j = 0; j < L1; j++) {
                        for (int k = 0; k < L2; k++) {
                            array[startOffset] += 1;
                        }
                        cl.countDown();
                    }
                }
            });
            t.start();

        }

        while(!cl.await(10, TimeUnit.SECONDS)) {
            p(""+cl.getCount()+" left");
        }

        long d = System.nanoTime() - t0;
        p("Duration: " + 1e-9*d + " seconds, Span="+span+" bytes");
    }
}

结果:

00000.000 main        : Available processors: 4
00002.843 main        : Duration: 2.837645384 seconds, Span=1 bytes
00005.689 main        : Duration: 2.8454065760000002 seconds, Span=2 bytes
00008.659 main        : Duration: 2.9697156340000004 seconds, Span=4 bytes
00011.640 main        : Duration: 2.979306959 seconds, Span=8 bytes
00013.780 main        : Duration: 2.140246744 seconds, Span=16 bytes
00015.387 main        : Duration: 1.6061148440000002 seconds, Span=32 bytes
00016.729 main        : Duration: 1.34128957 seconds, Span=64 bytes
00017.944 main        : Duration: 1.215005455 seconds, Span=128 bytes
00019.208 main        : Duration: 1.263007368 seconds, Span=256 bytes
00020.477 main        : Duration: 1.269272208 seconds, Span=512 bytes
00021.719 main        : Duration: 1.241061631 seconds, Span=1024 bytes
00022.975 main        : Duration: 1.256024242 seconds, Span=2048 bytes
00024.171 main        : Duration: 1.195086858 seconds, Span=4096 bytes

所以要回答,它证实了 64 字节缓存线理论,至少在我的笔记本电脑核心 i5 上。

于 2016-02-08T17:03:44.963 回答