I wrote the following test class in Java to reproduce the performance penalty caused by "false sharing".

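Basically, you can change the array "size" from 4 to a larger value (e.g. 10000) to turn the false-sharing effect on or off. Specifically, with size = 4 (so interval = size / 4 = 1) the four threads update adjacent ints, array[5000] through array[5003], which are likely to sit in the same cache line and therefore cause far more frequent cache misses. In theory, the test program should run much faster with size = 10000 than with size = 4.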
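The index arithmetic behind this claim can be checked with a small standalone sketch. `IndexSpacing`, `indices`, `neighbourDistanceBytes` and `footprintBytes` are hypothetical names, not part of the test class; the sketch only reproduces the `interval = size / 4` and `5000 + interval * k` expressions, assuming 4-byte ints, and runs no threads.

```java
import java.util.Arrays;

// Hypothetical helper mirroring the question's index arithmetic; it does not run any threads.
public class IndexSpacing {
    static final int THREADS = 4;

    // Thread k writes array[5000 + interval * k], with interval = size / 4 (as in the test class).
    static int[] indices(int size) {
        int interval = size / 4;
        int[] idx = new int[THREADS];
        for (int k = 0; k < THREADS; k++) {
            idx[k] = 5000 + interval * k;
        }
        return idx;
    }

    // Distance in bytes between the ints written by neighbouring threads.
    static int neighbourDistanceBytes(int size) {
        return (size / 4) * Integer.BYTES;
    }

    // Bytes from the start of the first target int to the end of the last one.
    static int footprintBytes(int size) {
        int[] idx = indices(size);
        return (idx[THREADS - 1] - idx[0] + 1) * Integer.BYTES;
    }

    public static void main(String[] args) {
        for (int size : new int[] {4, 10000}) {
            System.out.println("size=" + size
                    + " indices=" + Arrays.toString(indices(size))
                    + " neighbour distance=" + neighbourDistanceBytes(size) + " bytes"
                    + " footprint=" + footprintBytes(size) + " bytes");
        }
    }
}
```

With size = 4 the four target ints occupy 16 consecutive bytes, so (assuming 64-byte lines) they sit on one cache line or straddle two; with size = 10000 neighbouring targets are 10,000 bytes apart and cannot share a line.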


Machine A: Lenovo X230 laptop with Intel® Core™ i5-3210M processor (2 cores, 4 threads), Windows 7 64-bit

size = 4 => 5.5 s

size = 10000 => 5.4 s

Machine B: Dell OptiPlex 780 with Intel® Core™2 Duo processor E8400 (2 cores), Windows XP 32-bit

size = 4 => 14.5 s

size = 10000 => 7.2 s



public class FalseSharing {

    interface Oper {
        int eval(int value);
    }

    // try tweaking the size
    static int size = 4;

    // try tweaking the op
    static Oper op = new Oper() {
        public int eval(int value) {
            return value + 2;
        }
    };

    static int[] array = new int[10000 + size];

    // distance (in ints) between the elements updated by neighbouring threads
    static final int interval = (size / 4);

    public static void main(String args[]) throws InterruptedException {

        long start = System.currentTimeMillis();
        Thread t1 = new Thread(new Runnable() {
            public void run() {

                System.out.println("Array index:" + 5000);

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000] = op.eval(array[5000]);
                    }
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            public void run() {

                System.out.println("Array index:" + (5000 + interval));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval] = op.eval(array[5000 + interval]);
                    }
                }
            }
        });
        Thread t3 = new Thread(new Runnable() {
            public void run() {

                System.out.println("Array index:" + (5000 + interval * 2));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 2] = op.eval(array[5000 + interval * 2]);
                    }
                }
            }
        });
        Thread t4 = new Thread(new Runnable() {
            public void run() {

                System.out.println("Array index:" + (5000 + interval * 3));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 3] = op.eval(array[5000 + interval * 3]);
                    }
                }
            }
        });

        // start all four threads, then wait for every one of them before reading the clock
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t1.join();
        t2.join();
        t3.join();
        t4.join();
        System.out.println("Finished!" + (System.currentTimeMillis() - start));
    }
}



2 Answers


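False sharing only happens within a 64-byte block (one cache line on these CPUs). You need all four threads to access the same 64-byte block. I suggest creating an object or an array such as long[8], updating a different cell of it from each of the four threads, and comparing that with four threads that each access an independent array.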
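A rough sketch of that comparison follows. `SharedVsIndependent`, `run`, `shared` and `independent` are hypothetical names, and the iteration count is illustrative. It uses `java.util.concurrent.atomic.AtomicLongArray` instead of a plain `long[]` on the assumption that volatile `get`/`set` stop the JIT from keeping each counter in a register, which would hide the effect being measured. The "independent" arrays are padded to 17 longs with the hot cell in the middle, assuming 64-byte cache lines.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical harness for the shared-vs-independent comparison; names and counts are illustrative.
public class SharedVsIndependent {
    static final int THREADS = 4;

    // Starts THREADS threads; thread t increments arrays[t] at index idx[t] `iters` times.
    // AtomicLongArray.get/set have volatile semantics, so each iteration performs a real store.
    static long run(AtomicLongArray[] arrays, int[] idx, long iters) {
        Thread[] ts = new Thread[THREADS];
        long t0 = System.nanoTime();
        for (int t = 0; t < THREADS; t++) {
            final AtomicLongArray a = arrays[t];
            final int i = idx[t];
            ts[t] = new Thread(() -> {
                for (long n = 0; n < iters; n++) {
                    // Non-atomic read-modify-write is fine: only this thread touches cell i.
                    a.set(i, a.get(i) + 1);
                }
            });
            ts[t].start();
        }
        try {
            for (Thread th : ts) {
                th.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - t0;
    }

    // Shared case: one array of 8 longs; the threads update adjacent cells 0..3.
    static AtomicLongArray[] shared(int[] idx) {
        AtomicLongArray one = new AtomicLongArray(8);
        AtomicLongArray[] arrays = new AtomicLongArray[THREADS];
        for (int t = 0; t < THREADS; t++) {
            arrays[t] = one;
            idx[t] = t;
        }
        return arrays;
    }

    // Independent case: each thread gets its own 17-long array and updates cell 8,
    // leaving 64 bytes of its own unused cells on either side (assumes 64-byte lines).
    static AtomicLongArray[] independent(int[] idx) {
        AtomicLongArray[] arrays = new AtomicLongArray[THREADS];
        for (int t = 0; t < THREADS; t++) {
            arrays[t] = new AtomicLongArray(17);
            idx[t] = 8;
        }
        return arrays;
    }

    public static void main(String[] args) {
        long iters = 100_000_000L;
        int[] sharedIdx = new int[THREADS];
        int[] independentIdx = new int[THREADS];
        long s = run(shared(sharedIdx), sharedIdx, iters);
        long ind = run(independent(independentIdx), independentIdx, iters);
        System.out.printf("shared: %.2f s, independent: %.2f s%n", s / 1e9, ind / 1e9);
    }
}
```

This is a single, un-warmed run that also times thread start-up, so treat the numbers as rough; a benchmark harness such as JMH would give more trustworthy figures.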

answered 2013-09-13T13:55:20.990


import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class TestFalseSharing {
    static long T0 = System.currentTimeMillis();

    // prints seconds since start, the current thread's name, and the message
    static void p(Object msg) {
        System.out.format("%09.3f %-10s %s%n", 0.001 * (System.currentTimeMillis() - T0), Thread.currentThread().getName(), " : " + msg);
    }

    public static void main(String args[]) throws InterruptedException {
        int NT = Runtime.getRuntime().availableProcessors();
        p("Available processors: " + NT);

        int MAXSPAN = 0x1000; // 4 kB
        final byte[] array = new byte[NT * MAXSPAN];

        // span = distance in bytes between the bytes written by neighbouring threads: 1, 2, 4, ..., 4096
        for (int i = 1; i <= MAXSPAN; i <<= 1) {
            testFalseSharing(NT, i, array);
        }
    }

    static void testFalseSharing(final int NT, final int span, final byte[] array) throws InterruptedException {
        final int L1 = 10;
        final int L2 = 10_000_000;

        // each thread counts down once per outer iteration, so the latch opens when all NT threads are done
        final CountDownLatch cl = new CountDownLatch(NT * L1);

        long t0 = System.nanoTime();

        // one thread per available processor, matching the latch count and the array size
        for (int i = 0; i < NT; i++) {
            final int startOffset = i * span;

            Thread t = new Thread(new Runnable() {
                public void run() {
                    //p("Offset:" + startOffset);
                    for (int j = 0; j < L1; j++) {
                        for (int k = 0; k < L2; k++) {
                            array[startOffset] += 1;
                        }
                        cl.countDown();
                    }
                }
            });
            t.start();
        }

        while (!cl.await(10, TimeUnit.SECONDS)) {
            p("" + cl.getCount() + " left");
        }

        long d = System.nanoTime() - t0;
        p("Duration: " + 1e-9 * d + " seconds, Span=" + span + " bytes");
    }
}


00000.000 main        : Available processors: 4
00002.843 main        : Duration: 2.837645384 seconds, Span=1 bytes
00005.689 main        : Duration: 2.8454065760000002 seconds, Span=2 bytes
00008.659 main        : Duration: 2.9697156340000004 seconds, Span=4 bytes
00011.640 main        : Duration: 2.979306959 seconds, Span=8 bytes
00013.780 main        : Duration: 2.140246744 seconds, Span=16 bytes
00015.387 main        : Duration: 1.6061148440000002 seconds, Span=32 bytes
00016.729 main        : Duration: 1.34128957 seconds, Span=64 bytes
00017.944 main        : Duration: 1.215005455 seconds, Span=128 bytes
00019.208 main        : Duration: 1.263007368 seconds, Span=256 bytes
00020.477 main        : Duration: 1.269272208 seconds, Span=512 bytes
00021.719 main        : Duration: 1.241061631 seconds, Span=1024 bytes
00022.975 main        : Duration: 1.256024242 seconds, Span=2048 bytes
00024.171 main        : Duration: 1.195086858 seconds, Span=4096 bytes

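So, to answer the question: this confirms the 64-byte cache-line theory, at least on my Core i5 laptop. Durations stay around 2.8 to 3.0 s for spans of 1 to 8 bytes, drop at 16 and 32 bytes, and level off at roughly 1.2 to 1.35 s from a 64-byte span upward.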
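The intermediate results at 16 and 32 bytes are consistent with a simple count of how many threads write into the same line. The sketch below is a model, not a measurement; `SpanLines` and `maxThreadsPerLine` are hypothetical names. It assumes 64-byte lines, an array object that starts on a line boundary, and a 16-byte offset to `array[0]` (typical of a `byte[]` header on 64-bit HotSpot with compressed class pointers, though the JVM guarantees neither the header size nor the alignment).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not a measurement) of how many threads in TestFalseSharing
// write into the same cache line for a given span.
public class SpanLines {
    static final int LINE_BYTES = 64;   // assumed cache-line size
    static final int DATA_OFFSET = 16;  // assumed offset of array[0] from a line-aligned array start

    // Largest number of threads whose target byte (offset t * span) falls in one cache line.
    static int maxThreadsPerLine(int threads, int span) {
        Map<Integer, Integer> perLine = new HashMap<>();
        int worst = 0;
        for (int t = 0; t < threads; t++) {
            int line = (DATA_OFFSET + t * span) / LINE_BYTES;
            int count = perLine.merge(line, 1, Integer::sum);
            worst = Math.max(worst, count);
        }
        return worst;
    }

    public static void main(String[] args) {
        for (int span = 1; span <= 0x1000; span <<= 1) {
            System.out.println("span=" + span + " bytes -> up to "
                    + maxThreadsPerLine(4, span) + " threads per line");
        }
    }
}
```

Under these assumptions it reports 4 threads per line for spans of 1 to 8 bytes, 3 at 16, 2 at 32, and 1 from 64 upward, which lines up with the stepwise drop in the measured durations; a different array alignment would shift where the steps fall.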

answered 2016-02-08T17:03:44.963