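My CPU's specs say it should get 5.336GB/s of memory bandwidth. To test this, I wrote a simple program that runs memset (or memcpy) on a large array and reports the timing. I'm seeing 3.8GB/s on memset and 1.9GB/s on memcpy. http://en.wikipedia.org/wiki/Intel_Core_(microarchitecture) says my Q9400 should hit 5.336GB/s. What's wrong?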
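I tried replacing memset or memcpy with assignment loops. I've searched around to try to understand memory alignment. I tried different compiler flags. I've spent an embarrassing number of hours on this. Thanks for any help you can provide!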
I'm using Ubuntu 12.04 with libc-dev version 2.15-0ubuntu10.5 and kernel 3.8.0-37-generic.
Code:
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>

/* Size of each buffer (1 GiB) and number of passes over it. */
#define numBytes ((long)(1024*1024*1024))
#define numTransfers ((long)(8))

int main(int argc,char**argv){
    /* Two arguments are required but not used below; the sizes come from the macros above. */
    if(argc!=3){
        printf("Usage: %s BLOCK_SIZE_IN_BYTES NUMBER_OF_BLOCKS_TO_TRANSFER\n",argv[0]);
        return -1;
    }
    char*__restrict__ source=(char*)malloc(numBytes);
    char*__restrict__ dest=(char*)malloc(numBytes);
    if(!source||!dest){
        perror("malloc");
        return EXIT_FAILURE;
    }
    struct timespec start,end;
    long totalTimeMs;
    int i;

    /* Time numTransfers memset passes over source. */
    clock_gettime(CLOCK_MONOTONIC_RAW,&start);
    for(i=0;i<numTransfers;++i)
        memset(source,0,numBytes);
    clock_gettime(CLOCK_MONOTONIC_RAW,&end);
    totalTimeMs=(end.tv_nsec-start.tv_nsec)*.000001+1000*(end.tv_sec-start.tv_sec);
    printf("memset %ld bytes %ld times (%.2fGB total) in %ldms (%.3fGB/s). ",numBytes,numTransfers,numBytes/1024.0/1024/1024*numTransfers,totalTimeMs,numBytes/1024.0/1024/1024*1000*numTransfers/totalTimeMs);

    /* Time numTransfers memcpy passes from source to dest. */
    clock_gettime(CLOCK_MONOTONIC_RAW,&start);
    for(i=0;i<numTransfers;++i)
        memcpy( dest, source, numBytes);
    clock_gettime(CLOCK_MONOTONIC_RAW,&end);
    totalTimeMs=(end.tv_nsec-start.tv_nsec)*.000001+1000*(end.tv_sec-start.tv_sec);
    printf("memcpy %ld bytes %ld times (%.2fGB total) in %ldms (%.3fGB/s).\n",numBytes,numTransfers,numBytes/1024.0/1024/1024*numTransfers,totalTimeMs,numBytes/1024.0/1024/1024*1000*numTransfers/totalTimeMs);

    free(source);
    free(dest);
    return EXIT_SUCCESS;
}
Compile commands:
gcc -O3 -DNDEBUG -o memcpyStackOverflowNoParameters.c.o -c memcpyStackOverflowNoParameters.c
gcc -O3 -DNDEBUG memcpyStackOverflowNoParameters.c.o -o memcpy -rdynamic -lrt
Sample output:
memset 1073741824 bytes 8 times (8.00GB total) in 2214ms (3.880GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4466ms (1.923GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2218ms (3.873GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4557ms (1.885GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2222ms (3.866GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4433ms (1.938GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2216ms (3.876GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4521ms (1.900GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2217ms (3.875GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4520ms (1.900GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2218ms (3.873GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4430ms (1.939GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2226ms (3.859GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4444ms (1.933GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2225ms (3.861GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4485ms (1.915GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2620ms (3.279GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4855ms (1.769GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2535ms (3.389GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4870ms (1.764GB/s).
memset 1073741824 bytes 8 times (8.00GB total) in 2423ms (3.545GB/s). memcpy 1073741824 bytes 8 times (8.00GB total) in 4905ms (1.751GB/s).
My hardware, according to lshw:
    product: OptiPlex 960 ()
    vendor: Winbond Electronics
    width: 64 bits
  *-core
       description: Motherboard
       product: 0Y958C
       vendor: Winbond Electronics
     *-firmware
          description: BIOS
          capabilities: pci pnp apm upgrade shadowing escd cdboot bootselect edd int13floppytoshiba int13floppy720 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification netboot
     *-cpu
          product: Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
          physical id: 400
          size: 2666MHz
          width: 64 bits
          clock: 1333MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
          configuration: cores=4 enabledcores=4 threads=4
        *-cache:0
             description: L1 cache
             physical id: 700
             size: 256KiB
             capacity: 256KiB
             capabilities: internal write-back unified
        *-cache:1
             description: L2 cache
             physical id: 701
             size: 6MiB
             capacity: 6MiB
             capabilities: internal varies unified
     *-memory
          description: System Memory
          physical id: 1000
          slot: System board or motherboard
          size: 4GiB
        *-bank:0
             description: DIMM DDR2 Synchronous 667 MHz (1.5 ns)
             product: CT51264AA667.M16FC
             vendor: 7F7F7F7F7F9B0000
             slot: DIMM_1
             size: 4GiB
             width: 64 bits
             clock: 667MHz (1.5ns)
        *-bank:1
             description: DIMM DDR2 Synchronous 667 MHz (1.5 ns) [empty]
        *-bank:2
             description: DIMM DDR2 Synchronous 667 MHz (1.5 ns) [empty]
        *-bank:3
             description: DIMM DDR2 Synchronous 667 MHz (1.5 ns) [empty]