In this paper, the 8-byte sequential write latency of Optane PM is reported as 90 ns with clwb and 62 ns with ntstore, and the sequential read latency as 169 ns.
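However, in my tests on an Intel 5218R CPU, clwb takes about 700 ns and ntstore about 1200 ns. My test method differs from the paper's, of course, but these results are far too poor to be plausible, and my test is closer to real-world usage.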
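During the test, could the Write Pending Queue (WPQ) in the CPU's iMC, or the WC buffer inside the Optane PM, have become a bottleneck and stalled the writes, so that the measured latency is inaccurate? If so, is there a tool that can detect this?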
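A single `gettimeofday` around the whole loop only yields an average, which cannot tell uniform slowness apart from a few iterations stalling on a full queue. One way to look for stalls is to time each store+fence individually and compare the median with the tail. The sketch below is hypothetical (the helper names `time_stores`, `percentile_u64` and `tsc_to_ns` are not from the original post); it uses `__rdtscp`, which counts TSC reference cycles, so `tsc_ghz` must be your CPU's nominal TSC frequency (an assumption you need to check for your machine). The `_mm_clwb` call is left as a comment so the sketch also runs on ordinary DRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>

/* qsort comparator for uint64_t samples. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Return the p-th percentile (0..100) of n samples, rounding to the
   nearest rank. Sorts the array in place. */
static uint64_t percentile_u64(uint64_t *s, size_t n, double p)
{
    assert(n > 0 && p >= 0.0 && p <= 100.0);
    qsort(s, n, sizeof *s, cmp_u64);
    size_t idx = (size_t)(p / 100.0 * (double)(n - 1) + 0.5);
    return s[idx < n ? idx : n - 1];
}

/* Convert TSC cycles to nanoseconds; tsc_ghz is an assumed input. */
static double tsc_to_ns(uint64_t cycles, double tsc_ghz)
{
    return (double)cycles / tsc_ghz;
}

/* Time each 8-byte store + fence separately, storing TSC cycles in cycles[i]. */
static void time_stores(int64_t *buf, size_t n, uint64_t *cycles)
{
    unsigned int aux;
    for (size_t i = 0; i < n; i++) {
        uint64_t t0 = __rdtscp(&aux);
        buf[i] = 0x2222;
        /* On PM, add _mm_clwb(buf + i) here (compile with -mclwb). */
        _mm_sfence();
        uint64_t t1 = __rdtscp(&aux);
        cycles[i] = t1 - t0;
    }
}
```

If the median (p50) stays low while p99 is large, most iterations are fast and a few are stalling, which would fit the queue-backpressure hypothesis; if p50 itself is high, every iteration is slow. This only narrows down the symptom and does not identify which queue is responsible.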
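#include <libpmem.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>     // exit
#include <sys/time.h>   // gettimeofday
#include <x86intrin.h>  // _mm_clwb, _mm_sfence, _mm_mfence, _mm_stream_si64
//gcc aep_test.c -o aep_test -O3 -mclwb -lpmem
int main()
{
    size_t mapped_len;
    char str[32];
    int is_pmem;
    snprintf(str, sizeof str, "/mnt/pmem/pmmap_file_1");
    // Map a 512 MiB file on the DAX-mounted persistent memory file system
    int64_t *p = pmem_map_file(str, (size_t)4096 * 1024 * 128, PMEM_FILE_CREATE, 0666, &mapped_len, &is_pmem);
    if (p == NULL)
    {
        perror("pmem_map_file");
        exit(1);
    }
    if (!is_pmem)
    {
        printf("mapped file is not persistent memory!\n");
        pmem_unmap(p, mapped_len);
        exit(1);
    }
    struct timeval start;
    struct timeval end;
    unsigned long diff;
    int loop_num = 10000;
    _mm_mfence();
    gettimeofday(&start, NULL);
    for (int i = 0; i < loop_num; i++)
    {
        // 8-byte sequential store, then write the cache line back to PM
        p[i] = 0x2222;
        _mm_clwb(p + i);
        // ntstore variant: replace the two lines above with
        // _mm_stream_si64(p + i, 0x2222);
        // Fence so this iteration's write-back is ordered before the next store
        _mm_sfence();
    }
    gettimeofday(&end, NULL);
    diff = 1000000 * (end.tv_sec - start.tv_sec) + end.tv_usec - start.tv_usec;
    printf("Total time is %lu us\n", diff);
    printf("Latency is %lu ns\n", diff * 1000 / loop_num);
    pmem_unmap(p, mapped_len);
    return 0;
}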
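In the loop above, `p[i]` advances by 8 bytes, so eight consecutive iterations write and flush the same 64-byte cache line. Whether that pattern contributes to the high latency is an open assumption, not a conclusion, but it can be tested by varying the stride. The hypothetical helper `strided_writes` below generalizes the loop: the persist step is passed in as a function pointer so the sketch runs on DRAM with `persist_noop`; on PM you would pass a wrapper around `_mm_clwb` (shown as a comment, needs `-mclwb`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>

/* Persist hook called after each store. On PM, use e.g.
   static void persist_clwb(void *a) { _mm_clwb(a); }  (needs -mclwb). */
typedef void (*persist_fn)(void *addr);

/* Placeholder persist step for running the sketch on ordinary DRAM. */
static void persist_noop(void *addr) { (void)addr; }

/* Write loop_num 8-byte values spaced stride_bytes apart, persisting and
   fencing each one like the original loop. base must hold at least
   loop_num * stride_bytes bytes. */
static void strided_writes(int64_t *base, size_t loop_num, size_t stride_bytes,
                           persist_fn persist)
{
    assert(stride_bytes > 0 && stride_bytes % sizeof(int64_t) == 0);
    size_t step = stride_bytes / sizeof(int64_t);
    for (size_t i = 0; i < loop_num; i++) {
        int64_t *addr = base + i * step;
        *addr = 0x2222;
        persist(addr);
        _mm_sfence();
    }
}
```

A stride of 8 reproduces the original loop, 64 gives each iteration its own cache line, and 256 gives each iteration its own 256-byte Optane internal line (XPLine). Timing these variants with the same method lets you compare them directly.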
Thanks a lot for any help or corrections!