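I am trying to write a memory allocator comparable to dlmalloc, the allocator on which glibc's malloc is based. dlmalloc is a best-fit allocator with block splitting; it keeps a pool of recently used blocks before coalescing them back into large blocks. The allocator I am writing is first-fit instead.
My problem is two-fold: (1) the test times for my code are very irregular compared with glibc malloc, and (2) sometimes the average run time of my code is 3 to 4 times higher. (2) is not a big deal, but I would like to understand why glibc malloc does not suffer in the same way. Sample output further down in this post shows the behaviour described in (1) for malloc and for my code. Sometimes the average time over a batch of 1000 tests is much higher than malloc's (problem (2) above), and sometimes the averages are the same. However, within a batch, the test times for my code are always very irregular (problem (1) above): some runs jump to 20 times the average, and these jumps are interspersed among otherwise regular (near-average) times. glibc malloc does not do this.
The code I am working with is below.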
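To make the first-fit versus best-fit distinction concrete, here is a minimal sketch of the two search policies over a singly linked free list. `FreeNode`, `first_fit` and `best_fit` are hypothetical names used only for this illustration; they are not taken from dlmalloc or from my allocator, and real allocators typically use size-segregated bins rather than a full scan.

```cpp
#include <stddef.h>

/* Hypothetical free-list node, used only for this illustration. */
struct FreeNode {
    size_t size;      /* usable bytes in this free block */
    FreeNode* next;   /* next free block, or NULL at the end of the list */
};

/* First fit: stop at the first block that is large enough. */
FreeNode* first_fit(FreeNode* head, size_t want) {
    for (FreeNode* n = head; n != NULL; n = n->next)
        if (n->size >= want) return n;
    return NULL;
}

/* Best fit: scan the whole list, keep the smallest block that is large enough. */
FreeNode* best_fit(FreeNode* head, size_t want) {
    FreeNode* best = NULL;
    for (FreeNode* n = head; n != NULL; n = n->next)
        if (n->size >= want && (best == NULL || n->size < best->size))
            best = n;
    return best;
}
```

With a list holding blocks of 256, 48 and 64 bytes in that order, a 40-byte request is served from the 256-byte block by first fit and from the 48-byte block by best fit.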
====================================
#include <stdlib.h> /* malloc, used only by the comparison run in main */
/* represents an allocated or unallocated block of memory */
struct Block {
    /* previous block in memory (allocated or not); needed for consolidation, not used in allocation */
    Block* prev;
    /* 1 if allocated and 0 if not */
    unsigned int tagh;
    /* previous unallocated block */
    Block* prev_free;
    /* next unallocated block */
    Block* next_free;
    /* size of this block's payload in bytes, excluding the header */
    unsigned int size;
};
#define CACHE_SZ 120000000
/* array to be managed by allocator; aligned so that the pointers stored in Block headers are naturally aligned on 64-bit targets too */
char arr[CACHE_SZ] __attribute__((aligned(16)));
/* initialize the contiguous memory located at arr for allocator */
void init_cache(){
    /* setup list head node that does not change */
    Block* a = (Block*) arr;
    a->prev = 0;
    a->tagh = 1;
    a->prev_free = 0;
    a->size = 0;
    /* setup the usable data block: everything except the three headers */
    Block* b = (Block*) (arr + sizeof(Block));
    b->prev = a;
    b->tagh = 0;
    b->prev_free = a;
    b->size = CACHE_SZ - 3*sizeof(Block);
    a->next_free = b;
    /* setup list tail node that does not change */
    Block* e = (Block*)((char*)arr + CACHE_SZ - sizeof(Block));
    e->prev = b;
    e->tagh = 1;
    e->prev_free = b;
    e->next_free = 0;
    e->size = 0;
    b->next_free = e;
}
char* alloc(unsigned int size){
    Block* current = ((Block*) arr)->next_free;
    Block* new_block;
    /* search for a first-fit block: either an exact fit, or one large enough to split */
    while(current != 0){
        if( current->size == size || current->size >= size + sizeof(Block)) goto good;
        current = current->next_free;
    }
    /* what to do if no decent size block found */
    if( current == 0) {
        return 0;
    }
    /* good block found */
good:
    /* if block size is exact, unlink it from the free list, mark it allocated and return it */
    if( current->size == size){
        if(current->next_free != 0) current->next_free->prev_free = current->prev_free;
        if(current->prev_free != 0) current->prev_free->next_free = current->next_free;
        current->tagh = 1;
        return (char* ) current + sizeof(Block);
    }
    /* otherwise split the block: shrink the free block and carve the new block from its tail */
    current->size -= size + sizeof(Block);
    new_block = (Block*)( (char*)current + sizeof(Block) + current->size);
    new_block->size = size;
    new_block->prev = current;
    new_block->tagh = 1;
    /* the block that follows in memory now has new_block as its predecessor */
    ((Block*)((char*) new_block + sizeof(Block) + new_block->size ))->prev = new_block;
    return (char* ) new_block + sizeof(Block);
}
int main(int argc, char** argv){
    init_cache();
    int count = 0;
    /* the count considers the size of the cache arr */
    while(count < 4883){
        /* the following line tests malloc; the quantity (1024*24) ensures word alignment */
        //char * volatile p = (char *) malloc(1024*24);
        /* the following line tests above code in exactly the same way */
        char * volatile p = alloc(1024*24);
        count++;
    }
    return 0;
}
======================================
I compiled the code above simply with:
g++ -O9 alloc.c
and ran a simple test, one that always splits a block and never returns an exact-size block:
bash$ for((i=0; i<1000; i++)); do (time ./a.out) 2>&1|grep real; done
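Note that `time ./a.out` measures the whole process, including start-up and teardown, not just the allocation loop. As a possible way to separate the two, the loop could also be timed from inside the program. The following is only a sketch: `time_ns` is a hypothetical helper, and it assumes a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available.

```cpp
#include <time.h>

/* Hypothetical helper: wall-clock nanoseconds spent in one call of work(). */
long long time_ns(void (*work)(void)) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);   /* monotonic clock: unaffected by wall-clock adjustments */
    work();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}
```

To use it, the `while` loop from `main` would be moved into a function of type `void (*)(void)` and passed to `time_ns`, once with `alloc` and once with `malloc`. If the in-process numbers are regular while the `time` numbers are not, the jumps would come from outside the loop; I have not verified which is the case.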
Sample test output for my code and for glibc malloc is shown below.
My code:
real 0m0.023s
real 0m0.109s <----- irregular jump >
real 0m0.024s
real 0m0.086s
real 0m0.022s
real 0m0.104s <----- again irregular jump >
real 0m0.023s
real 0m0.023s
real 0m0.098s
real 0m0.023s
real 0m0.097s
real 0m0.024s
real 0m0.091s
real 0m0.023s
real 0m0.025s
real 0m0.088s
real 0m0.023s
real 0m0.086s
real 0m0.024s
real 0m0.024s
malloc (nice and regular, staying close to 25 ms):
real 0m0.025s
real 0m0.024s
real 0m0.024s
real 0m0.026s
real 0m0.024s
real 0m0.026s
real 0m0.025s
real 0m0.026s
real 0m0.026s
real 0m0.025s
real 0m0.025s
real 0m0.024s
real 0m0.024s
real 0m0.024s
real 0m0.025s
real 0m0.026s
real 0m0.025s
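Note that the malloc times are much more regular. At other, unpredictable times my code takes 0m0.070s instead of 0m0.020s on typical runs, so the average run time is close to 70 ms rather than 25 ms (problem (2) above), but that is not shown here. In this case I was lucky to have it run close to malloc's average (25 ms).
The questions are: (1) how can I modify my code to get more regular times, like glibc malloc? and (2) if possible, how can I make it even faster than glibc malloc, since I have read that dlmalloc is a typical balanced allocator and not the fastest one (considering only splitting/best-fit/first-fit allocators, not other kinds)?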
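To put a number on "irregular", a batch of run times can be reduced to its mean, its maximum, and the count of runs that exceed some multiple of the mean. This is a minimal sketch; `BatchStats`, `summarize` and the `factor` threshold are hypothetical names chosen for illustration, and the samples are assumed to be non-negative times in milliseconds.

```cpp
#include <stddef.h>

/* Hypothetical summary of one batch of run times (milliseconds). */
struct BatchStats {
    double mean;       /* arithmetic mean of all samples */
    double max;        /* largest sample (assumes non-negative samples) */
    size_t outliers;   /* number of samples greater than factor * mean */
};

BatchStats summarize(const double* ms, size_t n, double factor) {
    BatchStats s = {0.0, 0.0, 0};
    if (n == 0) return s;              /* empty batch: all fields stay zero */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += ms[i];
        if (ms[i] > s.max) s.max = ms[i];
    }
    s.mean = sum / n;
    /* second pass: count the jumps relative to the mean */
    for (size_t i = 0; i < n; i++)
        if (ms[i] > factor * s.mean) s.outliers++;
    return s;
}
```

Feeding it the `real` times from a batch for each allocator, with the same `factor`, would give a direct comparison of how often each one jumps.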