I want to write a function that analyzes a file and returns an array holding the frequency of every byte value from 0x00 to 0xff.
So I wrote this prototype:
// function prototype and other stuff
unsigned int counts[256] = {0}; // byte lookup table
FILE * pFile; // file handle
long fsize; // to store file size
unsigned char* buff; // buffer
unsigned char* pbuf; // later, mark buffer start
unsigned char* ebuf; // later, mark buffer end
if ( ( pFile = fopen ( FNAME , "rb" ) ) == NULL )
{
    printf("Error");
    return -1;
}
else
{
    // get file size
    fseek (pFile , 0 , SEEK_END);
    fsize = ftell (pFile);
    rewind (pFile);
    if ( fsize < 0 )
    {
        // ftell failed, size unknown
        fclose(pFile);
        return -1;
    }
    // allocate space ( file size + 1 )
    // I want file contents as string for populating it
    // with pointers
    buff = (unsigned char*)malloc( (size_t)fsize + 1 );
    if ( buff == NULL )
    {
        fclose(pFile);
        return -1;
    }
    // read whole file into memory; keep the number of bytes actually read
    fsize = (long)fread(buff,1,(size_t)fsize,pFile);
    // close file
    fclose(pFile);
    // mark end of buffer as string
    buff[fsize] = '\0';
    // set the pointers to beginning and end
    pbuf = &buff[0];
    ebuf = &buff[fsize];
    // Here the Bottleneck
    // iterate entire file byte by byte
    // counting bytes
    while ( pbuf != ebuf)
    {
        printf("%c\n",*pbuf);
        // update byte count
        counts[(*pbuf)]++;
        ++pbuf;
    }
    // free allocated memory
    free(buff);
    buff = NULL;
}
// printing stuff
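But this approach is slow. I am looking for a faster algorithm, because I have seen that HxD, for example, does this much faster.
I think reading a chunk of bytes at a time might be a solution, but I don't know how to do that.
Any help or advice would be appreciated.
Thanks.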
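
For reference, the "read some bytes at a time" idea could look roughly like the sketch below. This is only an illustration under stated assumptions, not a tuned implementation: the helper name count_bytes, the CHUNK_SIZE value of 64 KiB, and the use of unsigned long long counters are all choices made for this example and do not come from the original code.

```c
#include <stdio.h>

/* Assumed chunk size for this sketch; the best value depends on the system. */
#define CHUNK_SIZE 65536

/* Hypothetical helper: fills counts[0..255] with the frequency of each byte
 * value in the file at `path`, reading it in fixed-size chunks instead of
 * loading the whole file. Returns 0 on success, -1 on open or read error. */
int count_bytes(const char *path, unsigned long long counts[256])
{
    static unsigned char chunk[CHUNK_SIZE]; /* reused for every read */
    FILE *fp = fopen(path, "rb");
    size_t n, i;

    if (fp == NULL)
        return -1;

    for (i = 0; i < 256; i++)
        counts[i] = 0;

    /* fread returns the number of bytes actually read; 0 means EOF or error. */
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        for (i = 0; i < n; i++)
            counts[chunk[i]]++;

    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}
```

A caller would declare unsigned long long counts[256]; and call count_bytes(FNAME, counts). Memory use stays constant regardless of file size. Note also that the printf call inside the counting loop of the original code is likely to cost far more than the counting itself, so removing it may matter more than the chunk size.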