I am writing an HTTP parser and have these functions:
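    /* Classifies a User-Agent line: returns 1 for MSIE, 2 for Firefox. */
    int parse_useragent(char* buf, int length){
        buf[length] = '\0'; /* terminate the line in place so strstr() stops at its end */
        if(strstr(buf, "MSIE") != NULL){
            return 1;
        }else if(strstr(buf, "Firefox") != NULL){
            return 2;
        }
        return DEFAULT_USERAGENT;
    }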
    void parse_headers(unsigned char* buf, http_record_t * http){
        char * position = (char*)buf;
        char referer[] = "Referer";
        char useragent[] = "User-Agent";
        ...
        int length = getlinelength(position); // returns length of line
        while(length != 1){ // position points to start of line every iteration of cycle
            if(strncmp(position, useragent, sizeof(useragent)-1) == 0){
                http->useragent = parse_useragent(position, length);
                fprintf(stderr,"parsing useragent \n");
            }else if(strncmp(position, referer, sizeof(referer)-1) == 0){
                fprintf(stderr,"parsing referer \n");
                char * tmp = malloc(REFERER_LENGHT * sizeof(char));
                parse_referer(tmp,position, length);
                strncpy(http->referer,tmp, REFERER_LENGHT * sizeof(char) - 1);
                // strncpy does not terminate on truncation; assumes http->referer holds REFERER_LENGHT bytes
                http->referer[REFERER_LENGHT - 1] = '\0';
                free(tmp); // release the temporary buffer, otherwise it leaks on every Referer line
            }else if(...
            position += length + 1;
            length = getlinelength(position);
        }
        return;
    }
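buf points to the beginning of the HTTP headers.
I have a function similar to parse_useragent for every header, and I really need to optimize them. Packet length is usually less than 1000 bytes, and line length rarely exceeds 100 characters. Will optimization on such short strings have any noticeable effect?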
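To make the per-line cost concrete, here is a minimal sketch of the same User-Agent check done with an explicit length, so the buffer is not modified and no terminator is needed. The names find_in_line, parse_useragent_bounded and the UA_* constants are hypothetical and are not part of the parser above; UA_DEFAULT merely stands in for DEFAULT_USERAGENT, whose real value is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the return codes used by parse_useragent. */
enum { UA_DEFAULT = 0, UA_MSIE = 1, UA_FIREFOX = 2 };

/* Hypothetical helper: first occurrence of needle (n_len bytes) inside
 * hay (h_len bytes). hay does not have to be NUL-terminated. */
static const char *find_in_line(const char *hay, size_t h_len,
                                const char *needle, size_t n_len)
{
    if (n_len == 0)
        return hay;
    if (h_len < n_len)
        return NULL;

    const char *end = hay + (h_len - n_len) + 1; /* one past the last valid start */
    const char *p = hay;

    while (p < end) {
        /* Jump to the next candidate first byte, then verify the rest. */
        p = memchr(p, needle[0], (size_t)(end - p));
        if (p == NULL)
            return NULL;
        if (memcmp(p, needle, n_len) == 0)
            return p;
        p++;
    }
    return NULL;
}

/* Same classification as parse_useragent, but read-only and length-bounded. */
static int parse_useragent_bounded(const char *line, size_t length)
{
    if (find_in_line(line, length, "MSIE", 4) != NULL)
        return UA_MSIE;
    if (find_in_line(line, length, "Firefox", 7) != NULL)
        return UA_FIREFOX;
    return UA_DEFAULT;
}
```

This is only an illustration of the workload per line (one or two short scans over at most about 100 bytes), not a claim that it is faster than strstr.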
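I know that some of these algorithms require a different parsing approach than going line by line. Which approach should I choose under these specific conditions?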
- http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
- http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_string_search_algorithm
- http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
- http://en.wikipedia.org/wiki/Suffix_tree
- http://en.wikipedia.org/wiki/Suffix_array
- http://www.codeproject.com/Articles/250566/Fastest-strstr-like-function-in-C
- http://www.sanmayce.com/Railgun/index.html
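For comparison with strstr, a member of the Boyer-Moore family can be tried on a single line without changing the line-by-line structure. Below is a minimal sketch of the Horspool simplification (bad-character shifts only, not full Boyer-Moore). The function name horspool_find is hypothetical, and the sketch assumes the line length is already known, for example from getlinelength.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: Boyer-Moore-Horspool search for needle (n_len bytes)
 * in hay (h_len bytes). Returns a pointer to the first match or NULL. */
static const char *horspool_find(const char *hay, size_t h_len,
                                 const char *needle, size_t n_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i;

    if (n_len == 0)
        return hay;
    if (h_len < n_len)
        return NULL;

    /* Bytes that do not occur in the needle allow a full-length skip. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = n_len;
    /* For every needle byte except the last, record its distance to the end. */
    for (i = 0; i + 1 < n_len; i++)
        shift[(unsigned char)needle[i]] = n_len - 1 - i;

    i = 0;
    while (i <= h_len - n_len) {
        if (memcmp(hay + i, needle, n_len) == 0)
            return hay + i;
        /* Shift by the table entry of the byte aligned with the needle's end. */
        i += shift[(unsigned char)hay[i + n_len - 1]];
    }
    return NULL;
}
```

Note that this version rebuilds the 256-entry shift table on every call. For lines of about 100 bytes that setup may cost more than the search itself, so the table would probably have to be precomputed once per pattern before any gain could be measured.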
Thanks for your help!