
The server returns the HTTP headers followed by the binary file, something like this:

HTTP/1.1 200 OK
Date: Thu, 28 Jun 2012 22:11:14 GMT
Server: Apache/2.2.3 (Red Hat)
Set-Cookie: JSESSIONID=blabla; Path=/
Pragma: no-cache
Cache-Control: must-revalidate, no-store
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-disposition: inline; filename="foo.pdf"
Content-Length: 6231119
Connection: close
Content-Type: application/pdf

%PDF-1.6
%âãÏÓ
5989 0 obj
<</Linearized 1/L 6231119/O 5992/E 371504/N 1498/T 6111290/H [ 55176 6052]>>
endobj

xref
5989 2744
0000000016 00000 n
0000061228 00000 n
0000061378 00000 n

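I only want to copy the binary file. But how do I know where the header section ends? I tried checking whether the data contains \r\n\r\n, but it looks like that standard does not apply to server responses, only to clients. This is what ends up in the output file: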

Content-disposition: inline; filename="foo.pdf"
Content-Length: 6231119
Connection: close
Content-Type: application/pdf

%PDF-1.6
%âãÏÓ
5989 0 obj
<</Linearized 1/L 6231119/O 5992/E 371504/N 1498/T 6111290/H [ 55176 6052]>>
endobj

xref
5989 2744
0000000016 00000 n

Here is the C code:

while((readed = recv(sock, buffer, 128, 0)) > 0) {

    if(isnheader == 0 && strstr(buffer, "\r\n\r\n") != NULL)
        isnheader = 1;

        if(isnheader) 
          fwrite(buffer, 1, readed, fp);
}

Update:

I added a continue statement inside the if block:

if(isnheader == 0 && strstr(buffer, "\r\n\r\n") != NULL) {
    isnheader = 1;
    continue;
}

Well, it works as expected. But as @Alnitak mentioned, this is not safe.


2 Answers

17

The headers and the body should be separated by \r\n\r\n (Section 4.1 of RFC 2616).

However, some servers may omit the \r and send only \n as the line ending, particularly if they fail to sanitize CGI-supplied headers to ensure they include the \r.

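You also need to think about how your reads are chunked: the separator may well straddle two of your 128-byte blocks, which would prevent the strstr call from finding it.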
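As a rough illustration of the two points above, the check can be run over a buffer that accumulates everything received so far, and it can accept both \r\n and bare \n line endings. The helper name find_header_end is hypothetical, and the sketch assumes the caller appends each recv() result to the same buffer before calling it:

```c
#include <stddef.h>

/* Hypothetical helper, not part of any library.
 * Scans buf[0..len) for the blank line that ends the HTTP headers.
 * Accepts "\r\n\r\n", "\n\n" and the mixed form "\n\r\n".
 * Returns the offset of the first body byte, or 0 if the blank line
 * has not arrived yet (a real offset is always at least 2). */
size_t find_header_end(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        /* "\n\n": the line after this one is empty */
        if (i + 1 < len && buf[i + 1] == '\n')
            return i + 2;
        /* "\n\r\n": empty line terminated by CRLF */
        if (i + 2 < len && buf[i + 1] == '\r' && buf[i + 2] == '\n')
            return i + 3;
    }
    return 0;
}
```

Because the function takes an explicit length and never relies on a terminating NUL, it can be called again on the same buffer after each read until it returns a non-zero offset; everything from that offset onward is body data.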

answered 2012-06-28T22:56:17.970
2

You are not parsing the input correctly. Here are a few things you are doing wrong:

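  • Your code seems to assume that your buffer will contain at most one line of header data. But recv() does not operate on "lines" of data; it operates on blocks of binary data. So if you tell it your buffer is 128 bytes long, it will try to fill your buffer with up to 128 bytes of data (even if those 128 bytes contain multiple "lines").
  • Your code does not account for the possibility that the "\r\n" of the header break may be pulled into your buffer by two different calls to recv(), which would prevent your code from recognizing the header break.
  • If you do find the header break (which can happen if the headers are just the right size), you will end up pushing the last header, with its terminating "\r\n" and the header break ("\r\n"), into your copy of the binary data.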
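One way to sidestep the split-delimiter problem described in the list above is to carry a small amount of state from one recv() call to the next instead of searching each chunk in isolation. The following is only a sketch under that assumption; struct hdr_scan and hdr_scan_feed are hypothetical names, and the scanner is deliberately lenient in that it ignores every \r when counting line endings:

```c
#include <stddef.h>

/* Hypothetical scanner state that survives between chunks. */
struct hdr_scan {
    int newlines;   /* consecutive '\n' seen, ignoring any '\r' */
    int done;       /* non-zero once the blank line was consumed */
};

/* Feed one chunk of received data.
 * Returns the offset inside chunk where the body starts.
 * A return value equal to len means the whole chunk was header data. */
size_t hdr_scan_feed(struct hdr_scan *st, const char *chunk, size_t len)
{
    size_t i;

    if (st->done)
        return 0;                   /* headers already finished */

    for (i = 0; i < len; i++) {
        if (chunk[i] == '\n') {
            if (++st->newlines == 2) {
                st->done = 1;       /* empty line found */
                return i + 1;
            }
        } else if (chunk[i] != '\r') {
            st->newlines = 0;       /* ordinary byte: line is not empty */
        }
    }
    return len;
}
```

The caller would zero-initialize a struct hdr_scan, call hdr_scan_feed on every chunk, and fwrite only the bytes from the returned offset to the end of the chunk. Since the count of line endings lives in the struct, it does not matter where the chunk boundaries fall.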

I wrote a quick function that should find the end of the HTTP headers and write the rest of the server's response to a file stream:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void parse_http_headers(int s, FILE * fp)
{
   int       isnheader;
   ssize_t   readed;
   size_t    len;
   size_t    offset;
   char      buffer[1024];
   char    * eol; // end of line
   char    * bol; // beginning of line

   isnheader = 0;
   len       = 0;

   // read next chunk from socket, appending to any unprocessed data
   while((readed = read(s, &buffer[len], (sizeof(buffer)-1-len))) > 0)
   {
      // headers already consumed: write body data straight to FILE stream
      if (isnheader != 0)
      {
         fwrite(buffer, 1, (size_t)readed, fp);
         continue;
      }

      // calculate combined length of unprocessed data and new data
      len += (size_t)readed;

      // process each complete line in buffer looking for header break;
      // memchr() is used because the data is not a NUL terminated string
      bol = buffer;
      while((eol = memchr(bol, '\n', len - (size_t)(bol - buffer))) != NULL)
      {
         // an empty line ("\n" or "\r\n") marks the end of the headers
         if ( (eol == bol) || ((eol == bol + 1) && (bol[0] == '\r')) )
         {
            isnheader = 1;
            bol = eol + 1;
            break;
         }

         // update bol to the beginning of the next line
         bol = eol + 1;
      }

      // amount of buffer consumed by complete lines
      offset = (size_t)(bol - buffer);

      if (isnheader != 0)
      {
         // write body data that arrived in the same chunk as the headers
         if (len > offset)
            fwrite(bol, 1, len - offset, fp);

         // reset length so later reads start at the beginning of buffer
         len = 0;
         continue;
      }

      // shift the incomplete last line to the beginning of buffer
      len -= offset;
      memmove(buffer, bol, len);

      // give up if a single header line does not fit in the buffer
      if (len == sizeof(buffer)-1)
         return;
   }

   return;
}

I have not tested the code, so it may contain simple errors, but it should point you in the right direction for parsing string data out of a buffer in C.

answered 2012-06-29T05:05:40.117