c - Difference between the header and the content of an HTTP server response (sockets)

Question

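I would like to know whether it is possible to find out where the header ends in the response stream.

The background is this: I am using sockets in C to fetch content from a website, and the content is gzip-encoded. I want to read the content directly from the stream and decode the gzip content with zlib. But how do I know where the HTTP header ends and the gzip content begins?

I roughly tried two approaches, and both gave me results that look odd to me. In the first, I read the whole stream and printed it to the terminal; my HTTP header ended with "\r\n\r\n", as I expected. In the second, I called recv() once to get the header and then read the content in a while loop; here the header did not end with "\r\n\r\n".

Why? And which is the right way to read the content?

I'll just give you the code so you can see how I get the response from the server.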

//first way (gives rnrn)
char *output, *output_header, *output_content, **output_result;
size_t size;
FILE *stream;
stream = open_memstream (&output, &size);
char BUF[BUFSIZ];
while(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
{
    fprintf (stream, "%s", BUF);
}
fflush(stream);
fclose(stream);

output_result = str_split(output, "\r\n\r\n");
output_header = output_result[0];
output_content = output_result[1];

printf("Header:\n%s\n", output_header);
printf("Content:\n%s\n", output_content);


//second way (doesnt give rnrn)
char *content, *output_header;
size_t size;
FILE *stream;
stream = open_memstream (&content, &size);
char BUF[BUFSIZ];

if (recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
{
    output_header = BUF;
}

while(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
{
    fprintf (stream, "%s", BUF); //i would just use this as input stream to zlib
}
fflush(stream);
fclose(stream);

printf("Header:\n%s\n", output_header);
printf("Content:\n%s\n", content);

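Both give the same result when printed to the terminal, but the second should print more line breaks, or at least that is what I expected, because they get lost when the string is split.

I'm new to C, so I may just be overlooking something simple.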

score 8 · Accepted Answer

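You are calling recv() in a loop until the socket disconnects or fails (and writing the received data to your stream the wrong way), storing all of the raw data in your char* buffer. That is not the correct way to read an HTTP response, especially if HTTP keep-alives are used (in which case no disconnect occurs at the end of the response). You must follow the rules outlined in RFC 2616. Namely: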

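1. Read until the "\r\n\r\n" sequence is encountered. It terminates the response headers. Do not read any bytes past it.
2. Analyze the received headers according to the rules in RFC 2616 Section 4.4. They tell you the actual format of the remaining response data.
3. Read the remaining data (if any) in the format determined in #2.
4. If the response uses HTTP 1.1, check whether a Connection: close header is present in the received headers; if the response uses HTTP 0.9 or 1.0, check whether a Connection: keep-alive header is missing. In either case, close your end of the socket connection, because the server is closing its end. Otherwise, keep the connection open and reuse it for subsequent requests (unless you are done with the connection, in which case close it).
5. Process the received data as needed.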
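The first step above can be sketched in C roughly as follows. This is only an illustration, not code from the original post: `find_header_end` is a hypothetical helper name, and it assumes you accumulate the received bytes in a buffer and track how many bytes you hold. Because recv() may deliver the headers in several pieces, or the end of the headers together with the first body bytes, the search runs over the accumulated bytes and works on lengths rather than on NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the original post.
 * Scans the first len bytes of buf for the header terminator and returns
 * the offset of the first body byte (just past "\r\n\r\n"), or -1 if the
 * terminator has not been received yet. It compares raw bytes, so NUL
 * bytes inside a gzip body do not confuse it. */
long find_header_end(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

A caller would append exactly the number of bytes recv() returned after each call (for example with memcpy or fwrite, not with a "%s" format) and then call `find_header_end` on the whole buffer. If you read in blocks rather than byte by byte, anything in the buffer at or beyond the returned offset has to be kept and treated as the start of the body.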

In short, you need to do something more like this (pseudocode):

string headers[];
byte data[];

string statusLine = read a CRLF-delimited line;
int statusCode = extract from status line;
string responseVersion = extract from status line;

do
{
    string header = read a CRLF-delimited line;
    if (header == "") break;
    add header to headers list;
}
while (true);

if ( !((statusCode in [1xx, 204, 304]) || (request was "HEAD")) )
{
    if (headers["Transfer-Encoding"] ends with "chunked")
    {
        do
        {
            string chunk = read a CRLF delimited line;
            int chunkSize = extract from chunk line;
            if (chunkSize == 0) break;

            read exactly chunkSize number of bytes into data storage;

            read and discard until a CRLF has been read;
        }
        while (true);

        do
        {
            string header = read a CRLF-delimited line;
            if (header == "") break;
            add header to headers list;
        }
        while (true);
    }
    else if (headers["Content-Length"] is present)
    {
        read exactly Content-Length number of bytes into data storage;
    }
    else if (headers["Content-Type"] begins with "multipart/")
    {
        string boundary = extract from Content-Type header;
        read into data storage until terminating boundary has been read;
    }
    else
    {
        read bytes into data storage until disconnected;
    }
}

if (!disconnected)
{
    if (responseVersion == "HTTP/1.1")
    {
        if (headers["Connection"] == "close")
            close connection;
    }
    else
    {
        if (headers["Connection"] != "keep-alive")
            close connection;
    }
}

check statusCode for errors;
process data contents, per info in headers list;
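Two of the steps that the pseudocode leaves abstract, "extract from chunk line" and the final keep-alive decision, might look like this in C. Treat it as a minimal sketch under stated assumptions: the names `parse_chunk_size`, `ascii_equal_nocase`, and `should_close` are hypothetical, the chunk line is assumed to be already split off as a NUL-terminated string, and the Connection header value is assumed to hold a single token (in practice it can be a comma-separated list).

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Parses the hexadecimal size at the start of a chunk line such as
 * "1a" or "1A;name=value". Returns the size, or -1 if the line does
 * not start with a hex number or the value does not fit in a long. */
long parse_chunk_size(const char *line)
{
    assert(line != NULL);
    long size = 0;
    size_t i = 0;
    for (; isxdigit((unsigned char)line[i]); i++) {
        unsigned char c = (unsigned char)line[i];
        int digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        if (size > (LONG_MAX - digit) / 16)
            return -1;                 /* would overflow */
        size = size * 16 + digit;
    }
    if (i == 0)
        return -1;                     /* no hex digits at all */
    /* Only end of line, whitespace, or a ';' chunk extension may follow. */
    char next = line[i];
    if (next != '\0' && next != '\r' && next != ';' &&
        next != ' ' && next != '\t')
        return -1;
    return size;
}

/* Returns 1 if two ASCII strings are equal ignoring case, else 0. */
static int ascii_equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* version: e.g. "HTTP/1.1"; connection: the Connection header value,
 * or NULL if the header is absent (assumed to be a single token).
 * Returns 1 if the client should close the socket after this response. */
int should_close(const char *version, const char *connection)
{
    assert(version != NULL);
    if (strcmp(version, "HTTP/1.1") == 0)
        return connection != NULL && ascii_equal_nocase(connection, "close");
    return connection == NULL || !ascii_equal_nocase(connection, "keep-alive");
}
```

The size is parsed by hand rather than with strtoul so that forms HTTP does not allow, such as a leading sign or a "0x" prefix, are rejected instead of silently accepted.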
