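Some web servers return a Content-Length of zero in the HTTP response headers. I would like a deterministic and performant way to receive all of the data in that case.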

A URL known to exhibit this behavior (more URLs below):

http://www.washingtonpost.com/wp-dyn/content/article/2010/02/12/AR2010021204894.html?hpid=topnews

Headers:

Cache-control:no-cache
Connection:close
Content-Encoding:gzip
Content-type:text/html
Server:Web Server
Transfer-encoding:chunked

My current solution is not guaranteed to get all of the data because of the MaxTries constant, and it is slow because of Thread.Sleep():

private bool MoreDataIsAvailable()
{
    int avail = _socket.Available;
    // Only fall back to polling when nothing is buffered and Content-Length was 0.
    if (avail == 0 &&
        _contentLength != null && _contentLength == 0)
    {
        int tries = 0;
        // Heuristic: give up after MaxTries attempts, so late data can be missed.
        while (avail == 0 && tries < MaxTries)
        {
            Thread.Sleep(5);
            _socket.Poll(1000, SelectMode.SelectRead); // timeout is in microseconds
            avail = _socket.Available;
            tries++;
            if (avail > 0)
            {
                Console.WriteLine(_socket.Handle + " avail = " + avail + " received = " + _bytes.Length + " && tries = " + tries);
            }
        }
    }
    return avail > 0;
}

Usage in context:

private void ReceiveCallback(object sender, SocketAsyncEventArgs e)
{
    if (ConnectionWasClosed(e) || HadSocketError(e))
    {
        _receiveDone.Set();
        return;
    }

    StoreReceivedBytes(e);

    if (AllBytesReceived())
    {
        _receiveDone.Set();
        return;
    }

    if (MoreDataIsExpected() || MoreDataIsAvailable())
    {
        WaitForBytes(e);
    }
    else
    {
        _receiveDone.Set();
    }
}

Sample output:

1436 avail = 3752 received = 1704 && tries = 9
1436 avail = 3752 received = 9208 && tries = 8
1436 avail = 3752 received = 12960 && tries = 9
1436 avail = 3752 received = 20464 && tries = 8
1436 avail = 3752 received = 27968 && tries = 7
1436 avail = 7504 received = 31720 && tries = 1
1436 avail = 3752 received = 39224 && tries = 6

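Edit:

Nikolai observed that responses with the Transfer-Encoding: chunked header need special handling, but that their end can be detected deterministically.

However, besides chunked responses, other URLs also end up in my catch-all method, for example: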

http://www.biomedcentral.com/1471-2105/6/197

Headers:

Cache-control:private
Connection:close
Content-Type:text/html
P3P:policyref="/w3c/p3p.xml", CP="NOI DSP COR CURa ADMa DEVa TAIa OUR BUS PHY ONL UNI COM NAV INT DEM PRE"
Server:Microsoft-IIS/5.0
X-Powered-By:ASP.NET

http://slampp.abangadek.com/info/

Headers:

Connection:close
Content-Type:text/html
Server:Apache/2.2.8 (Ubuntu) DAV/2 PHP/5.2.4-2ubuntu5.3 with Suhosin-Patch mod_ruby/1.2.6 Ruby/1.8.6(2007-09-24) mod_ssl/2.2.8 OpenSSL/0.9.8g
X-Cache:MISS from server03.abangadek.com
X-Powered-By:PHP/5.2.4-2ubuntu5.3

http://video.forbes.com/embedvideo/?format=frame&height=515&width=336&mode=render&networklink=1

Headers:

Connection:close
Content-Language:en-US
Content-Type:text/html;charset=ISO-8859-1
Server:Apache-Coyote/1.1

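I would like to know what I can look for in these responses that, like the Transfer-Encoding header did for the first URL, gives a clue for deterministically reading the whole response, so that I can avoid calling my method.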

1 Answer

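Judging from the URL given, you are looking at HTTP chunked transfer encoding, which lets the server start sending the response before it knows the total length while still letting the client reliably determine where the response ends.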
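
To illustrate how a client can find the end of a chunked body without polling or sleeping, here is a minimal sketch. The names ChunkedBodyReader, ReadChunkedBody, ReadLine and FillBuffer are hypothetical and not part of the question's code; the sketch assumes the stream is positioned just after the blank line that ends the response headers.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper, not taken from the question's code.
public static class ChunkedBodyReader
{
    // Decodes a chunked body and returns the payload bytes.
    // Reading stops at the zero-size chunk, so no timeout or retry count is needed.
    public static byte[] ReadChunkedBody(Stream input)
    {
        var body = new MemoryStream();
        while (true)
        {
            string sizeLine = ReadLine(input);
            // Chunk extensions (";name=value") may follow the hex size; ignore them.
            int semicolon = sizeLine.IndexOf(';');
            string hex = (semicolon >= 0 ? sizeLine.Substring(0, semicolon) : sizeLine).Trim();
            int size = int.Parse(hex, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);

            if (size == 0)
            {
                // Last chunk: skip optional trailer headers up to the blank line.
                while (ReadLine(input).Length > 0) { }
                return body.ToArray();
            }

            var chunk = new byte[size];
            FillBuffer(input, chunk);
            body.Write(chunk, 0, size);
            ReadLine(input); // consume the CRLF that terminates the chunk data
        }
    }

    // Reads one line terminated by LF and drops any CR characters.
    private static string ReadLine(Stream input)
    {
        var line = new StringBuilder();
        while (true)
        {
            int b = input.ReadByte();
            if (b < 0) throw new EndOfStreamException("Connection closed mid-response.");
            if (b == '\n') break;
            if (b != '\r') line.Append((char)b);
        }
        return line.ToString();
    }

    // Blocks until the whole buffer has been filled from the stream.
    private static void FillBuffer(Stream input, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = input.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) throw new EndOfStreamException("Connection closed mid-chunk.");
            offset += read;
        }
    }
}
```

With a socket, the stream could be a NetworkStream wrapped around it. For the first URL, which also sends Content-Encoding: gzip, the bytes returned here would still be gzip-compressed, because the chunked framing is removed before the content coding is decoded.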

See also RFC 2616, section 3.6.1.
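
For responses like the other three URLs in the question, which carry neither Transfer-Encoding nor Content-Length, section 4.4 of the same RFC says the body ends when the server closes the connection. The following is a rough sketch of that decision order, not a complete implementation: it leaves out the multipart/byteranges case, the names BodyFraming, HttpFraming and Decide are hypothetical, and it assumes the caller has parsed the headers into a case-insensitive dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names, for illustration only.
public enum BodyFraming { NoBody, Chunked, ContentLength, UntilClose }

public static class HttpFraming
{
    // Picks how the end of the response body is determined, following the
    // precedence in RFC 2616 section 4.4 (multipart/byteranges is omitted).
    // Assumes 'headers' was built with StringComparer.OrdinalIgnoreCase.
    public static BodyFraming Decide(int statusCode, bool isHeadRequest,
        IDictionary<string, string> headers, out long contentLength)
    {
        contentLength = -1;

        // 1. Responses to HEAD, and 1xx, 204 and 304 responses, never have a body.
        if (isHeadRequest || (statusCode >= 100 && statusCode < 200) ||
            statusCode == 204 || statusCode == 304)
        {
            return BodyFraming.NoBody;
        }

        // 2. Any Transfer-Encoding other than "identity" means chunked framing,
        //    and a Content-Length sent alongside it is ignored.
        string transferEncoding;
        if (headers.TryGetValue("Transfer-Encoding", out transferEncoding) &&
            !transferEncoding.Trim().Equals("identity", StringComparison.OrdinalIgnoreCase))
        {
            return BodyFraming.Chunked;
        }

        // 3. Otherwise a valid Content-Length gives the exact byte count.
        string rawLength;
        if (headers.TryGetValue("Content-Length", out rawLength) &&
            long.TryParse(rawLength.Trim(), out contentLength) && contentLength >= 0)
        {
            return BodyFraming.ContentLength;
        }

        // 4. No framing information: the body runs until the server closes the connection.
        contentLength = -1;
        return BodyFraming.UntilClose;
    }
}
```

For the UntilClose case, a receive that completes with zero bytes transferred and no socket error is the end-of-body signal. Presumably that is what ConnectionWasClosed(e) in the question already checks, in which case the polling fallback should not be needed for those responses.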

Answered 2010-02-13T21:51:22.977