I'm trying to learn more about how the web and TCP work by implementing a web client on top of a TCP connection.
Currently, my web request function looks like this:
public string SendWebRequest(SocketWebRequest request)
{
    using (NetworkStream ns = tc.GetStream())
    {
        using (System.IO.StreamReader sr = new System.IO.StreamReader(ns))
        {
            request.WriteTo(ns);
            ns.Flush();
            var statusLine = sr.ReadLine();
            ProcessStatusLine(statusLine);
            Headers = ReadHeaders(sr);
            ProcessCookies(request.Host);
            int contentLength = 0;
            if (Headers.ContainsKey("Content-Length"))
            {
                foreach (var cl in Headers["Content-Length"])
                {
                    int buf;
                    if (int.TryParse(cl, out buf))
                    {
                        contentLength = buf;
                        break;
                    }
                }
            }
            if (contentLength == 0)
            {
                return "";
            }
            byte[] content = new byte[contentLength];
            if (IsGziped())
            {
                MemoryStream decompressed = new MemoryStream();
                using (var zs = new GZipStream(ns, CompressionMode.Decompress))
                {
                    while (true)
                    {
                        var buf = new byte[1024];
                        int read = zs.Read(buf, 0, buf.Length);
                        if (read == 0)
                        {
                            break;
                        }
                        decompressed.Write(buf, 0, read);
                    }
                }
                content = decompressed.ToArray();
            }
            else
            {
                using (BinaryReader rdr = new BinaryReader(ns))
                {
                    rdr.Read(content, 0, content.Length);
                }
            }
            var encoding = GetEncoding();
            return encoding.GetString(content.ToArray());
        }
    }
}
The request looks like this:
GET http://www.youtube.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, */*
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host:www.youtube.com
The response headers look like this:
HTTP/1.1 200 OK
Date: Sat, 25 Aug 2012 19:46:51 GMT
Server: Apache
X-Content-Type-Options: nosniff
Content-Encoding: gzip
Set-Cookie: use_hitbox=d5c5516c3379125f43aa0d495d100d6ddAEAAAAw; path=/; domain=.youtube.com
Set-Cookie: VISITOR_INFO1_LIVE=av7rkkf4Sfw; path=/; domain=.youtube.com; expires=Mon, 22-Apr-2013 19:46:51 GMT
Expires: Tue, 27 Apr 1971 19:44:06 EST
Cache-Control: no-cache
P3P: CP="This is not a P3P policy! See //support.google.com/accounts/bin/answer.py?answer=151657&hl=en-US for more info."
X-Frame-Options: SAMEORIGIN
Content-Length: 18977
Content-Type: text/html; charset=utf-8
After this, the first call to int read = zs.Read(buf, 0, buf.Length); sometimes works, but often fails with the following exception:
The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
YouTube works fine in a browser. When I read the data as a string instead, it looks encoded.
Why am I getting this, and how should I fix it?
UPDATE
It looks like some sort of error during transmission: in 5 cases out of 10 it works, and in the other 5 it fails for no apparent reason.
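To tell corruption in transit apart from bytes that were consumed before GZipStream got to them, it may help to look at the first bytes of the body. A gzip stream starts with the magic bytes 0x1F 0x8B, so a body that starts with anything else would trigger exactly this exception. Below is a minimal sketch of such a check; GzipProbe and StartsWithGzipMagic are hypothetical names, and the in-memory payload only stands in for a real response body.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipProbe
{
    // Returns true when the buffer starts with the gzip magic bytes 0x1F 0x8B.
    public static bool StartsWithGzipMagic(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    public static void Main()
    {
        // Build a small gzip payload in memory (stand-in for a response body).
        var compressed = new MemoryStream();
        using (var zs = new GZipStream(compressed, CompressionMode.Compress, true))
        {
            byte[] text = Encoding.UTF8.GetBytes("hello");
            zs.Write(text, 0, text.Length);
        }
        byte[] body = compressed.ToArray();
        Console.WriteLine(StartsWithGzipMagic(body));

        // Simulate a body whose first byte was already consumed elsewhere.
        byte[] shifted = new byte[body.Length - 1];
        Array.Copy(body, 1, shifted, 0, shifted.Length);
        Console.WriteLine(StartsWithGzipMagic(shifted));
    }
}
```

If the failing responses start somewhere in the middle of the gzip data rather than with garbage, that would point at missing leading bytes rather than at the network.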
Here's the code of IsGziped():
bool IsGziped()
{
    foreach (var h in Headers["Content-Encoding"])
    {
        if (h.ToLowerInvariant().Contains("gzip"))
        {
            return true;
        }
    }
    return false;
}