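I downloaded a web page that contains a paragraph with quotation marks like these:

“I simply extracted this line from the html page”</p>

But when I write it to a file, the “ character is not displayed correctly.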

using (WebClient wc = new WebClient())
using (Stream strm = wc.OpenRead("http://images.thenews.com.pk/21-08-2013/ethenews/t-24895.htm"))
// No encoding is passed, so the reader assumes UTF-8 unless it finds a BOM.
using (StreamReader sr = new StreamReader(strm))
// No encoding is passed here either, so the file is written as UTF-8.
using (StreamWriter sw = new StreamWriter("D://testsharp.txt"))
{
    Console.WriteLine(sr.CurrentEncoding);

    String line;
    while ((line = sr.ReadLine()) != null) {
        sw.WriteLine(line);
    }
}

1 Answer

If all you want to do is write the page to disk, then copy the raw bytes without decoding them: use the Stream API directly, or (even easier) just use:

wc.DownloadFile("http://images.thenews.com.pk/21-08-2013/ethenews/t-24895.htm",
    @"D:\testsharp.txt");
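
For the "use the Stream API directly" option, a minimal sketch might look like the following. The class and method names (RawDownload, CopyRaw, Save) are made up for illustration, and Stream.CopyTo assumes .NET 4.0 or later. The point is only that no StreamReader or StreamWriter is involved, so no decoding takes place.

```csharp
using System.IO;
using System.Net;

static class RawDownload
{
    // Copies every byte from source to destination without decoding it,
    // so a character such as “ keeps whatever byte sequence the server sent.
    public static void CopyRaw(Stream source, Stream destination)
    {
        source.CopyTo(destination); // Stream.CopyTo is available from .NET 4.0
    }

    // Hypothetical helper: saves the resource at url to path as raw bytes.
    public static void Save(string url, string path)
    {
        using (WebClient wc = new WebClient())
        using (Stream input = wc.OpenRead(url))
        using (Stream output = File.Create(path))
        {
            CopyRaw(input, output);
        }
    }
}
```

Calling RawDownload.Save with the URL and @"D:\testsharp.txt" should leave the same bytes on disk as DownloadFile does. The file still has to be opened with the right encoding by whatever reads it later.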

If you don't treat it as binary, then you need to worry about encodings, and it isn't enough just to look at sr.CurrentEncoding, because we can't be sure that it detected the encoding correctly (a StreamReader created without an explicit encoding only checks for a BOM and otherwise assumes UTF-8). The encoding might be reported in the HTTP headers, which would be nice. It might instead be specified by a BOM at the start of the payload. In the case of HTML, however, it could also be specified inside the HTML itself. In all three cases, treating the file as binary will improve things (for the BOM and inside-the-html cases, it will fix it entirely).
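
If the text does have to be decoded, the first two sources mentioned above (BOM and HTTP header) could be checked roughly as follows. This is a sketch under assumptions, not a complete detector: the names EncodingGuess, FromBom, FromContentType and Decode are hypothetical, checking the BOM before the header is one possible order, the UTF-8 fallback is a guess, and an encoding declared inside the HTML is not examined at all.

```csharp
using System;
using System.Net.Mime;
using System.Text;

static class EncodingGuess
{
    // Returns the encoding announced by a byte order mark, or null if none is found.
    public static Encoding FromBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 BE
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 LE
        return null;
    }

    // Returns the encoding named by the charset parameter of a Content-Type
    // header value, or null if the header is missing, malformed or unknown.
    public static Encoding FromContentType(string headerValue)
    {
        if (string.IsNullOrEmpty(headerValue)) return null;
        try
        {
            string charset = new ContentType(headerValue).CharSet;
            return string.IsNullOrEmpty(charset) ? null : Encoding.GetEncoding(charset);
        }
        catch (FormatException) { return null; }   // header could not be parsed
        catch (ArgumentException) { return null; } // charset name not recognised
    }

    // Decodes the payload using the BOM if present, then the header,
    // then UTF-8 as an assumed fallback.
    public static string Decode(byte[] data, string contentTypeHeader)
    {
        Encoding bom = FromBom(data);
        Encoding enc = bom ?? FromContentType(contentTypeHeader) ?? Encoding.UTF8;
        int skip = bom != null ? bom.GetPreamble().Length : 0; // do not decode the BOM itself
        return enc.GetString(data, skip, data.Length - skip);
    }
}
```

With a WebClient, the inputs would come from wc.DownloadData(url) and, after that call, wc.ResponseHeaders[HttpResponseHeader.ContentType]. On runtimes other than the .NET Framework, legacy charsets such as windows-1252 may not be available unless an extra encoding provider is registered, in which case FromContentType returns null here.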

answered 2013-08-22T08:29:23.327