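I am trying to parse HTML from a site that uses the windows-1254 charset, but all Turkish characters are displayed like this: � � � � �
What is the actual problem? I have tried these: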
webClient.Encoding = System.Text.Encoding.UTF8;
webClient.Encoding = System.Text.Encoding.GetEncoding("UTF-8");
I also tried this helper function:
public string ReplaceText(string _text)
{
    // Replace mojibake sequences with the intended Turkish letters.
    return _text
        .Replace("Ä°", "İ").Replace("Ä±", "ı").Replace("Ã¼", "ü")
        .Replace("ÅŸ", "ş").Replace("Åž", "Ş").Replace("Ã§", "ç")
        .Replace("Ã¶", "ö").Replace("ÄŸ", "ğ").Replace("Ã‡", "Ç")
        .Replace("Ã–", "Ö").Replace("Ãœ", "Ü");
}
And these request headers:
webClient.Headers["User-Agent"] = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)";
webClient.Headers["Accept-Charset"] = "windows-1254,utf-8;q=0.7,*;q=0.7";
(I also tried iso-8859-9 and utf-8 in the Accept-Charset header.)
This is how I use the WebClient:
WebClient wb = new WebClient();
wb.Headers["User-Agent"] = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)";
wb.Headers["Accept-Charset"] = "windows-1254,utf-8;q=0.7,*;q=0.7";
wb.DownloadStringAsync(new Uri("http://www.site.com"));
wb.Encoding = System.Text.Encoding.UTF8;
wb.DownloadStringCompleted += new DownloadStringCompletedEventHandler(DSC);
The DownloadStringCompleted handler (DSC):
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(e.Result);
var inputs = htmlDoc.DocumentNode.SelectNodes("//div[@id=\"mrln-eyhaber\"]//a");
foreach (var input in inputs)
{
    textarea.Text += this.ReplaceText(input.Attributes["title"].Value.ToString()) + "\n\n";
}
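To narrow it down, here is a minimal sketch that seems to reproduce the same symptom without any network access. It assumes the server really sends windows-1254 bytes (I have not verified the response headers), and the sample string and the class name EncodingDemo are made up for illustration. On .NET Core / .NET 5+ the code page encodings are not available by default, so the commented-out RegisterProvider line would be needed there.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // Only needed on .NET Core / .NET 5+ (System.Text.Encoding.CodePages):
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding turkish = Encoding.GetEncoding("windows-1254");

        // Hypothetical sample text, encoded the way I assume the site encodes it.
        byte[] raw = turkish.GetBytes("İstanbul'da güneşli");

        // Decoding windows-1254 bytes as UTF-8: the bytes for the Turkish
        // letters are not valid UTF-8, so they become U+FFFD.
        string decodedAsUtf8 = Encoding.UTF8.GetString(raw);

        // Decoding with the same encoding that produced the bytes round-trips.
        string decodedAs1254 = turkish.GetString(raw);

        Console.WriteLine(decodedAsUtf8);
        Console.WriteLine(decodedAs1254);
    }
}
```

If that assumption holds, the characters are already lost by the time e.Result reaches ReplaceText, which might explain why the replacements never match anything.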