I am using Mechanize to read the web page at http://www.daz3d.com/pirates-black-pearl.
The page seems to load fine, but for some reason some of the characters come out differently.
For example, part of the product description looks like this when I view the page source in Firefox:
<p>Pirates – Black Pearl is a high quality conforming clothing from Pretty3D. Designed specifically for Victoria 4, Pirates – Black Pearl is a complete conforming outfit that includes a Dress, Corset, Panty, Boots, Necklace, Pistol Holder, and Seven Props.</p>
However, when I look at what Mechanize downloaded, I see:
<p>Pirates – Black Pearl is a high quality conforming clothing from Pretty3D. Designed specifically for Victoria 4, Pirates – Black Pearl is a complete conforming outfit that includes a Dress, Corset, Panty, Boots, Necklace, Pistol Holder, and Seven Props.</p>
Notice that the - is replaced with –.
The character set is declared as utf-8 in the page header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
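This happens in many places, with characters that I would expect to be plain ASCII.
What is happening here, and how can I fix it?
I know this is a Unicode issue, but I don't know how to handle it.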
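
To make the suspected problem concrete, here is a minimal Python sketch of the kind of mismatch I think might be involved. It does not use Mechanize at all. It only assumes that the UTF-8 bytes of the en dash are being decoded with a single-byte charset (Windows-1252 is my guess, not something I have confirmed), and the helper names `decode_as` and `repair` are made up for illustration.

```python
# Hypothetical illustration only: this is not Mechanize code, and the choice of
# cp1252 as the "wrong" charset is an assumption about what might be happening.

EN_DASH = "\u2013"  # the dash used in "Pirates \u2013 Black Pearl"


def decode_as(text: str, wrong_encoding: str) -> str:
    """Encode text as UTF-8, then decode those bytes with a different charset."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str) -> str:
    """Undo the mismatch: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


garbled = decode_as(EN_DASH, "cp1252")

# One character turns into three once the bytes are read with the wrong charset.
print(len(EN_DASH), len(garbled))
print(ascii(garbled))

# Round-tripping through the same wrong charset restores the original character.
print(repair(garbled, "cp1252") == EN_DASH)
```

If this is what is going on, the fix would presumably be to make sure the response body is decoded as UTF-8 in the first place, rather than repairing the text afterwards.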