I run the following code:
$page = '<p>Ä</p>';
$DOM = new DOMDocument;
$DOM->loadHTML($page);
echo 'source:'.$page;
echo 'dom: '.$DOM->getElementsByTagName('p')->item(0)->textContent;
It outputs the following:
source:Ä
dom: ×
So I don't understand: why does the text's encoding get broken when it passes through DOMDocument?
Here's a workaround that declares the proper encoding via a meta header:
$DOM->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' . $page);
I'm not sure if UTF-8 is the actual character set you're trying to use, so adjust the charset value where necessary.
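As far as I know, the reason this helps is that loadHTML() is given no charset declaration for the fragment, so the underlying HTML parser has to guess and typically falls back to a single-byte encoding. The two bytes of the UTF-8 "Ä" are then read as two separate characters. Here is a minimal sketch of the workaround end to end, assuming the script file and $page are UTF-8 encoded. The variable names $hinted and $fixed are only for illustration:

```php
<?php
// Assumption: this file is saved as UTF-8, so $page holds the UTF-8 bytes of "Ä".
$page = '<p>Ä</p>';

// Prepend a charset declaration so the parser decodes the input as UTF-8
// instead of guessing an encoding.
$hinted = new DOMDocument;
$hinted->loadHTML(
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' . $page
);

// DOMDocument returns strings as UTF-8, so this should match the source text.
$fixed = $hinted->getElementsByTagName('p')->item(0)->textContent;

echo 'source: ' . $page . "\n";
echo 'dom: ' . $fixed . "\n";
```

Note that the meta element ends up in the parsed document's head, so if you later serialize the document with saveHTML() you may want to remove it again.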
See also: domdocument character set issue