I think your parser is working fine. The page is either (A) using a mixed or incorrect encoding, or (B) actually writing the Unicode replacement character '�' (U+FFFD), i.e. the characters got mangled somewhere before being output to the page (for example, on the way into or out of the database). Where accents do show up correctly, the page is using HTML entities, not the characters themselves.

If A), you could try to detect the encoding (a pain, and unreliable).

If B), you can't do anything: once a character has been replaced by '�', the original is gone.
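To tell the two cases apart, something like the following rough sketch may help. It assumes you have the raw response bytes in hand. The function names and the list of candidate encodings (`utf-8`, then `cp1252`) are my own guesses for illustration, not anything taken from your page or parser.

```python
import html

REPLACEMENT = "\ufffd"  # the Unicode replacement character

def decode_page(raw, candidates=("utf-8", "cp1252")):
    """Try each candidate encoding strictly; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last resort never fails
    return raw.decode("latin-1"), "latin-1"

def diagnose(raw):
    """Report which encoding worked and whether U+FFFD is literally in the page."""
    text, enc = decode_page(raw)
    return {
        "encoding": enc,
        # True means case B: the damage happened before the page was written
        "has_literal_replacement_char": REPLACEMENT in text,
        # Resolve HTML entities such as &eacute; into real characters
        "text": html.unescape(text),
    }
```

If `has_literal_replacement_char` comes back true after a clean UTF-8 decode, the page itself contains U+FFFD and you are in case B. If strict UTF-8 decoding fails and a fallback succeeds, case A is more likely, although picking an encoding this way is only a heuristic.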

Answered 2012-07-30T17:32:33.943