1
4

1 回答 1

3

This character is U+FFFD, REPLACEMENT CHARACTER, commonly used to replace invalid data in streams thought to be some Unicode encoding.

For example if you had the text "Résumé" encoded as IS0 8859-1 and wanted to convert it to UTF-16, but told the conversion routine that the text was UTF-8 then the library would probably produce the UTF-16 text "R�sum�" (the other alternative would be to throw an error and not give any results).

Another way these may appear is if a web page declares that it is UTF-8 but it is not actually UTF-8. The browser is likely to do the re-encoding described above and the replacement characters will show up in the rendered web-page, but viewing the source with an editor that ignores or disregards the HTML encoding info will show the characters correctly.

From your comments it looks like your process is something like:

Excel -> export to csv -> process csv in js -> produce html

Windows software typically uses the platform's 'encoding for non-Unicode programs' for encoding eight bit text, not UTF-8. So the CSV file is probably Windows CP1252 (If you're using a version of windows set up for most of the western world), and if your javascript program is reading that data and copying it directly into HTML source that's supposed to be UTF-8, that would cause a problem that fits your description.

What you need to do convert from whatever encoding the CSV is using to UTF-8. Javascript doesn't really have the facilities to do this so your best bet is probably to convert the file after exporting it from Excel but before accessing it in JS.

Other alternatives are to change the encoding the HTML page is using to whatever the csv uses, or to not specify an encoding and leave it up to the browser to guess.

于 2012-10-31T16:18:15.443 回答