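For example, I have the following string:

"Hi, I'm testing a weird character Ů, it's a U with a ring"

Right now my string uses the HTML code for Ů to display the ringed U. However, I need it in Unicode escape format, i.e. \u016E. Is there a good, systematic way to do this in plain JavaScript?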

If you want to convert numeric HTML character references to Unicode escape sequences, try the following (it doesn't work with code points above 0xFFFF):

function convertCharRefs(string) {
    return string
        // Decimal references, e.g. &#366; -> \u016e
        .replace(/&#(\d+);/g, function(match, num) {
            var hex = parseInt(num, 10).toString(16);
            while (hex.length < 4) hex = '0' + hex; // pad to four hex digits
            return "\\u" + hex;
        })
        // Hexadecimal references (&#x or &#X), e.g. &#x16E; -> \u016E
        .replace(/&#[xX]([0-9A-Fa-f]+);/g, function(match, hex) {
            while (hex.length < 4) hex = '0' + hex;
            return "\\u" + hex;
        });
}
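For a code point above 0xFFFF, convertCharRefs emits something like \u1f600, which JavaScript reads as \u1f60 followed by a literal "0". A minimal sketch of one workaround, assuming you want such characters written as a UTF-16 surrogate pair (two \uXXXX escapes): String.fromCodePoint splits the code point into UTF-16 code units, and each unit is escaped separately. The names toUnicodeEscapes and convertCharRefsFull are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helper: turn one code point into \uXXXX escapes,
// using a surrogate pair (two escapes) above 0xFFFF.
function toUnicodeEscapes(codePoint) {
    // fromCodePoint returns a string of one or two UTF-16 code units
    var units = String.fromCodePoint(codePoint);
    var out = "";
    for (var i = 0; i < units.length; i++) {
        var hex = units.charCodeAt(i).toString(16).toUpperCase();
        while (hex.length < 4) hex = "0" + hex; // pad to four hex digits
        out += "\\u" + hex;
    }
    return out;
}

// Hypothetical variant of convertCharRefs covering the full Unicode range.
function convertCharRefsFull(string) {
    return string
        .replace(/&#(\d+);/g, function (match, num) {
            return toUnicodeEscapes(parseInt(num, 10));
        })
        .replace(/&#[xX]([0-9A-Fa-f]+);/g, function (match, hex) {
            return toUnicodeEscapes(parseInt(hex, 16));
        });
}

console.log(convertCharRefsFull("ring: &#366; smiley: &#x1F600;"));
// ring: \u016E smiley: \uD83D\uDE00
```

String.fromCodePoint (ES2015) throws a RangeError for values above 0x10FFFF, so a malformed reference makes this throw instead of being passed through.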

If you simply want to decode the character references:

function decodeCharRefs(string) {
    return string
        // Decimal references, e.g. &#366; -> Ů
        .replace(/&#(\d+);/g, function(match, num) {
            return String.fromCodePoint(parseInt(num, 10));
        })
        // Hexadecimal references (&#x or &#X), e.g. &#x16E; -> Ů
        .replace(/&#[xX]([0-9A-Fa-f]+);/g, function(match, hex) {
            return String.fromCodePoint(parseInt(hex, 16));
        });
}
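If the string already contains the literal character (as in the example string in the question) rather than a character reference, there is nothing to parse: each character can be escaped directly. A minimal sketch, assuming every UTF-16 code unit outside printable ASCII (0x20 to 0x7E) should become a \uXXXX escape; escapeNonAscii is a hypothetical helper name.

```javascript
// Hypothetical helper: replace every UTF-16 code unit outside printable
// ASCII (0x20-0x7E) with a \uXXXX escape. Without the "u" flag the regex
// matches single code units, so astral characters become surrogate pairs.
function escapeNonAscii(string) {
    return string.replace(/[^\x20-\x7E]/g, function (ch) {
        var hex = ch.charCodeAt(0).toString(16).toUpperCase();
        while (hex.length < 4) hex = "0" + hex; // pad to four hex digits
        return "\\u" + hex;
    });
}

console.log(escapeNonAscii("a weird character Ů"));
// a weird character \u016E
```

It can also be applied to the output of decodeCharRefs. Note that it escapes control characters too, so a newline becomes \u000A.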

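Both functions use String.prototype.replace with a function as the replacement: the function is called once for each match, and its return value is substituted for the matched text.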
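A small illustration of what the replacer function receives, using a made-up input string: first the full match, then each capture group, then the offset of the match (and finally the whole input string, which is not used here).

```javascript
// Record what the replacer receives for each match.
var calls = [];

var decoded = "x&#366;y&#66;".replace(/&#(\d+);/g, function (match, num, offset) {
    // match:  the whole matched text, e.g. "&#366;"
    // num:    capture group 1, the decimal digits as a string, e.g. "366"
    // offset: index of the match within the input string
    calls.push({ match: match, num: num, offset: offset });
    // The return value replaces the matched text
    return String.fromCharCode(parseInt(num, 10));
});

console.log(decoded); // xŮyB
// calls now holds one entry per match, at offsets 1 and 8
```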

Answered 2013-05-06T14:57:05.240