security - Is Unicode-based XSS a problem?

Question

score 10 · Accepted Answer

That would be in direct violation of the HTML specifications. According to them, the markup-significant characters are ASCII characters, whereas characters like U+FF1C FULLWIDTH LESS-THAN SIGN “＜” are just data characters with no special significance. Browsers would need extra code to map fullwidth characters to ASCII (either as an ad hoc mapping or e.g. via normalization to NFKD or NFKC), but there’s no reason to assume they would do such things, any more than there is a reason to think that they could start mapping “[” to “<”.
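To make the normalization point concrete, here is a small Python sketch using the standard `unicodedata` module. It only illustrates what an explicit compatibility normalization would do to U+FF1C; it does not describe anything a browser's HTML parser does. The names `fullwidth_lt` and `results` are made up for this example.

```python
import unicodedata

fullwidth_lt = "\uFF1C"  # FULLWIDTH LESS-THAN SIGN

# As plain data, the fullwidth sign is a different code point from "<".
print(fullwidth_lt == "<")  # False

# Record, for each normalization form, whether it maps U+FF1C to ASCII "<".
results = {
    form: unicodedata.normalize(form, fullwidth_lt) == "<"
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(results)
# {'NFC': False, 'NFD': False, 'NFKC': True, 'NFKD': True}
```

Only the compatibility forms (NFKC, NFKD) fold the fullwidth sign into “<”, so some piece of software has to apply one of them on purpose before the character can become markup-significant.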

So a blog that claims otherwise is just describing a possibility that someone invented but that has no real grounds. You can usually see this from the references and demonstrations given. (That is, from the absence of them.)

There are surely security issues around Unicode characters that look similar to each other, but then it’s a matter of human beings mistakenly taking one character for another even though they are internally quite different, like “＜” for “<” (and therefore e.g. seeing a string in HTML source as a script element even though it isn’t) or “а” for “a” (a Cyrillic letter for a Latin letter with identical appearance). That is, people may see characters as identical even though programs see them as distinct.
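The difference between what people see and what programs see can be checked with a short Python sketch based on the standard `unicodedata` module. The variable names are chosen for this illustration only.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, renders like the Latin letter in most fonts

# Print the code point and official Unicode name of each look-alike.
for ch in (latin_a, cyrillic_a, "<", "\uFF1C"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# To a program the two letters are simply different strings.
print(latin_a == cyrillic_a)  # False

# Unlike the fullwidth sign, the Cyrillic letter has no compatibility
# mapping to the Latin one, so NFKC normalization leaves it unchanged.
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)  # False
```

So normalization alone does not remove this kind of confusion; spotting look-alike letters from different scripts needs a separate check.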

score 4 · Accepted Answer

No, a browser will not interpret text surrounded by fullwidth LT or GT signs as valid HTML tags, but some backends will transform them into normal LT or GT signs (for example through best-fit character mapping), which creates an XSS risk. See the following: http://websec.github.io/unicode-security-guide/character-transformations/#best-fit
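A minimal Python sketch of how such a backend transformation could reintroduce markup. It uses NFKC normalization as a stand-in for the best-fit mapping described at the link (Python's standard codecs do not perform best-fit conversion), and the function names `unsafe_render` and `safer_render` are hypothetical.

```python
import html
import unicodedata

def unsafe_render(user_input: str) -> str:
    # Escapes first, then transforms: the fullwidth signs pass through
    # html.escape untouched and only afterwards become real "<" and ">".
    escaped = html.escape(user_input)
    return unicodedata.normalize("NFKC", escaped)

def safer_render(user_input: str) -> str:
    # Transforms first, then escapes: whatever the mapping produces
    # is escaped before it reaches the page.
    normalized = unicodedata.normalize("NFKC", user_input)
    return html.escape(normalized)

# Payload written with U+FF1C and U+FF1E instead of ASCII "<" and ">".
payload = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"

print(unsafe_render(payload))  # <script>alert(1)</script>
print(safer_render(payload))   # &lt;script&gt;alert(1)&lt;/script&gt;
```

The general point is the ordering: output escaping should be the last step applied to the data, after any character transformation the backend performs.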
