html - When should I use HTML entities?

Question

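This has been confusing me for a while. Now that UTF-8 has become the de facto standard in web development, I'm not sure in which situations I should use HTML entities and in which I should just use the UTF-8 character. For example:

Em dashes (—, &mdash;)
Ampersands (&, &amp;)
3/4 fractions (¾, &frac34;)

Please shed some light on this. It would be much appreciated.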

score 123 · Accepted Answer

answered 2009-01-12T19:19:28.623

score 87 · Accepted Answer

You don't generally need to use HTML character entities if your editor supports Unicode. Entities can be useful when:

Your keyboard does not support the character you need to type. For example, many keyboards have no key for the em dash or the copyright symbol.
Your editor does not support Unicode (very common some years ago, but probably not today).
You want to make it explicit in the source what is happening. For example, the &nbsp; entity is clearer than the invisible non-breaking space character it stands for.
You need to escape HTML special characters like <, &, or ".
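The last case is the one where entities cannot be avoided. A minimal sketch in Python (the language is my choice; the answer names none), using the standard library's `html.escape`, which replaces `&`, `<` and `>`, and also `"` and `'` when `quote=True` (the default):

```python
import html

# Text that must appear literally inside an HTML element or attribute value.
snippet = 'Tom & Jerry <3 "cartoons"'

# quote=True (the default) also escapes " and ', which matters inside attributes.
escaped = html.escape(snippet)
print(escaped)  # Tom &amp; Jerry &lt;3 &quot;cartoons&quot;

# Characters with no special meaning in markup, such as an em dash, pass through
# unchanged and are simply stored as UTF-8.
print(html.escape("1990\u20142000"))

# Unescaping restores the original text.
print(html.unescape(escaped) == snippet)  # True
```

Only the markup-significant characters are touched, which is the point of the answer: everything else can stay as plain UTF-8.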

score 5 · Accepted Answer

Entities may buy you some compatibility with brain-dead clients that don't understand encodings correctly. I don't believe that includes any current browsers, but you never know what other kinds of programs might be hitting you up.

More useful, though, is that HTML entities protect you from your own errors: if you misconfigure something on the server and end up serving a page with an HTTP header that says it's ISO-8859-1 and a META tag that says it's UTF-8, at least your &mdash;s will always work.
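To illustrate the misconfiguration scenario, here is a small Python sketch that simulates a browser decoding UTF-8 bytes as ISO-8859-1. It is an illustration under that assumption, not a model of any particular browser:

```python
import html

# The same em dash, written once as a raw character and once as an entity.
raw = "wait\u2014what"
entity = "wait&mdash;what"

# The server sends UTF-8 bytes, but a wrong header makes the client
# decode them as ISO-8859-1 (Latin-1).
garbled = raw.encode("utf-8").decode("iso-8859-1")
survived = entity.encode("utf-8").decode("iso-8859-1")

print(repr(garbled))  # 'waitâ\x80\x94what'  (mojibake)
print(survived)       # wait&mdash;what  (pure ASCII, so unaffected)

# The entity is then resolved to the intended character.
print(html.unescape(survived) == raw)  # True
```

The entity survives because it consists only of ASCII bytes, which decode the same way in both encodings.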

score 5 · Accepted Answer

I would not use UTF-8 for characters that are easily confused visually. For example, it is difficult to distinguish an em dash from a minus sign, or especially a non-breaking space from a regular space. For these characters, definitely use entities.

For characters that are easily recognized visually (such as the Chinese examples above), go ahead and use UTF-8 if you like.
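One way to apply this selectively is to leave the source in UTF-8 and convert only the look-alike characters to named entities. A minimal Python sketch; the `CONFUSABLE` set and the function name are hypothetical choices for illustration, while `html.entities.codepoint2name` is the standard-library table of HTML 4 entity names:

```python
from html.entities import codepoint2name

# Characters that look like (or are invisible next to) plain ASCII.
# This selection is an illustrative assumption, not a standard list.
CONFUSABLE = {"\u00a0", "\u2013", "\u2014", "\u2212"}  # nbsp, en dash, em dash, minus

def entity_for_confusables(text: str) -> str:
    """Replace each confusable character with its named HTML entity."""
    out = []
    for ch in text:
        if ch in CONFUSABLE:
            # codepoint2name maps a code point to a name without '&' and ';'.
            out.append(f"&{codepoint2name[ord(ch)]};")
        else:
            out.append(ch)
    return "".join(out)

print(entity_for_confusables("5\u00a0km \u2212 2\u00a0km \u2014 done"))
# 5&nbsp;km &minus; 2&nbsp;km &mdash; done
```

All other characters, including CJK text, are left as literal UTF-8.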

score 5 · Accepted Answer

Personally, I have done everything in UTF-8 for a long time; however, in an HTML page you always need to convert ampersands (&), greater-than (>) and less-than (<) characters to their equivalent entities: &amp;, &gt; and &lt;.
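If you do this conversion by hand, the order matters: `&` has to be replaced first. A short Python sketch with hypothetical helper names, covering element text only (attribute values would also need quotes escaped):

```python
def escape_text(text: str) -> str:
    # '&' goes first; otherwise the '&' inside a freshly produced
    # "&lt;" or "&gt;" would be escaped a second time.
    return (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )

def escape_text_wrong_order(text: str) -> str:
    # Deliberately buggy: '<' becomes "&lt;", then its '&' becomes "&amp;".
    return text.replace("<", "&lt;").replace("&", "&amp;")

print(escape_text("a < b && c > d"))     # a &lt; b &amp;&amp; c &gt; d
print(escape_text_wrong_order("a < b"))  # a &amp;lt; b  (double-escaped)
```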

Also, if you intend to do some programming with UTF-8 text, there are a few things to watch for.

XML needs extra declarations (a DTD) to validate when you use named entities other than the five it predefines (&lt;, &gt;, &amp;, &apos; and &quot;).
Some libraries do not play nicely with UTF-8. For instance, PHP in some Linux distributions dropped full UTF-8 support from its regular expression library.
It is harder to limit the number of characters in a text that uses HTML entities, because a single entity takes up many characters. There is also always the risk of cutting an entity in half.
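The last point can be handled by counting each entity as one visible character. A minimal Python sketch; the regular expression is an assumption that covers named, decimal and hex references, and a bare `&` that starts no entity simply counts as one character:

```python
import re

# One "visible character" is either a complete entity or any single other
# character. re.DOTALL lets '.' match newlines as well.
TOKEN = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);|.", re.DOTALL)

def truncate_html_text(text: str, limit: int) -> str:
    """Keep at most `limit` visible characters without splitting an entity."""
    tokens = TOKEN.findall(text)
    return "".join(tokens[:limit])

s = "a&amp;b&mdash;c"
print(s[:9])                     # a&amp;b&m  (a naive slice cuts &mdash; in half)
print(truncate_html_text(s, 3))  # a&amp;b
print(len(s), len(TOKEN.findall(s)))  # 15 5
```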

score 4 · Accepted Answer

HTML entities are useful when you want to generate content that is going to be included (dynamically) in pages with several different encodings. For example, we have white-label content that is included in both ISO-8859-1 and UTF-8 encoded web pages...
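For content shared between pages with different encodings, one option is to turn every non-ASCII character into a numeric character reference, so the snippet is plain ASCII. A Python sketch using the standard `xmlcharrefreplace` error handler; the snippet text is made up for illustration:

```python
# Hypothetical white-label snippet containing non-ASCII characters.
snippet = "Caf\u00e9 menu \u2014 \u00be price"

# Replace every non-ASCII character with a numeric character reference.
ascii_snippet = snippet.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_snippet)  # Caf&#233; menu &#8212; &#190; price

# ASCII bytes decode identically under both host-page encodings.
data = ascii_snippet.encode("ascii")
print(data.decode("iso-8859-1") == data.decode("utf-8"))  # True
```

Markup characters such as `&` and `<` still need escaping separately; this step only removes the dependence on the host page's charset.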

If character set conversion to and from UTF-8 weren't such an unreliable mess (you always stumble over some characters and some tools that don't convert properly), standardizing on UTF-8 would be the way to go.

score 2 · Accepted Answer

If your pages are correctly encoded in UTF-8, you should have no need for HTML entities; just use the characters you want directly.
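"Correctly encoded" means the bytes really are UTF-8 and the page declares it. A minimal Python sketch with a hypothetical `render_page` helper: it writes a `<meta charset="utf-8">` declaration and encodes with that same charset, escaping only markup characters. The HTTP `Content-Type` header, if the server sends a charset, should agree with it:

```python
import html

def render_page(title: str, body: str) -> bytes:
    """Build a minimal UTF-8 page; only markup-significant characters are escaped."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>"
        f"<body><p>{html.escape(body)}</p></body></html>\n"
    )
    # Encode with the same charset the <meta> tag declares.
    return page.encode("utf-8")

page = render_page("Recipes", "Add \u00be cup of flour \u2014 then stir")
print(b"&frac34;" in page, "\u00be".encode("utf-8") in page)  # False True
```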

score 2 · Accepted Answer

All of the previous answers make sense to me.

In addition: it mostly depends on the editor you intend to use and on the document language. The minimum requirement for the editor is that it supports the document language. That means that if your text is in Japanese, beware of using an editor that cannot display it (i.e. don't fall back to writing the document itself in entities). If it's English, you can even use an old vim-like editor and use entities only for the relatively rare &copy; and friends. Of course, &gt; for > and the other HTML special characters still need escaping. But even for the other Latin-1 languages (German, French, etc.), writing &auml; is a pain in you-know-where...
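To see what an entity-only, pure-ASCII source looks like, here is a Python sketch (hypothetical `to_entities` helper) that rewrites non-ASCII characters as named entities where HTML 4 defines one, and as decimal references otherwise. It does not escape `&`, `<` or `>`:

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Rewrite non-ASCII characters as entities so the source stays pure ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII stays as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &auml;, &copy;
        else:
            out.append(f"&#{cp};")                 # no name defined: numeric reference
    return "".join(out)

print(to_entities("M\u00e4rz \u00a9 2009"))  # M&auml;rz &copy; 2009
print(to_entities("\u65e5\u672c"))          # &#26085;&#26412;
```

The German line is still legible; the Japanese one is not, which is why a Unicode-capable editor matters for such documents.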

In addition, I personally write entities for invisible characters and for those that look similar to standard ASCII characters and are therefore easily confused. For example, there is U+1173 (which looks like a dash in some fonts) or U+1175, which looks like the vertical bar. I'd use entities for those in any case.
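Since U+1173 and U+1175 have no named entities, numeric references are the option here. A Python sketch with hypothetical helper names: it flags format characters (Unicode category `Cf`, e.g. U+200B ZERO WIDTH SPACE), space separators other than the ASCII space (`Zs`), and an assumed look-alike set, then writes them as hex references:

```python
import unicodedata

# Look-alikes from the answer; an illustrative set, not an exhaustive list.
LOOKALIKES = {"\u1173", "\u1175"}  # HANGUL JUNGSEONG EU (dash-like), I (bar-like)

def needs_reference(ch: str) -> bool:
    # Cf = invisible format characters; Zs = space separators (excluding " ").
    category = unicodedata.category(ch)
    return ch in LOOKALIKES or category == "Cf" or (category == "Zs" and ch != " ")

def mark_invisible(text: str) -> str:
    """Write invisible or look-alike characters as hex numeric references."""
    return "".join(f"&#x{ord(ch):04X};" if needs_reference(ch) else ch for ch in text)

print(mark_invisible("a\u200bb \u1173 c\u00a0d"))
# a&#x200B;b &#x1173; c&#x00A0;d
```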
