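This has confused me for some time. With the advent of UTF-8 as the de facto standard in web development, I'm not sure in which situations I should use HTML entities and in which I should just use the UTF-8 character. For example:
- em dash (–, &mdash;)
- ampersand (&, &amp;)
- 3/4 fraction (¾, &frac34;)
Please shed some light on this issue. It would be appreciated.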
You don't generally need to use HTML character entities if your editor supports Unicode. Entities can be useful when:
- An entity makes the source more explicit. For example, the &nbsp; code is clearer than the corresponding white space character.
- You need to escape HTML special characters like &lt;, &amp;, or &quot;.

Entities may buy you some compatibility with brain-dead clients that don't understand encodings correctly. I don't believe that includes any current browsers, but you never know what other kinds of programs might be hitting you up.
More useful, though, is that HTML entities protect you from your own errors: if you misconfigure something on the server and end up serving a page with an HTTP header that says it's ISO-8859-1 and a META tag that says it's UTF-8, at least your &mdash; entities will always work.
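A minimal Python sketch of that failure mode, assuming a page whose UTF-8 bytes are decoded by a client that trusts an ISO-8859-1 header. The sample text is made up for illustration; the numeric references come from Python's standard `xmlcharrefreplace` error handler, not from anything the answer prescribes.

```python
import html

# Hypothetical page text; \u00e9 is an e with acute accent, \u2014 is an em dash.
text = "caf\u00e9 \u2014 menu"

# Served as UTF-8 bytes, but decoded by a client that believes the ISO-8859-1 header.
misread = text.encode("utf-8").decode("iso-8859-1")

# The same text with every non-ASCII character replaced by a numeric reference.
safe_bytes = text.encode("ascii", "xmlcharrefreplace")

# Pure ASCII bytes decode identically under both encodings.
as_latin1 = safe_bytes.decode("iso-8859-1")
as_utf8 = safe_bytes.decode("utf-8")

print(misread == text)                   # False: the raw characters became mojibake
print(safe_bytes)                        # b'caf&#233; &#8212; menu'
print(html.unescape(as_latin1) == text)  # True: the entities survived the mismatch
```

The entity version is immune only because it contains nothing but ASCII, which the two encodings agree on.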
I would not use literal UTF-8 characters for ones that are easily confused visually. For example, it is difficult to distinguish an em dash from a minus sign, or especially a non-breaking space from a regular space. For these characters, definitely use entities.
For characters that are easily understood visually (such as the Chinese examples above), go ahead and use UTF-8 if you like.
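One way to apply this rule mechanically is to spell out only the look-alike characters and leave everything else as literal UTF-8. The sketch below is an assumption about how one might do that in Python; the mapping and the function name `spell_out_confusables` are hypothetical, and the list of characters is only a starting point.

```python
# Hypothetical mapping; extend it with whatever characters you find hard to tell apart.
CONFUSABLE_ENTITIES = {
    "\u00a0": "&nbsp;",   # non-breaking space
    "\u2013": "&ndash;",  # en dash
    "\u2014": "&mdash;",  # em dash
    "\u2212": "&minus;",  # minus sign
}
_TABLE = str.maketrans(CONFUSABLE_ENTITIES)


def spell_out_confusables(source: str) -> str:
    """Replace visually ambiguous characters with named entities; leave the rest alone."""
    return source.translate(_TABLE)


print(spell_out_confusables("a\u00a0b \u2014 c"))  # a&nbsp;b &mdash; c
```

Characters that are not in the mapping, including Chinese text, pass through unchanged.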
Personally, I have done everything in UTF-8 for a long time. However, in an HTML page you always need to convert ampersand (&), greater-than (>) and less-than (<) characters to their equivalent entities: &amp;, &gt; and &lt;.
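As an illustration of that rule, Python's standard `html.escape` converts exactly these markup characters and leaves other UTF-8 characters untouched. The sample string is invented; `quote=False` is used here so that only &, < and > are converted.

```python
import html

# Hypothetical text containing markup characters and a literal UTF-8 fraction (\u00be).
raw = "1 < 2 & 3 > 2 \u00be"

# quote=False limits escaping to &, < and >; the fraction character is left as-is.
escaped = html.escape(raw, quote=False)

print(escaped)  # 1 &lt; 2 &amp; 3 &gt; 2 ¾
```

When the text ends up inside an attribute value, the default `quote=True` also escapes quotation marks.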
Also, if you intend to do some programming with UTF-8 text, there are a few things to watch for.
HTML entities are useful when you want to generate content that is going to be included (dynamically) into pages with (several) different encodings. For example, we have white label content that is included both into ISO-8859-1 and UTF-8 encoded web pages...
If character set conversion from/to UTF-8 wasn't such a big unreliable mess (you always stumble over some characters and some tools that don't convert properly), standardizing on UTF-8 would be the way to go.
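For the white-label case described above, one option is to emit the shared fragment as pure ASCII, so it reads the same whichever encoding the host page declares. This is only a sketch under that assumption: `to_ascii_fragment` is a hypothetical helper, and it relies on the HTML 4 entity names in Python's `html.entities.codepoint2name`, falling back to numeric references.

```python
from html.entities import codepoint2name


def to_ascii_fragment(text: str) -> str:
    """Return text as ASCII-only HTML, using named entities where one exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128 and ch not in '&<>"':
            out.append(ch)                         # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &amp;, &frac34;, &mdash;
        else:
            out.append(f"&#{cp};")                 # no name available: numeric reference
    return "".join(out)


fragment = to_ascii_fragment("3\u00be & \u2014 \u4e2d")
print(fragment)  # 3&frac34; &amp; &mdash; &#20013;

# The ASCII bytes decode to the same string in both host encodings.
data = fragment.encode("ascii")
print(data.decode("utf-8") == data.decode("iso-8859-1"))  # True
```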
If your pages are correctly encoded in UTF-8, you should have no need for HTML entities; just use the characters you want directly.
All of the previous answers make sense to me.
In addition: it mostly depends on the editor you intend to use and the document language. The minimum requirement for the editor is that it supports the document language. That means that if your text is in Japanese, beware of using an editor that cannot display it (i.e. no entities for the document text itself). If it's English, you can even use an old vim-like editor and use entities only for the relatively rare &copy; and friends. Of course, &gt; for > and the other HTML special characters still need escaping. But even with the other Latin-1 languages (German, French, etc.), writing &auml; is a pain in you know where...
In addition, I personally write entities for invisible characters and for those that look similar to standard ASCII characters and are therefore easily confused. For example, there is U+1173 (which looks like a dash in some fonts) or U+1175, which looks like a vertical bar. I'd use entities for those in any case.
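To find such characters in the first place, a small scan of the source can list every non-ASCII character with its code point and a numeric reference you could substitute for it. The helper name `find_non_ascii` is hypothetical, and reporting every non-ASCII character is a simplification; for non-English documents you would restrict it to the characters you consider confusing.

```python
def find_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point label, hex numeric reference) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", f"&#x{ord(ch):X};")
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# \u1173 and \u1175 are the two look-alike characters mentioned above.
print(find_non_ascii("a\u1173b|\u1175"))
# [(1, 'U+1173', '&#x1173;'), (4, 'U+1175', '&#x1175;')]
```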