65

我想显示原始 HTML。我们都知道必须像这样逃避每个“<”和“>”:

     <PRE> this is a test  &ltDIV&gt </PRE>

但是,我不想这样做。我想要一种保持 HTML 代码原样的方法(因为它更易于阅读,(在编辑器中)我可能想复制它并自己再次使用它作为实际的 HTML 代码,并且不想必须再次更改它或拥有相同代码的两个版本,一个已转义,一个未转义。

是否有任何其他比 PRE 更“原始”的环境可能允许这样做?因此,人们不必在每次想要显示一些原始 HTML 代码(可能是在 HTML5 中)时都继续编辑 HTML 并更改所有内容?

就像是<REALLY_REALLY_VERBATIM> ...... </<REALLY_REALLY_VERBATIM>

JavaScript 解决方案在 Firefox 21 上不起作用,这是一个屏幕截图:

非工作代码截图

第一个解决方案仍然无法在 Firefox 上运行,这是一个屏幕截图:

HTML 代码截图

4

8 回答 8

100

您可以使用该xmp元素,请参阅<XMP> 标签的用途是什么?. 它从一开始就在 HTML 中,并且被所有浏览器支持。规范不赞成它,但 HTML5 CR 仍然描述它并要求浏览器支持它(虽然它也告诉作者不要使用它,但它并不能真正阻止你)。

里面的所有东西都是xmp这样的,没有任何标记(标签或字符引用)在那里被识别,除了,出于明显的原因,元素本身的结束标签,</xmp>.

否则xmp呈现为pre.

当使用“真正的 XHTML”时,即 XHTML 以 XML 媒体类型提供服务(这种情况很少见),特殊的解析规则不适用,因此xmp被视为pre. 但在“真正的 XHTML”中,您可以使用 CDATA 部分,这意味着类似的解析规则。它没有特殊格式,因此您可能希望将其包装在一个pre元素中:

<pre><![CDATA[
This is a demo, tags like <p> will
appear literally.
]]></pre>

我看不出你如何结合xmpCDATA 部分来实现所谓的多语言标记

于 2013-05-28T07:17:37.623 回答
26

Essentially the original question can be broken down in 2 parts:

  • Main objective/challenge: embedding(/transporting) a raw formatted code-snippet (any kind of code) in a web-page's markup (for simple copy/paste/edit due to no encoding/escaping)
  • correctly displaying/rendering that code-snippet (possibly edit it) in the browser

The short (but) ambiguous answer is: you can't, ...but you can (get very close).
(I know, that are 3 contradicting answers, so read on...)

(polyglot)(x)(ht)ml Markup-languages rely on wrapping (almost) everything between begin/opening and end/closing tags/character(sequences).
So, to embed any kind of raw code/snippet inside your markup-language, one will always have to escape/encode every instance (inside that snippet) that resembles the character(-sequence) that would close the wrapping 'container' element in the markup. (During this post I'll refer to this as rule no 1.)
Think of "some "data" here" or <i>..close italics with '</i>'-tag</i>, where it is obvious one should escape/encode (something in) </i and " (or change container's quote-character from " to ').

So, because of rule no 1, you can't 'just' embed 'any' unknown raw code-snippet inside markup.
Because, if one has to escape/encode even one character inside the raw snippet, then that snippet would no longer be the same original 'pure raw code' that anyone can copy/paste/edit in the document's markup without further thought. It would lead to malformed/illegal markup and Mojibake (mainly) because of entities.
Also, should that snippet contain such characters, you'd still need some javascript to 'translate' that character(sequence) from (and to) it's escaped/encoded representation to display the snippet correctly in the 'webpage' (for copy/paste/edit).

That brings us to (some of) the datatypes that markup-languages specify. These datatypes essentially define what are considered 'valid characters' and their meaning (per tag, property, etc.):

  • PCDATA (Parsed Character DATA): will expand entities and one must escape <, & (and > depending on markup language/version).
    Most tags like body, div, pre, etc, but also textarea (until HTML5) fall under this type.
    So not only do you need to encode all the container's closing character-sequences inside the snippet, you also have to encode all <, & (,>) characters (at minimum).
    Needless to say, encoding/escaping this many characters falls outside this objective's scope of embedding a raw snippet in the markup.
    '..But a textarea seems to work...', yes, either because of the browsers error-engine trying to make something out of it, or because HTML5:

  • RCDATA (Replaceable Character DATA): will not not treat tags inside the text as markup (but are still governed by rule 1), so one doesn't need to encode < (>). BUT entities are still expanded, so they and 'ambiguous ampersands' (&) need special care.
    The current HTML5 spec says the textarea is now a RCDATA field and (quote):

    The text in raw text and RCDATA elements must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).

    Thus no matter what, textarea needs a hefty entity translation handler or it will eventually Mojibake on entities!

  • CDATA (Character Data) will not treat tags inside the text as markup and will not expand entities.
    So as long as the raw snippet code does not violate rule 1 (that one can't have the containers closing character(sequence) inside the snippet), this requires no other escaping/encoding.

Clearly this boils down to: how can we minimize the number of characters/character-sequences that still need to be encoded in the snippet's raw source and the number of times that character(sequence) might appear in an average snippet; something that is also of importance for the javascript that handles the translation of these characters (if they occur).

So what 'containers' have this CDATA context?

Most value properties of tags are CDATA, so one could (ab)use a hidden input's value property (proof of concept jsfiddle here).
However (conform rule 1) this creates an encoding/escape problem with nested quotes (" and ') in the raw snippet and one needs some javascript to get/translate and set the snippet in another (visible) element (or simply setting it as a text-area's value). Somehow this gave me problems with entities in FF (just like in a textarea). But it doesn't really matter, since the 'price' of having to escape/encode nested quotes is higher then a (HTML5) textarea (quotes are quite common in source code..).

What about trying to (ab)use <![CDATA[<tag>bla & bla</tag>]]>?
As Jukka points out in his extended answer, this would only work in (rare) 'real xhtml'.
I thought of using a script-tag (with or without such a CDATA wrapper inside the script-tag) together with a multi-line comment /* */ that wraps the raw snippet (script-tags can have an id and you can access them by count). But since this obviously introduces a escaping problem with */, ]]> and </script in the raw snippet, this doesn't seem like a solution either.

Please post other viable 'containers' in the comments to this answer.

By the way, encoding or counting the number of - characters and balancing them out inside a comment tag <!-- --> is just insane for this purpose (apart from rule 1).


That leaves us with Jukka K. Korpela's excellent answer: the <xmp> tag seems the best option!

The 'forgotten' <xmp> holds CDATA, is intended for this purpose AND is indeed still in the current HTML 5 spec (and has been at least since HTML3.2); exactly what we need! It's also widely supported, even in IE6 (that is.. until it suffers from the same regression as the scrolling table-body).
Note: as Jukka pointed out, this will not work in true xhtml or polyglot (that will treat it as a pre) and the xmp tag must still adhere to rule no 1. But that's the 'only' rule.

Consider the following markup:

<!-- ATTENTION: replace any occurrence of &lt;/xmp with </xmp -->
<xmp id="snippet-container">
<div>
    <div>this is an example div &amp; holds an xmp tag:<br />
        <xmp> 
<html><head>  <!-- indentation col 0!! -->
    <title>My Title</title>
</head><body>
    <p>hello world !!</p>
</body></html>
        &lt;/xmp>  <!-- note this encoded/escaped tag -->
    </div>
    This line is also part of the snippet
</div>
</xmp>

The above codeblok illustrates a raw piece of markup where <xmp id="snippet-container"> contains an (almost raw) code-snippet (containing div>div>xmp>html-document).
Notice the encoded closing tag in this markup? To comply with rule no 1, this was encoded/escaped).

So embedding/transporting the (sometimes almost) raw code is/seems solved.

What about displaying/rendering the snippet (and that encoded &lt;/xmp>)?

The browser will (or it should) render the snippet (the contents inside snippet-container) exactly the way you see it in the codeblock above (with some discrepancy amongst browsers whether or not the snippet starts with a blank line).
That includes the formatting/indentation, entities (like the string &amp;), full tags, comments AND the encoded closing tag &lt;/xmp> (just like it was encoded in the markup). And depending on browser(version) one could even try use the property contenteditable="true" to edit this snippet (all that without javascript enabled). Doing something like textarea.value=xmp.innerHTML is also a breeze.

So you can... if the snippet doesn't contain the containers closing character-sequence.

However, should a raw snippet contain the closing character-sequence </xmp (because it is an example of xmp itself or it contains some regex, etc), you must accept that you have to encode/escape that sequence in the raw snippet AND need a javascript handler to translate that encoding to display/render the encoded &lt;/xmp> like </xmp> inside a textarea (for editing/posting) or (for example) a pre just to correctly render the snippet's code (or so it seems).

A very rudimentary jsfiddle example of this here. Note that getting/embedding/displaying/retrieving-to-textarea worked perfect even in IE6. But setting the xmp's innerHTML revealed some interesting 'would-be-intelligent' behavior on IE's part. There is a more extensive note and workaround on that in the fiddle.

But now comes the important kicker (another reason why you only get very close): Just as an over-simplified example, imagine this rabbit-hole:

Intended raw code-snippet:

<!-- remember to translate between </xmp> and &lt;/xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>

Well, to comply with rule 1, we 'only' need to encode those </xmp[> \n\r\t\f\/] sequences, right?

So that gives us the following markup (using just a possible encoding):

<xmp id="container">
<!-- remember to translate between &lt;/xmp> and &lt;/xmp> -->
<xmp>
<p>a paragraph</p>
&lt;/xmp>
</xmp>

Hmm.. shalt I get my crystal ball or flip a coin? No, let the computer look at its system-clock and state that a derived number is 'random'. Yes, that should do it..

Using a regex like: xmp.innerHTML.replace(/&lt;(?=\/xmp[> \n\r\t\f\/])/gi, '<');, would translate 'back' to this:

<!-- remember to translate between </xmp> and </xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>

Hmm.. seems this random generator is broken... Houston..?
Should you have missed the joke/problem, read again starting at the 'intended raw code-snippet'.

Wait, I know, we (also) need to encode .... to ....
Ok, rewind to 'intended raw code-snippet' and read again.
Somehow this all begins to smell like the famous hilarious-but-true rexgex-answer on SO, a good read for people fluent in mojibake.

Maybe someone knows a clever algorithm or solution to fix this problem, but I assume that the embedded raw code will get more and more obscure to the point where you'd be better of properly escaping/encoding just your <, & (and >), just like the rest of the world.

Conclusion: (using the xmp tag)

  • it can be done with known snippets that do not contain the container's closing character-sequence,
  • we can get very close to the original objective with known snippets that only use 'basic first-level' escaping/encoding so we don't fall in the rabbithole,
  • but ultimately it seems that one can't do this reliably in a 'production-environment' where people can/should copy/paste/edit 'any unknown' raw snippets while not knowing/understanding the implications/rules/rabbithole (depending on your implementation of handling/translating for rule 1 and the rabbit-hole).

Hope this helps!

PS: Whilst I would appreciate an upvote if you find this explanation useful, I kind of think Jukka's answer should be the accepted answer (should no better option/answer come along), since he was the one who remembered the xmp tag (that I forgot about over the years and got 'distracted' by the commonly advocated PCDATA elements like pre, textarea, etc.).
This answer originated in explaining why you can't do it (with any unknown raw snippet) and explain some obvious pitfalls that some other (now deleted) answers overlooked when advising a textarea for embedding/transport. I've expanded my existing explanation to also support and further explain Jukka's answer (since all that entity and *CDATA stuff is almost harder than code-pages).

于 2013-05-28T06:04:57.167 回答
12

便宜又开朗的回答:

<textarea>Some raw content</textarea>

textarea 将逐字处理制表符、多个空格、换行符、换行。它可以很好地复制和粘贴其有效的 HTML。它还允许用户调整代码框的大小。你不需要任何 CSS、JS、转义、编码。

您也可以更改外观和行为。这是一个等宽字体,已禁用编辑,字体较小,无边框:

<textarea
    style="width:100%; font-family: Monospace; font-size:10px; border:0;"
    rows="30" disabled
>Some raw content</textarea>

这个解决方案可能在语义上不正确。因此,如果您需要,最好选择更复杂的答案。

于 2017-02-08T03:41:40.147 回答
5

xmp是要走的路,即:

<xmp>
  # your code...
</xmp>
于 2017-05-28T10:18:04.683 回答
4
echo '<pre>' . htmlspecialchars("<div><b>raw HTML</b></div>") . '</pre>';

我想这就是你要找的吗?

换句话说,在 PHP 中使用 htmlspecialchars()

于 2015-04-07T21:18:49.897 回答
3

@GitaarLAB 和 @Jukka 详细说明该<xmp>标签已过时,但仍然是最好的。当我这样使用它时

<xmp>
<div>Lorem ipsum</div>
<p>Hello</p>
</xmp>

然后将第一个 EOL 插入代码中,看起来很糟糕

可以通过删除 EOL 来解决

<xmp><div>Lorem ipsum</div>
<p>Hello</p>
</xmp>

但是在源代码中看起来很糟糕。我曾经用 wrapping 解决它<div>,但最近我想出了一个很好的 CSS3 规则,我希望它也可以帮助某人:

xmp { margin: 5px 0; padding: 0 5px 5px 5px; background: #CCC; }
xmp:before { content: ""; display: block; height: 1em; margin: 0 -5px -2em -5px; }

看起来更好

于 2015-10-04T16:18:05.953 回答
1

如果您启用了 jQuery,您可以使用 escapeXml 函数,而不必担心转义箭头或特殊字符。

<pre>
  ${fn:escapeXml('
    <!-- all your code --> 
  ')};
</pre>
于 2014-12-17T23:22:51.127 回答
1

<code>标签是好方法,因为<xmp>标签<pre>不支持换行

echo '<code>' . htmlspecialchars("<div><b>hello world</b></div>") . '</code>';
于 2021-05-02T19:43:18.897 回答