I have an Erlang string that may contain characters such as & " < and so on:

1> Unenc = "string & \"stuff\" <".
ok

Is there an Erlang function that parses the string and encodes all the necessary HTML/XML entities, for example:

2> Enc = xmlencode(Unenc).
"string &amp; &quot;stuff&quot; &lt;".

?

My use case is relatively short strings that come from user input. The output string of the xmlencode function will be the content of an XML attribute:

<company name="Acme &amp; C." currency="&euro;" />

The final XML will then be sent over the network as appropriate.

3 Answers

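The Erlang distribution includes a function that escapes angle brackets and ampersands, but it is undocumented, so it is probably best not to rely on it: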

1> xmerl_lib:export_text("string & \"stuff\" <").
"string &amp; \"stuff\" &lt;"

If you want to build/encode XML structures (rather than just encode a single string), the xmerl API would be a good choice, e.g.

2> xmerl:export_simple([{foo, [], ["string & \"stuff\" <"]}], xmerl_xml).
["<?xml version=\"1.0\"?>",
 [[["<","foo",">"],
   ["string &amp; \"stuff\" &lt;"],
   ["</","foo",">"]]]]
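Since the question is about attribute content, the same xmerl call can be given an attribute list instead of text content. The following is only a sketch: the module name xmerl_attr_demo and the function company/1 are made up for illustration, and it assumes the xmerl application from OTP is available on the code path.

```erlang
-module(xmerl_attr_demo).
-export([company/1]).

%% Build a <company> element whose name attribute holds the raw,
%% unescaped string, and let xmerl take care of the escaping.
%% xmerl:export_simple/2 returns a deep list, so it is flattened
%% here to get an ordinary string back.
company(Name) ->
    Simple = [{company, [{name, Name}], []}],
    lists:flatten(xmerl:export_simple(Simple, xmerl_xml)).
```

Calling xmerl_attr_demo:company("Acme & C.") should give a document in which the ampersand inside the attribute value appears as &amp;.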
answered 2010-07-26T22:18:15.157

If your needs are simple, you could do this with a map over the chars in the string.

quote($<) -> "&lt;";
quote($>) -> "&gt;";
quote($&) -> "&amp;";
quote($") -> "&quot;";
quote(C) -> C.

Then you would do

1> Raw = "string & \"stuff\" <".
2> Quoted = lists:map(fun quote/1, Raw).

Note that Quoted is not a flat list: each escaped character appears as a nested string inside it. That is still fine if you are going to write it to a file or send it as an HTTP reply, since those accept deep lists; see Erlang's io-lists.

More recent Erlang releases also provide functions for converting between multibyte UTF-8 and code-point representations; see the Erlang unicode module.
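If a flat string is needed, one option is lists:flatmap/2 instead of lists:map/2. The sketch below is a small variation on the quote function above, not part of the original answer: the module name xml_quote and the function escape/1 are hypothetical, and quote/1 is changed so that every clause returns a list, which is what flatmap expects.

```erlang
-module(xml_quote).
-export([quote/1, escape/1]).

%% Map one character to its escaped form. Every clause returns a
%% list so that the results can be concatenated by lists:flatmap/2.
quote($<) -> "&lt;";
quote($>) -> "&gt;";
quote($&) -> "&amp;";
quote($") -> "&quot;";
quote(C)  -> [C].

%% Escape a whole string and return a flat string.
escape(String) ->
    lists:flatmap(fun quote/1, String).
```

With this, xml_quote:escape("string & \"stuff\" <") returns "string &amp; &quot;stuff&quot; &lt;". Calling lists:flatten/1 on the result of the original lists:map/2 version would work as well.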


Reformatted comments, to make code examples stand out:

ettore: That's kind of what I am doing, although I do have to support multibyte characters. Here's my code:

xmlencode([], Acc) -> Acc;
xmlencode([$<|T], Acc) -> xmlencode(T, Acc ++ "&lt;");
% euro symbol (as UTF-8 bytes)
xmlencode([226,130,172|T], Acc) -> xmlencode(T, Acc ++ "&#8364;");
xmlencode([OneChar|T], Acc) -> xmlencode(T, lists:flatten([Acc,OneChar])).

Although I would prefer not to reinvent the wheel if possible.

dsmith: The string that you are using would normally be a list of Unicode code-points (i.e. a list of numbers), so any given byte encoding is irrelevant. You only need to worry about specific encodings if you are working directly with binaries.

To clarify, the Unicode code-point for the euro symbol (decimal 8364) would be a single element in your list. So you would just do this:

xmlencode([8364|T], Acc) -> xmlencode(T, Acc ++ "&#8364;"); 
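
Putting the two comments together, a version that works on code points could look roughly like the sketch below. It is an illustration under stated assumptions rather than a tested library: the module name xml_encode and the function encode/1 are hypothetical, UTF-8 binaries are converted with unicode:characters_to_list/2 first, any code point above 127 is written as a numeric character reference (the euro case generalized), and invalid UTF-8 input is not handled. The accumulator is built in reverse to avoid the repeated Acc ++ ... appends.

```erlang
-module(xml_encode).
-export([encode/1]).

%% Accept either a UTF-8 binary or a list of Unicode code points.
encode(Bin) when is_binary(Bin) ->
    encode(unicode:characters_to_list(Bin, utf8));
encode(Chars) when is_list(Chars) ->
    encode(Chars, []).

%% The accumulator is kept in reverse order and flattened at the end.
encode([], Acc) ->
    lists:flatten(lists:reverse(Acc));
encode([$< | T], Acc) -> encode(T, ["&lt;" | Acc]);
encode([$> | T], Acc) -> encode(T, ["&gt;" | Acc]);
encode([$& | T], Acc) -> encode(T, ["&amp;" | Acc]);
encode([$" | T], Acc) -> encode(T, ["&quot;" | Acc]);
%% Non-ASCII code points become numeric character references,
%% e.g. 8364 (the euro sign) becomes "&#8364;".
encode([C | T], Acc) when C > 127 ->
    encode(T, [["&#", integer_to_list(C), ";"] | Acc]);
encode([C | T], Acc) ->
    encode(T, [C | Acc]).
```

Here xml_encode:encode([8364]) and xml_encode:encode(<<226,130,172>>) both return "&#8364;".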
answered 2010-07-26T22:06:23.513

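I'm not aware of one in the included OTP packages. However, Mochiweb's mochiweb_html module has an escape function (see mochiweb_html.erl). It handles lists, binaries, and atoms.

For URL encoding, take a look at the mochiweb_util module (mochiweb_util.erl) and its urlscape function.

You can use either of these libraries to get what you need.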

answered 2010-07-28T06:04:50.410