
While parsing MIME with Erlang, I am able to extract the headers, body, and attachments. Now I have to parse each of these parts separately.

Header structure:

Header-tag : header-value\n

Example:

Delivered-To: xyz@geodesic.com\nReceived: by 1.gnu.geodesic.net (fdm 1.5, account "mail");\n\tFri, 03 Jul 2009 16:56:03 +0530\n

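So from the example above I have to extract two headers, Delivered-To: xyz@geodesic.com and Received: by 1.gnu.geodesic.net (fdm 1.5, account "mail");\n\tFri, 03 Jul 2009 16:56:03 +0530\n, by splitting on \n in some way. But the value of the second header contains \n\t, so the split stops there. I want a strict split that breaks only on a \n that ends a header, not on the \n\t inside a value.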

Thanks in advance.


2 Answers


By the way, MIME headers are (almost?) identical to HTTP headers, so you can use the HTTP decoding built into Erlang. (The data must be a binary, not a string.)

3> erlang:decode_packet(httph, <<"Delivered-To: xyz@geodesic.com\nReceived: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n">>, []).
{ok,{http_header,0,"Delivered-To",undefined,
                 "xyz@geodesic.com"},
    <<"Received: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n">>}

Right, that gives the first header in an http_header record, plus the rest of the data.

4> Rest = element(3, v(-1)).
<<"Received: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n">>
5> erlang:decode_packet(httph, Rest, []).
{more,undefined}

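But that does not work: the decoder cannot tell whether a header line continues on the next line until it has seen that next line. We need to add the final empty line: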

6> erlang:decode_packet(httph, <<Rest/binary, "\r\n">>, []).
{ok,{http_header,0,"Received",undefined,
                 "by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530"},
    <<"\r\n">>}

When that empty line is all that is left, we get http_eoh:

7> erlang:decode_packet(httph, <<"\r\n">>, []).
{ok,http_eoh,<<>>}
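
The shell steps above can be folded into a loop. This is only a minimal sketch: the module name mime_headers and the function parse/1 are made up for illustration, and it assumes the binary holds nothing but the header block. Note that decode_packet returns well-known HTTP field names as atoms rather than strings.

```erlang
-module(mime_headers).
-export([parse/1]).

%% Decode every header in Bin with erlang:decode_packet/3.
%% Returns {Headers, Rest}, where Headers is a list of {Name, Value}.
%% Assumption: Bin contains only the header block, so appending an
%% empty line is safe and lets the decoder finish the last header.
parse(Bin) when is_binary(Bin) ->
    parse(<<Bin/binary, "\r\n">>, []).

parse(Bin, Acc) ->
    case erlang:decode_packet(httph, Bin, []) of
        {ok, {http_header, _, Name, _, Value}, Rest} ->
            %% One complete header; keep decoding what is left.
            parse(Rest, [{Name, Value} | Acc]);
        {ok, http_eoh, Rest} ->
            %% Empty line reached: end of headers.
            {lists:reverse(Acc), Rest};
        {ok, {http_error, Line}, _Rest} ->
            error({bad_header, Line});
        {more, _} ->
            error(incomplete_headers);
        {error, Reason} ->
            error(Reason)
    end.
```

Calling mime_headers:parse/1 on the binary from prompt 3 should return both headers in order, with the folded Received value kept as a single entry.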

Hope that helps.

answered 2009-07-06T12:48:10.547

Do you mean something like this?

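%% Split a header block into {Key, Value} pairs.
%% A "\n\t" sequence is a folded continuation line and stays in the value.
split(String) ->
  split(String, [], []).

%% Buffer holds the current token reversed; Result holds pairs reversed.
%% End of input with nothing pending: done.
split([], [], Result) ->
  lists:reverse(Result);

%% End of input with an unterminated value: close it.
split([], Buffer, [{Key}|Result]) ->
  split([], [], [{Key, lists:reverse(Buffer)}|Result]);

%% Folded line: keep "\n\t" in the value (pushed reversed, like Buffer).
split("\n\t" ++ String, Buffer, Result) ->
  split(String, "\t\n" ++ Buffer, Result);

%% A bare "\n" ends the current value.
split("\n" ++ String, Buffer, [{Key}|Result]) ->
  split(String, [], [{Key, lists:reverse(Buffer)}|Result]);

%% ": " ends the key; park it as a 1-tuple until its value is complete.
%% Note: a ": " inside a value is also treated as a key separator.
split(": " ++ String, Buffer, Result) ->
  split(String, [], [{lists:reverse(Buffer)}|Result]);

%% Any other character is accumulated.
split([C|String], Buffer, Result) ->
  split(String, [C|Buffer], Result).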

Here is the result for your input headers:

> split("Delivered-To: xyz@geodesic.com\nReceived: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n").
[{"Delivered-To","xyz@geodesic.com"},
 {"Received",
  "by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530"}]
answered 2009-07-27T18:15:04.210
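
For comparison with the hand-written splitter, the "strict split" the question asks for can also be sketched with the standard re module, using a negative lookahead so that a \n followed by a space or tab is not treated as a separator. The module name header_lines and the function lines/1 are hypothetical, and this only separates the header lines; it does not split keys from values.

```erlang
-module(header_lines).
-export([lines/1]).

%% Split a header block into one string per header.
%% The pattern matches a newline that is NOT followed by a space or tab,
%% so folded continuation lines ("\n\t...") stay attached to their header.
%% The trim option drops the empty string left after the final newline.
lines(Headers) ->
    re:split(Headers, "\n(?![ \t])", [{return, list}, trim]).
```

Each returned string still contains its "Key: Value" text, including any embedded "\n\t", so it can be split once more on the first ": " if key/value pairs are needed.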