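I have a document that I process with Lua and/or XSL, since the solution I use supports both. The data is a compilation of IM conversations from Lync 2013. I have written a few pattern-matching scripts that extract some of the values from the data below, but because users can configure how their text is displayed in their IM client, each user's data is stored differently.
What I need is a script that extracts the To, From, Date/Time and Content values of every message. I have noticed that when a message is wrapped in RTF markup, each word is followed by the string '\embo0'.
Is there a way to process the whole data set, like the sample under "Data:" below, to produce the result shown under "Desired output:"? The scripts I have only extract the parts of the conversation that match one of the pattern-matching schemes I defined, and then strip out everything else.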
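To make the `\embo0` observation concrete, here is a minimal sketch of an RTF-to-text pass. It is written in Python only for illustration; the same token-by-token logic should port to Lua patterns. The names `TOKEN`, `SKIP` and `rtf_to_text` are made up for this example. Two assumptions are baked in: the `\'hh` escapes are decoded as cp1252 (the sample declares `\ansicpg1252`), and `\embo` / `\embo0` are treated as word boundaries, because in the sample as posted only one space separates them and strict RTF would swallow that space as a delimiter. `\uN` Unicode escapes are not handled.

```python
import re

# One token per match: control word (optional numeric argument, one optional
# delimiter space), \'hh hex escape, control symbol, brace, or plain text.
TOKEN = re.compile(
    r"\\([A-Za-z]+)(-?\d+)? ?|\\'([0-9A-Fa-f]{2})|\\([^A-Za-z])|([{}])|([^\\{}]+)"
)

# Destinations that never hold message text (fonts, colours, link targets).
SKIP = {"fonttbl", "colortbl", "stylesheet", "info", "fldinst"}

def rtf_to_text(rtf):
    out, stack, ignoring = [], [], False
    for word, arg, hexcode, symbol, brace, text in TOKEN.findall(rtf):
        if brace == "{":
            stack.append(ignoring)        # remember the enclosing group's state
        elif brace == "}":
            ignoring = stack.pop() if stack else False
        elif symbol == "*" or word in SKIP:
            ignoring = True               # skip the rest of the current group
        elif ignoring:
            continue
        elif word == "embo":
            out.append(" ")               # assumption: \embo / \embo0 delimit words
        elif word in ("par", "line"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
        elif hexcode:
            out.append(bytes.fromhex(hexcode).decode("cp1252", "replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)            # escaped literal character
        elif symbol == "~":
            out.append(" ")               # non-breaking space
        elif text:
            # Raw line breaks inside RTF carry no meaning, so drop them.
            out.append(text.replace("\r", "").replace("\n", ""))
    lines = (re.sub(r"[ \t]+", " ", ln).strip() for ln in "".join(out).split("\n"))
    return "\n".join(ln for ln in lines if ln)

sample = r"""{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}}
{\colortbl ;\red0\green0\blue0;}
\pard\cf1\embo\f0\fs20 Test\embo0 \embo from\embo0 \embo Lync\embo0 \embo 2013\embo0 \f1\par
{\*\lyncflags rtf=1}}"""
print(rtf_to_text(sample))  # Test from Lync 2013
```

Because `\*` groups and `fldinst` are skipped while `fldrslt` is kept, a hyperlink field contributes only its visible text, which is what the desired output shows for the URL message.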
Data:
<?xml version="1.0" encoding="utf-8"?>
<session Type="Conversation" SessionIdTime="2013-01-18 17:18:01Z" SessionIdSeq="1">
<Reference>OCSSession-Conversation_2013-01-18 17:18:01Z_1</Reference>
<participants>
<participant>
<name>user1@company.com</name>
</participant>
<participant>
<name>user2@company.com</name>
</participant>
</participants>
<conversation InviteTime="2013-01-18 17:18:01Z" InitiatedBy="user1@company.com" />
<messages>
<message Id="1" Time="2013-01-18 17:18:01Z">
<from>user1@company.com</from>
<to>user2@company.com</to>
<content Type="text/html"><span style="font-family:Segoe UI; color:#000000; font-size:10pt;">Test from Lync 2013</span></content>
</message>
<message Id="2" Time="2013-01-18 17:18:02Z">
<from>user1@company.com</from>
<to>user2@company.com</to>
<content Type="text/rtf">{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}}
{\colortbl ;\red0\green0\blue0;}
{\*\generator Riched20 15.0.4420}{\*\mmathPr\mwrapIndent1440 }\viewkind4\uc1
\pard\cf1\embo\f0\fs20 Test\embo0 \embo from\embo0 \embo Lync\embo0 \embo 2013\embo0 \f1\par
{\*\lyncflags rtf=1}}
</content>
</message>
<message Id="3" Time="2013-01-18 17:18:07Z">
<from>user2@company.com</from>
<to>user1@company.com</to>
<content Type="text/html"><DIV style="font-size: 9pt;font-family: MS Shell Dlg 2;color: #000000;direction: ltr">got it</DIV></content>
</message>
<message Id="4" Time="2013-01-18 17:20:05Z">
<from>user1@company.com</from>
<to>user2@company.com</to>
<content Type="text/rtf">{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil Segoe UI;}}
{\colortbl ;\red0\green0\blue0;\red0\green0\blue255;}
{\*\generator Riched20 15.0.4420}{\*\mmathPr\mwrapIndent1440 }\viewkind4\uc1
\pard {\cf1\outl\f0\fs20{\field{\*\fldinst{HYPERLINK http://jefferytay.wordpress.com /2010/12/09/converting-a-pfx-file-to-pem-and-key-via-openssl/ }}{\fldrslt{http://jefferytay.wordpress.com/2010/12/09/converting-a-pfx-file-to-pem-and-key-via-openssl/\ul0\cf0}}}}\f0\fs20\par
{\*\lyncflags rtf=1}}
</content>
</message>
<message Id="5" Time="2013-01-18 17:20:19Z">
<from>user1@company.com</from>
<to>user2@company.com</to>
<content Type="text/rtf">{\rtf1\fbidis\ansi\ansicpg1252\deff0 \nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}}
{\colortbl ;\red0\green0\blue0;}
{\*\generator Riched20 15.0.4420}{\*\mmathPr\mwrapIndent1440 }\viewkind4\uc1
\pard\cf1\embo\f0\fs20 How\embo0 \embo does\embo0 \embo the\embo0 \embo URL\embo0 \embo look\embo0 \embo on\embo0 \embo your\embo0 \embo end?\embo0\f1\par
{\*\lyncflags rtf=1}}
</content>
</message>
<message Id="6" Time="2013-01-18 17:20:25Z">
<from>user2@company.com</from>
<to>user1@company.com</to>
<content Type="text/html"><DIV style="font-size: 9pt;font-family: MS Shell Dlg 2;color: #000000;direction: ltr">its plain text</DIV></content>
</message>
<message Id="7" Time="2013-01-18 17:20:34Z">
<from>user2@company.com</from>
<to>user1@company.com</to>
<content Type="text/html"><DIV style="font-size: 9pt;font-family: MS Shell Dlg 2;color: #000000;direction: ltr">not clickable</DIV></content>
</message>
<message Id="8" Time="2013-01-18 17:20:50Z">
<from>user2@company.com</from>
<to>user1@company.com</to>
<content Type="text/html"><DIV style="font-size: 9pt;font-family: MS Shell Dlg 2;color: #000000;direction: ltr">how does this look?&nbsp; _http://www.cnn.com</DIV></content>
</message>
<message Id="9" Time="2013-01-18 17:21:07Z">
<from>user2@company.com</from>
<to>user1@company.com</to>
<content Type="text/html"><DIV style="font-size: 9pt;font-family: MS Shell Dlg 2;color: #000000;direction: ltr">_http://powertoe.wordpress.com/2009/12/14/powershell-part-4-arrays-and-for-loops/</DIV></content>
</message>
<message Id="10" Time="2013-01-18 17:21:38Z">
<from>user1@company.com</from>
<to>user2@company.com</to>
<content Type="text/rtf">{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}}
{\colortbl ;\red0\green0\blue0;\red0\green0\blue255;}
{\*\generator Riched20 15.0.4420}{\*\mmathPr\mwrapIndent1440 }\viewkind4\uc1
\pard\cf1\embo\f0\fs20 Please\embo0 \embo go\embo0 \embo ahead\embo0 \embo and\embo0 \embo install\embo0 \embo the\embo0 \embo new\embo0 \embo client\embo0 {\embo{\field{\*\fldinst{HYPERLINK "n:\\\\apps\\\\microsoft\\\\lync2013\\\\client\\\\setup.exe"}}{\fldrslt{n:\\apps\\microsoft\\lync2013\\client\\setup.exe\ul0\cf0}}}}\f0\fs20 \embo Once\embo0 \embo you\embo0 \embo install\embo0 \embo it,\embo0 \embo it\embo0 \embo will\embo0 \embo force\embo0 \embo a\embo0 \embo reboot.\embo0 \embo After\embo0 \embo it\embo0 \embo reboots,\embo0 \embo you\embo0 \embo have\embo0 \embo to\embo0 \embo close\embo0 \embo out\embo0 \embo of\embo0 \embo communicator.exe\embo0 \embo completely.\embo0\f1\par
{\*\lyncflags rtf=1}}
</content>
</message>
</messages>
</session>
Desired output:
From: user1@company.com
To: user2@company.com
2013-01-18 17:18:02Z
user1@company.com: Test from Lync 2013
2013-01-18 17:18:07Z
user2@company.com: got it
2013-01-18 17:20:05Z
user1@company.com: http://jefferytay.wordpress.com/2010/12/09/converting-a-pfx-file-to-pem-and-key-via-openssl/
2013-01-18 17:20:19Z
user1@company.com: How does the URL look on your end?
2013-01-18 17:20:25Z
user2@company.com: its plain text
2013-01-18 17:20:34Z
user2@company.com: not clickable
2013-01-18 17:20:50Z
user2@company.com: how does this look? _http://www.cnn.com
2013-01-18 17:21:07Z
user2@company.com: _http://powertoe.wordpress.com/2009/12/14/powershell-part-4-arrays-and-for-loops/
2013-01-18 17:21:38Z
user1@company.com: Please go ahead and install the new client
Once you install it, it will force a reboot. After it reboots, you have to close out of communicator.exe completely.
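
One way to get this layout for every message, rather than only the ones a single pattern matches, is to parse the XML first and then pick a converter per message based on the `Type` attribute of `<content>`. The sketch below is in Python purely for illustration (the same structure should carry over to Lua or XSL). `html_to_text`, `format_session` and the `converters` mapping are hypothetical names. It assumes the header comes from `InitiatedBy` on `<conversation>`, that `&nbsp;` has to be rewritten because it is not a predefined XML entity, and that any `Type` without a registered converter is passed through as-is; an RTF converter would be registered under `"text/rtf"`.

```python
import html
import re
import xml.etree.ElementTree as ET

def html_to_text(markup):
    """Strip tags, decode entities such as &amp;, and collapse whitespace."""
    return " ".join(html.unescape(re.sub(r"<[^>]+>", "", markup)).split())

def format_session(xml_text, converters):
    """Return the transcript lines for one <session> document."""
    # &nbsp; is not a predefined XML entity, so swap in its numeric form first.
    root = ET.fromstring(xml_text.replace("&nbsp;", "&#160;"))
    sender = root.find("conversation").get("InitiatedBy")
    others = [n.text for n in root.iter("name") if n.text != sender]
    lines = ["From: " + sender, "To: " + ", ".join(others)]
    for msg in root.iter("message"):
        content = msg.find("content")
        # itertext() copes with both escaped markup and inline child elements.
        raw = "".join(content.itertext())
        # Unknown content types fall back to the raw text, trimmed.
        convert = converters.get(content.get("Type"), str.strip)
        lines.append(msg.get("Time"))
        lines.append(msg.findtext("from") + ": " + convert(raw))
    return lines

demo = """<session Type="Conversation">
<participants>
<participant><name>user1@company.com</name></participant>
<participant><name>user2@company.com</name></participant>
</participants>
<conversation InviteTime="2013-01-18 17:18:01Z" InitiatedBy="user1@company.com" />
<messages>
<message Id="3" Time="2013-01-18 17:18:07Z">
<from>user2@company.com</from>
<to>user1@company.com</to>
<content Type="text/html"><DIV style="font-size: 9pt">got it</DIV></content>
</message>
<message Id="8" Time="2013-01-18 17:20:50Z">
<from>user2@company.com</from>
<to>user1@company.com</to>
<content Type="text/html">&lt;DIV&gt;how does this look?&nbsp; _http://www.cnn.com&lt;/DIV&gt;</content>
</message>
</messages>
</session>"""
print("\n".join(format_session(demo, {"text/html": html_to_text})))
```

Keeping the converters in a mapping means a new display format only needs one more entry, instead of another pattern that silently drops whatever it does not match.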