This is how the HTML starts:
Business Document
<p>Some company</p>
<p>
<p>DEPARTMENT: Legal Process</p>
<p>FUNCTION: Computer Department</p>
<p>PROCESS: Process Server</p>
<p>PROCEDURE: ABC Process Server</p>
<p>OWNER: Some User</p>
<p>REVISION DATE: 06/10/2013</p>
<p>
<p>OBJECTIVE: To ensure that the process server receive their invoices the following day.</p>
<p>
<p>WHEN TO PERFORM: Daily</p>
<p>
<p>WHO WILL PERFORM? Computer Team</p>
<p>
<p>TIME TO COMPLETE: 5 minutes</p>
<p>
<p>TECHNOLOGY REQUIREMENT(S): </p>
<p>
<p>SOURCE DOCUMENT(S): N/A</p>
<p>
<p>CODES AND DEFINITIONS: N/A</p>
<p>
<table border="1">
<tr>
<td>
<p>KPI’s: </p>
</td>
</tr>
</table>
<p>
<table border="1">
<tr>
<td>
<p>RISKS: </p>
</td>
</tr>
</table>
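There is a lot more text after this. What I need to do is parse specific data out of the section above.
I need to extract Department, Function, Process, Procedure, Objective, When to Perform, Who Will Perform, Time to Complete, Technology Requirement(s), Source Document(s), Codes and Definitions, and Risks.
I then need to remove this information from the Html column while keeping everything else intact. Is this possible in LINQ?
Here is the LINQ query I am using: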
// Selects the Html column for document 4, skipping soft-deleted rows.
var result = (from d in IPACS_Documents
              join dp in IPACS_ProcedureDocs on d.DocumentID equals dp.DocumentID
              join p in IPACS_Procedures on dp.ProcedureID equals p.ProcedureID
              where d.DocumentID == 4
                    && d.DateDeleted == null
              select d.Html);
// result is a sequence of Html strings, one per joined row.
Console.WriteLine(result);
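To make the goal concrete, here is a minimal sketch of the in-memory step I have in mind once the Html string has been fetched. It is not a LINQ-only solution: it assumes every field sits in its own `<p>LABEL: value</p>` (or `<p>LABEL? value</p>`) element, as in the sample above, and it uses `System.Text.RegularExpressions` instead of a real HTML parser. The class name `HeaderFieldExtractor` and the method `Extract` are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any existing code base.
public static class HeaderFieldExtractor
{
    // Labels copied from the sample HTML in the question.
    private static readonly string[] Labels =
    {
        "DEPARTMENT", "FUNCTION", "PROCESS", "PROCEDURE", "OBJECTIVE",
        "WHEN TO PERFORM", "WHO WILL PERFORM", "TIME TO COMPLETE",
        "TECHNOLOGY REQUIREMENT(S)", "SOURCE DOCUMENT(S)",
        "CODES AND DEFINITIONS", "RISKS"
    };

    // Returns the labelled values that were found; remainingHtml is the
    // input with each matched <p>...</p> element removed.
    public static Dictionary<string, string> Extract(string html, out string remainingHtml)
    {
        var fields = new Dictionary<string, string>();
        remainingHtml = html;

        foreach (var label in Labels)
        {
            // Assumption: the label is followed by ':' or '?' and the value
            // contains no nested tags.
            var pattern = @"<p>\s*" + Regex.Escape(label)
                        + @"\s*[:?]\s*(?<value>[^<]*)</p>\s*";
            var match = Regex.Match(remainingHtml, pattern, RegexOptions.IgnoreCase);
            if (!match.Success)
                continue;

            fields[label] = match.Groups["value"].Value.Trim();
            remainingHtml = remainingHtml.Remove(match.Index, match.Length);
        }

        return fields;
    }
}
```

It would be called on a single materialized string, for example `var fields = HeaderFieldExtractor.Extract(result.First(), out var cleanedHtml);`. The sketch leaves the stray unclosed `<p>` lines and the now-empty RISKS table cell in place, so those would still need separate cleanup.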