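I am working with data from a specific column. I have one table with a varchar(max) column. Below is a sample input from that column, followed by the output I need.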

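Requirements: 1) Remove all HTML content. 2) The output should contain only the message body and the attachment links. 3) I need a SQL query or a stored procedure.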

Input:

Sample message body.

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.

------=_NextPart_001_0026_01C44313.1C14C7A0
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: 8bit

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:o = 
"urn:schemas-microsoft-com:office:office" xmlns:w = 
"urn:schemas-microsoft-com:office:word">
<HEAD>
</HEAD>
<BODY lang=EN-US style="tab-interval: 36.0pt" vLink=purple link=blue>
<p>This is html content.</p>
<p>Html content should be removed.</p>
</BODY></HTML>

------=_NextPart_001_0026_01C44313.1C14C7A0--
------=_NextPart_000_001B_01C44313.1B899EA0
Content-Type: application/vnd.ms-excel; name="attachment1.xls"
Content-Disposition: attachment; filename="attachment1.xls"
Content-Transfer-Encoding: base64

G:\fakepath\Attach\attachment1.xls
------=_NextPart_000_001B_01C44313.1B899EA0
Content-Type: application/msword; name="attachment2.DOC"
Content-Disposition: attachment; filename="attachment2.DOC"
Content-Transfer-Encoding: base64

G:\fakepath\Attach\attachment2.DOC
------=_NextPart_000_001B_01C44313.1B899EA0--

Output:

Sample message body.

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.


G:\fakepath\Attach\attachment1.xls

G:\fakepath\Attach\attachment2.DOC

Note: several records in this table have similar values in this column. (That is, some records have one attachment link and others have two or more.)


This is what I have done so far:

SELECT STUFF(columnName,
             CHARINDEX('<!DOCTYPE HTML', columnName),
             CHARINDEX('</html>', columnName) - CHARINDEX('<!DOCTYPE HTML', columnName) + 7,
             '') AS removeHTML
FROM TableName;

SELECT CASE
           WHEN CHARINDEX('Content-Transfer-Encoding: base64', columnName) > 0
           THEN SUBSTRING(columnName,
                          CHARINDEX('Content-Transfer-Encoding: base64', columnName) + 33,
                          LEN(columnName))
           ELSE columnName
       END AS attachmentLinks
FROM TableName;
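
One direction that might combine the two queries above is a scalar function that keeps everything before the first MIME boundary as the message body, then walks over each base64 part and collects the text between its header and the next boundary. This is only an untested sketch based on the sample input: the function name dbo.ExtractBodyAndAttachments is hypothetical, and it assumes that every boundary line starts with '------=_NextPart_', that only attachment parts use 'Content-Transfer-Encoding: base64', and that each attachment link sits on a single line.

```sql
-- Hypothetical helper (name and approach are assumptions, not a confirmed solution).
CREATE FUNCTION dbo.ExtractBodyAndAttachments (@raw varchar(max))
RETURNS varchar(max)
AS
BEGIN
    DECLARE @boundary varchar(20) = '------=_NextPart_';                 -- assumed boundary prefix
    DECLARE @marker   varchar(40) = 'Content-Transfer-Encoding: base64'; -- assumed attachment header
    DECLARE @crlf     char(2)     = CHAR(13) + CHAR(10);

    -- Message body: everything before the first boundary, or the whole value if there is none.
    DECLARE @firstBoundary bigint = CHARINDEX(@boundary, @raw);
    DECLARE @result varchar(max) =
        CASE WHEN @firstBoundary > 0 THEN LEFT(@raw, @firstBoundary - 1) ELSE @raw END;

    DECLARE @pos bigint = CHARINDEX(@marker, @raw);
    DECLARE @start bigint, @end bigint, @link varchar(max);

    -- Visit every base64 part, so one, two, or more attachments are handled the same way.
    WHILE @pos > 0
    BEGIN
        SET @start = @pos + LEN(@marker);
        SET @end   = CHARINDEX(@boundary, @raw, @start);
        IF @end = 0 SET @end = LEN(@raw) + 1;   -- no closing boundary: read to the end

        -- Drop line breaks and surrounding spaces around the link.
        SET @link = LTRIM(RTRIM(REPLACE(REPLACE(
                        SUBSTRING(@raw, @start, @end - @start),
                        CHAR(13), ''), CHAR(10), '')));

        IF @link <> '' SET @result = @result + @crlf + @link;

        -- Continue searching after the boundary that closed this part.
        SET @pos = CHARINDEX(@marker, @raw, @end);
    END

    RETURN @result;
END
GO

SELECT dbo.ExtractBodyAndAttachments(columnName) AS cleanedMessage
FROM TableName;
```

Because the body is cut at the first boundary, the HTML part is dropped without searching for '<!DOCTYPE HTML' at all, which also avoids STUFF returning NULL for rows that contain no HTML. Blank lines at the end of the body are kept as they are, so the spacing may differ slightly from the output shown above.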