
I have a query where one of the columns is a string containing email headers, for example:

From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
Delivery-Date: Tue, 25 Jan 2011 15:31:01 -0700
Received: from po-out-1718.google.com ([72.14.252.155]:54907) by cl35.gs01.grid ...
Received: by po-out-1718.google.com with SMTP id y22so795146pof.4 for <user@exa ...
Received: by 10.141.116.17 with SMTP id t17mr3929916rvm.251.1214951458741; Tue,...
Received: by 10.140.188.3 with HTTP; Tue, 25 Jan 2011 15:30:58 -0700 (PDT)
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=d...
Domainkey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:da...
Message-Id: <c8f49cec0807011530k11196ad4p7cb4b9420f2ae752@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_3927_12044027.1214951...
X-Spam-Status: score=3.7 tests=DNS_FROM_RFC_POST, HTML_00_10, HTML_MESSAGE, HTM...
X-Spam-Level: ***
Message Body: This is a KnowledgeBase article that provides information on how ...

I only want to extract the email address contained in the 'To:' field, which in the example above is user@example.com.

How can I do this?


4 Answers


You can use a split function. I like the version that uses a numbers table, but there are many options. First, a numbers table with 1,000,000 rows:

SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 1000000;

WITH n(rn) AS
(
    SELECT TOP (@UpperLimit) ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_columns AS s1, sys.all_objects ORDER BY s1.[object_id]
)
SELECT [Number] = rn - 1
INTO dbo.Numbers FROM n
WHERE rn <= @UpperLimit + 1;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

Now a generic, inline, table-valued split function that turns your delimited string into a set:

CREATE FUNCTION dbo.SplitString
(
    @List NVARCHAR(MAX),
    @Delim VARCHAR(255)
)
RETURNS TABLE
AS
    RETURN ( SELECT [Value] FROM 
      ( 
        SELECT 
          [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
          CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
        FROM dbo.Numbers WHERE Number <= LEN(@List)
        AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
      ) AS x
    );
GO

Then it's simple:

DECLARE @x NVARCHAR(MAX) = N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...';

SELECT LTRIM(SUBSTRING(Value, 4, 4000)) 
  FROM dbo.SplitString(@x, CHAR(13)+CHAR(10))
  WHERE Value LIKE 'To: %@%';

Data in a table? OK, no problem:

DECLARE @a TABLE(id INT, email NVARCHAR(MAX));

INSERT @a VALUES
(1,N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...'),
(2,N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: differentUser@somewhereelse.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...');

SELECT a.id, LTRIM(SUBSTRING(x.Value, 4, 4000))
FROM @a AS a
CROSS APPLY dbo.SplitString(a.email, CHAR(13)+CHAR(10)) AS x
WHERE x.Value LIKE 'To: %@%';

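Now, you may have to experiment with the delimiter: it might be only CHAR(10), only CHAR(13), or the two might be in a different order. I'm not sure, and I can't tell from your code which it is.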
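If you are unsure which line ending the data uses, one option is to normalize it before splitting. The following is a minimal sketch under that assumption, not tested against your data: it turns every CRLF and any bare CR into LF, then splits on CHAR(10) alone with the dbo.SplitString function from above. @headers and @normalized are illustrative names.

```sql
-- Sketch only: assumes dbo.Numbers and dbo.SplitString from above already exist.
-- The sample deliberately mixes a CRLF and a bare LF line ending.
DECLARE @headers NVARCHAR(MAX) = N'From: Media Temple user (mt.kb.user@gmail.com)'
    + CHAR(13) + CHAR(10) + N'To: user@example.com'
    + CHAR(10) + N'Return-Path: <mt.kb.user@gmail.com>';

-- Convert every CRLF to LF first, then any leftover bare CR to LF,
-- so a single CHAR(10) delimiter works whatever the source used.
DECLARE @normalized NVARCHAR(MAX) =
    REPLACE(REPLACE(@headers, CHAR(13) + CHAR(10), CHAR(10)), CHAR(13), CHAR(10));

SELECT LTRIM(SUBSTRING(Value, 4, 4000)) AS ToAddress
  FROM dbo.SplitString(@normalized, CHAR(10))
  WHERE Value LIKE 'To: %@%';
```

The order of the two REPLACE calls matters: replacing CRLF first keeps each CRLF pair from turning into two LFs, which would otherwise produce empty lines.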

answered 2013-10-25T16:58:34.437

You can use the XML functionality to split the lines and find what you need:

DECLARE @X XML

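-- Escape & and < so every line is valid XML text, then wrap each line in <x>...</x>
SELECT @X = CONVERT(XML, '<y><x>' + 
                REPLACE(REPLACE(REPLACE(value, '&', '&amp;'), '<', '&lt;'), CHAR(10), '</x><x>') + 
                 '</x></y>')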
FROM test

SELECT [Value] = T.c.value('.','NVARCHAR(MAX)')
FROM @X.nodes('/y/x') T(c)
WHERE T.c.value('.','NVARCHAR(MAX)') LIKE 'To: %'

A SQLfiddle for testing.
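Because the query above assigns the column to the single variable @X, only one row's headers end up being parsed when test contains several rows. A per-row variant can move the XML conversion into CROSS APPLY instead. This is a sketch under stated assumptions: the table variable @test and its id and [value] columns are hypothetical stand-ins for your own table.

```sql
-- Sketch: per-row XML split. @test, id and [value] are hypothetical stand-ins
-- for your own table and columns.
DECLARE @test TABLE (id INT, [value] NVARCHAR(MAX));

-- The Subject line contains & and < on purpose: both must be escaped for XML.
INSERT @test VALUES
(1, N'Subject: Q&A <draft>' + CHAR(13) + CHAR(10)
  + N'To: user@example.com' + CHAR(13) + CHAR(10)
  + N'Return-Path: <mt.kb.user@gmail.com>');

SELECT t.id,
       LTRIM(SUBSTRING(l.c.value('.', 'NVARCHAR(MAX)'), 4, 4000)) AS ToAddress
FROM @test AS t
-- Drop CRs, escape & and <, then turn each LF into a line boundary </x><x>.
CROSS APPLY (SELECT CONVERT(XML, N'<y><x>'
                 + REPLACE(REPLACE(REPLACE(REPLACE(t.[value], CHAR(13), N''),
                       N'&', N'&amp;'), N'<', N'&lt;'), CHAR(10), N'</x><x>')
                 + N'</x></y>')) AS d(doc)
-- One row per <x> element, i.e. per header line.
CROSS APPLY d.doc.nodes('/y/x') AS l(c)
WHERE l.c.value('.', 'NVARCHAR(MAX)') LIKE N'To: %';
```

Removing CHAR(13) before splitting on CHAR(10) also keeps a stray carriage return off the end of each extracted line.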

answered 2013-10-25T17:03:22.957

Try this:

select substring(@s, charindex(char(13)+char(10)+'To: ', @s) + 6, charindex(char(13), @s, charindex(char(13)+char(10)+'To: ', @s)+6) - (charindex(char(13)+char(10)+'To: ', @s)+6))

Here is a complete test script:

declare @s varchar(500)

set @s = 'Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com'

select substring(@s, charindex(char(13)+char(10)+'To: ', @s) + 6, charindex(char(13)+char(10), @s, charindex(char(13)+char(10)+'To: ', @s)+6) - (charindex(char(13)+char(10)+'To: ', @s)+6))

Note that according to RFC 2822, header lines in a well-formed email must be separated by CRLF (char(13) + char(10)), and the code above makes the same assumption.

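If your emails use different line endings, you may have to change every occurrence of char(13)+char(10) to either char(13) or char(10). If you do, remember to also change +6 to +5 (because the search prefix is one character shorter).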
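For example, with LF-only line endings the test script could become the following minimal sketch. @sLf and @start are illustrative names. Like the original, it assumes the To: line is neither the first header (so a line break precedes it) nor the last (so a line break follows it).

```sql
-- Same technique, but for headers separated by LF only (CHAR(10)),
-- so the search prefix CHAR(10) + 'To: ' is 5 characters long instead of 6.
DECLARE @sLf VARCHAR(500) = 'Date: January 25, 2011 3:30:58 PM PDT' + CHAR(10)
    + 'To: user@example.com' + CHAR(10)
    + 'Return-Path: <mt.kb.user@gmail.com>';

-- Position of the first character of the address.
DECLARE @start INT = CHARINDEX(CHAR(10) + 'To: ', @sLf) + 5;

-- Read up to (but not including) the next line break.
SELECT SUBSTRING(@sLf, @start, CHARINDEX(CHAR(10), @sLf, @start) - @start) AS ToAddress;
```

Computing the start position once in @start avoids repeating the same CHARINDEX expression three times.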

answered 2013-10-25T16:54:02.750

If the email address lies between the first 'To:' and 'Return-Path:', you can use this (fiddle demo):

declare @s nvarchar(max) = 'From: Media Temple user (mt.kb.user@gmail.com)
                        Subject: article: How to Trace a Email
                        Date: January 25, 2011 3:30:58 PM PDT
                        To: user@example.com
                        Return-Path: <mt.kb.user@gmail.com>...'

select substring(@s, charindex('To:',@s)+3, 
             charindex('Return-Path:',@s)- charindex('To:',@s)-3)

--Results
user@example.com

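A more general version, assuming the email address comes before the first 'Return-Path:' (it returns whatever follows the last 'To:' before that point):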

;with cte as (
 select reverse(left(@s, charindex('Return-Path:',@s)-1)) rs
)
select reverse(left(rs, charindex(':oT', rs)-1)) 
from cte

For a query against a table, replace @s with your column name.
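Applied to a table, that could look roughly like the following sketch. @Emails, id and headers are hypothetical names. The CROSS APPLY computes both CHARINDEX positions once per row. The REPLACE calls strip the line break that otherwise ends up in the result, since the substring between 'To:' and 'Return-Path:' includes the leading space and the line break.

```sql
-- Sketch: the 'To:' ... 'Return-Path:' idea applied to every row of a table.
-- @Emails, id and headers are hypothetical names; use your own table and column.
DECLARE @Emails TABLE (id INT, headers NVARCHAR(MAX));

INSERT @Emails VALUES
(1, N'Date: January 25, 2011 3:30:58 PM PDT' + CHAR(13) + CHAR(10)
  + N'To: user@example.com' + CHAR(13) + CHAR(10)
  + N'Return-Path: <mt.kb.user@gmail.com>'),
(2, N'Date: January 25, 2011 3:30:58 PM PDT' + CHAR(13) + CHAR(10)
  + N'To: differentUser@somewhereelse.com' + CHAR(13) + CHAR(10)
  + N'Return-Path: <mt.kb.user@gmail.com>');

SELECT e.id,
       -- Remove the CR/LF between the address and 'Return-Path:', then trim spaces.
       LTRIM(RTRIM(REPLACE(REPLACE(
           SUBSTRING(e.headers, p.toPos + 3,
                     -- Guard against a negative length if a row lacks the markers.
                     CASE WHEN p.rpPos > p.toPos + 3 THEN p.rpPos - p.toPos - 3 ELSE 0 END),
           CHAR(13), N''), CHAR(10), N''))) AS ToAddress
FROM @Emails AS e
-- Compute each marker position once per row.
CROSS APPLY (SELECT CHARINDEX(N'To:', e.headers)          AS toPos,
                    CHARINDEX(N'Return-Path:', e.headers) AS rpPos) AS p
-- Skip rows where either marker is missing or they are out of order.
WHERE p.toPos > 0 AND p.rpPos > p.toPos;
```

The CASE expression is there because SQL Server does not guarantee that the WHERE filter is evaluated before the SELECT list, and SUBSTRING raises an error on a negative length.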

answered 2013-10-25T16:54:10.163