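I'm currently working with a SQL Server database built by a vendor. The database holds data from calls made through their system. The main table that stores the data has 7 fields: one primary key, two foreign keys, a few datetime stamps, and finally a large field called "SegmentLog".

The data in that field is unstructured. Here is a sample: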

/20160219T154710.554-07/0?S=50&E=3512&CUTC=20160219T155235.662-07&1=100187177120160219&2=0&3=18823&4=user%20queue:icadmin&5=&6=Interact&7=|/20160219T154729.377-07/0?S=50&E=3504&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=81592&4=user%20queue:icadmin&5=&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20/%3e%0a&8=&9=2|/20160219T154850.970-07/0?S=50&E=3502&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=55&4=&5=workgroup%20queue:Central%20Ops%202&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20TransferredUser%3d%22Phoenix%20AZ%22%20/%3e%0a|/20160219T154851.025-07/0?S=50&E=3500&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=1048&4=&5=&6=Queue&7=%3cDetails%20IVRAppName%3d%22Central%20Ops%202%22%20/%3e%0a|/20160219T154852.073-07/0?S=50&E=3502&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=13344&4=&5=workgroup%20queue:Central%20Ops%202&6=Interact&7=|/20160219T154905.417-07/0?S=50&E=3504&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=26202&4=user%20queue:icadmin&5=workgroup%20queue:Central%20Ops%202&6=LocalDisconnect&7=&8=&9=5

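What I've been told is that each "SegmentLog" can contain multiple "events", identified in the SegmentLog field by "E=". Events are separated by the pipe symbol "|". Each event begins with a datetime stamp from the server, followed by the SourceID ("S=") and then the EventID ("E=").

After each EventID (numbered 3500 to 3512) come attributes numbered 1 through 9 (written "1=", "2=", and so on).

Keep in mind that a single SegmentLog may contain several events with the same EventID, and that not every attribute appears for every EventID (e.g. E=3502 might show only attributes 1-6, while E=3503 might show attributes 1-9). What is the best way to structure this data into a table? The tools available to me are building complex queries in a view, or SSIS, of which I have intermediate knowledge.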
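Because attributes can be missing, it may be safer to read each event by field name than by position. The following is a minimal sketch of that idea, written in Python only so it can be run outside SSIS. `parse_segment_log` is a hypothetical helper name, and the sketch assumes that any literal "|", "&" or "=" inside a value is always percent-encoded (as "%3d" is in the sample), so splitting on those characters is safe.

```python
from urllib.parse import unquote


def parse_segment_log(segment_log):
    """Split a SegmentLog into one dict per event, keyed by field name."""
    events = []
    for raw_event in segment_log.split("|"):
        if not raw_event:
            continue  # tolerate a stray leading or trailing pipe
        # Text before "?" is the server stamp, e.g. "/20160219T154710.554-07/0"
        stamp, _, query = raw_event.partition("?")
        event = {"DateTime": stamp}
        for pair in query.split("&"):
            # Split on the first "=" only, then decode %20, %3c, and so on
            key, _, value = pair.partition("=")
            event[key] = unquote(value)
        events.append(event)
    return events


# First and last events from the sample above
sample = (
    "/20160219T154710.554-07/0?S=50&E=3512&CUTC=20160219T155235.662-07"
    "&1=100187177120160219&2=0&3=18823&4=user%20queue:icadmin&5=&6=Interact&7="
    "|/20160219T154905.417-07/0?S=50&E=3504&CUTC=20160219T155235.664-07"
    "&1=100187177120160219&2=0&3=26202&4=user%20queue:icadmin"
    "&5=workgroup%20queue:Central%20Ops%202&6=LocalDisconnect&7=&8=&9=5"
)
events = parse_segment_log(sample)
```

With this shape, an attribute that does not appear in an event is simply absent from that event's dict, which distinguishes it from an attribute that is present but empty (such as "5=").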

Edit

I would like to see the data turned into something like this, but including all of the attributes:

DateTime                    Sequence  EventID  Attr1                  Attr3  
--------                    --------  -------  -----                  -----
/20160219T154710.554-07/0?  S=50      &E=3512  &1=100187177120160219  &3=18823
/20160219T154729.377-07/0?  S=50      &E=3504  &1=100187177120160219  &3=81592
/20160219T154850.970-07/0?  S=50      &E=3502  &1=100187177120160219  &3=55
/20160219T154851.025-07/0?  S=50      &E=3500  &1=100187177120160219  &3=1048
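A table like the one above needs the same columns on every row, so absent attributes have to become NULLs. The sketch below is one way to do that in Python. It assumes each event has already been split into a dict of field name to value (a hypothetical intermediate shape), and it assumes the trailing "-07" on the stamp is a whole-hour UTC offset, which the source data does not confirm. `parse_stamp` and `to_row` are illustrative names.

```python
from datetime import datetime

ATTR_COLUMNS = [f"Attr{n}" for n in range(1, 10)]  # Attr1 .. Attr9


def parse_stamp(stamp):
    """Turn '/20160219T154710.554-07/0' into a timezone-aware datetime.

    Assumption: the trailing '-07' is a whole-hour UTC offset.
    """
    text = stamp.strip("/").split("/")[0]  # '20160219T154710.554-07'
    # %z needs hours and minutes, so pad the offset to '-0700'
    return datetime.strptime(text + "00", "%Y%m%dT%H%M%S.%f%z")


def to_row(event):
    """Map one parsed event onto a fixed set of columns."""
    row = {
        "DateTime": parse_stamp(event["DateTime"]),
        "Sequence": event.get("S"),
        "EventID": event.get("E"),
        "CUTC": event.get("CUTC"),
    }
    for n, column in enumerate(ATTR_COLUMNS, start=1):
        row[column] = event.get(str(n))  # None when the attribute is absent
    return row


# First event of the sample, truncated after attribute 3 for brevity
event = {
    "DateTime": "/20160219T154710.554-07/0",
    "S": "50",
    "E": "3512",
    "CUTC": "20160219T155235.662-07",
    "1": "100187177120160219",
    "2": "0",
    "3": "18823",
}
row = to_row(event)
```

Storing the stamp as a real datetime rather than the raw text makes it possible to sort and filter events by time once they are in a table.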
1 Answer

OK, I think this is what you're trying to accomplish.

To test this, I added your sample row to a SQL Server table with an nvarchar(max) column:

if object_id(N'dbo.BigLongString', N'U') is not null
drop table dbo.BigLongString;
go

create table dbo.BigLongString
( 
 SegmentLog nvarchar(max)
);
go

insert into dbo.BigLongString (SegmentLog)
values ('/20160219T154710.554-07/0?S=50&E=3512&CUTC=20160219T155235.662-07&1=100187177120160219&2=0&3=18823&4=user%20queue:icadmin&5=&6=Interact&7=|/20160219T154729.377-07/0?S=50&E=3504&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=81592&4=user%20queue:icadmin&5=&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20/%3e%0a&8=&9=2|/20160219T154850.970-07/0?S=50&E=3502&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=55&4=&5=workgroup%20queue:Central%20Ops%202&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20TransferredUser%3d%22Phoenix%20AZ%22%20/%3e%0a|/20160219T154851.025-07/0?S=50&E=3500&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=1048&4=&5=&6=Queue&7=%3cDetails%20IVRAppName%3d%22Central%20Ops%202%22%20/%3e%0a|/20160219T154852.073-07/0?S=50&E=3502&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=13344&4=&5=workgroup%20queue:Central%20Ops%202&6=Interact&7=|/20160219T154905.417-07/0?S=50&E=3504&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=26202&4=user%20queue:icadmin&5=workgroup%20queue:Central%20Ops%202&6=LocalDisconnect&7=&8=&9=5')
go

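I then created an SSIS package to pull this data and parse it. The data flow task looks like this:

[Screenshot: Data Flow Task]

The SQL statement in the OLE DB Source component is: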

select 
      SegmentLog 
from 
      dbo.BigLongString;

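The Script Component is a transformation with an asynchronous output:

[Screenshot: Inputs and Outputs]

If you expand the Output 0 tree, you can see all of the columns that were added. The Attr* columns are all DT_WSTR 500. I'm not sure how large these values can get, so you may want to change the data type. I simply made the remaining columns DT_WSTR 50:

[Screenshot: Output Columns]

Here is the code for the Script Component. Make sure you build it before exiting: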
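One way to choose those column sizes is to measure the longest raw value seen for each field across the existing data. The following is a rough sketch in Python rather than anything SSIS provides. `max_value_lengths` is a hypothetical helper, and the two inputs are the first and fourth events from the sample row. Lengths are measured on the raw, still percent-encoded text, since that is what the Script Component writes to the output columns.

```python
from collections import defaultdict


def max_value_lengths(segment_logs):
    """Return the longest raw value seen for each field key across all logs."""
    longest = defaultdict(int)
    for log in segment_logs:
        for raw_event in log.split("|"):
            # Drop the server stamp; keep only the key=value part
            _, _, query = raw_event.partition("?")
            for pair in query.split("&"):
                key, sep, value = pair.partition("=")
                if sep:  # skip fragments that are not key=value
                    longest[key] = max(longest[key], len(value))
    return dict(longest)


logs = [
    "/20160219T154710.554-07/0?S=50&E=3512&CUTC=20160219T155235.662-07"
    "&1=100187177120160219&2=0&3=18823&4=user%20queue:icadmin&5=&6=Interact&7=",
    "/20160219T154851.025-07/0?S=50&E=3500&CUTC=20160219T155235.664-07"
    "&1=100187177120160219&2=0&3=1048&4=&5=&6=Queue"
    "&7=%3cDetails%20IVRAppName%3d%22Central%20Ops%202%22%20/%3e%0a",
]
lengths = max_value_lengths(logs)
```

Running this over a representative export of the table would show whether 500 characters is enough for attribute 7, which carries the encoded XML details and is the longest field in the sample.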

 #region Namespaces
 using System;
 using System.Data;
 using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
 using Microsoft.SqlServer.Dts.Runtime.Wrapper;
 using Microsoft.SqlServer.Dts.Pipeline;
 #endregion

 [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
 public class ScriptMain : UserComponent
 {
   private PipelineBuffer inputBuffer;

 public override void Input0_ProcessInputRow(Input0Buffer Row)
 {

    //length of the blob, in bytes
    int blobLen = 0;

    string webStr = null;

    //events within a SegmentLog are separated by a pipe
    string[] dateSplit = new string[] { "|" };

    //get blob length. Column index hardcoded to 0 since we only look at one column
    //in this example
    blobLen = (int)inputBuffer.GetBlobLength(0);

    //gets string from blob, hardcoded column index since we only have 1 column
    webStr = ConvertBlobToString((byte[])inputBuffer.GetBlobData(0, 0, blobLen));

    //one array element per event; each event starts with its date stamp
    string[] dates = webStr.Split(dateSplit, StringSplitOptions.None);

    //Loop through each date
    foreach (string date in dates)
    {
        //Parse out each attribute for a given date
        string[] attributes = date.Split('&');

        Output0Buffer.AddRow();

        //Loop through each attribute in date, you can remove the "&"+ if you do not need these in the values
        for (int i = 0; i < attributes.Length; i++)
        {

            switch (i)
            {
                case 0:
                    Output0Buffer.DateTime = attributes[i].Substring(0, attributes[i].IndexOf('S'));
                    Output0Buffer.Sequence = attributes[i].Substring(attributes[i].IndexOf('S'), attributes[i].Length - attributes[i].IndexOf('S'));
                    break;
                case 1:
                    Output0Buffer.EventID = "&" + attributes[i];
                    break;
                case 2:
                    Output0Buffer.CUTC = "&" + attributes[i];
                    break;
                case 3:
                    Output0Buffer.Attr1 = "&" + attributes[i];
                    break;
                case 4:
                    Output0Buffer.Attr2 = "&" + attributes[i];
                    break;
                case 5:
                    Output0Buffer.Attr3 = "&" + attributes[i];
                    break;
                case 6:
                    Output0Buffer.Attr4 = "&" + attributes[i];
                    break;
                case 7:
                    Output0Buffer.Attr5 = "&" + attributes[i];
                    break;
                case 8:
                    Output0Buffer.Attr6 = "&" + attributes[i];
                    break;
                case 9:
                    Output0Buffer.Attr7 = "&" + attributes[i];
                    break;
                case 10:
                    Output0Buffer.Attr8 = "&" + attributes[i];
                    break;
                case 11:
                    Output0Buffer.Attr9 = "&" + attributes[i];
                    break;
            }
        }

    }
}

public override void ProcessInput(int InputID, Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer Buffer)
{
    inputBuffer = Buffer;
    base.ProcessInput(InputID, Buffer);
}

public string ConvertBlobToString(byte[] webBlob)
{
    //string to return
    string webStr = null;

    //get string from blob
    webStr = System.Text.Encoding.Unicode.GetString(webBlob);

    return webStr;

}

}

Run the package and you should see the data parsed out as expected in the data viewer:

[Screenshot: Data Viewer]

answered 2016-02-26T02:15:40.187