5

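I am following the walkthrough for exporting from Application Insights to SQL using Stream Analytics. I am trying to export custom event dimensions (context.custom.dimensions in the JSON example below), which are added to the data file as a nested JSON array. How do I flatten the dimensions array at context.custom.dimensions so that it can be exported to SQL?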

JSON...

{
  "event": [
    {
      "name": "50_DistanceSelect",
      "count": 1
    }
  ],
  "internal": {
    "data": {
      "id": "aad2627b-60c5-48e8-aa35-197cae30a0cf",
      "documentVersion": "1.5"
    }
  },
  "context": {
    "device": {
      "os": "Windows",
      "osVersion": "Windows 8.1",
      "type": "PC",
      "browser": "Chrome",
      "browserVersion": "Chrome 43.0",
      "screenResolution": {
        "value": "1920X1080"
      },
      "locale": "unknown",
      "id": "browser",
      "userAgent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    },
    "application": {},
    "location": {
      "continent": "North America",
      "country": "United States",
      "point": {
        "lat": 38.0,
        "lon": -97.0
      },
      "clientip": "0.115.6.185",
      "province": "",
      "city": ""
    },
    "data": {
      "isSynthetic": false,
      "eventTime": "2015-07-15T23:43:27.595Z",
      "samplingRate": 0.0
    },
    "operation": {
      "id": "2474EE6F-5F6F-48C3-BA43-51636928075A"
    },
    "user": {
      "anonId": "BA05C4BE-1C42-482F-9836-D79008E78A9D",
      "anonAcquisitionDate": "0001-01-01T00:00:00Z",
      "authAcquisitionDate": "0001-01-01T00:00:00Z",
      "accountAcquisitionDate": "0001-01-01T00:00:00Z"
    },
    "custom": {
      "dimensions": [
        {
          "CategoryAction": "click"
        },
        {
          "SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"
        },
        {
          "WebsiteVersion": "3.0"
        },
        {
          "PageSection": "FilterFind"
        },
        {
          "Category": "EventCategory1"
        },
        {
          "Page": "/page-in-question"
        }
      ],
      "metrics": []
    },
    "session": {
      "id": "062703E5-5E15-491A-AC75-2FE54EF03623",
      "isFirst": false
    }
  }
}
4

5 Answers

6

A slightly more dynamic solution is to set up a temporary table (a WITH step):

WITH ATable AS (
    SELECT
         temp.internal.data.id AS ID
        ,dimensions.ArrayValue.CategoryAction AS CategoryAction
        ,dimensions.ArrayValue.SessionId AS SessionId
        ,dimensions.ArrayValue.WebsiteVersion AS WebsiteVersion
        ,dimensions.ArrayValue.PageSection AS PageSection
        ,dimensions.ArrayValue.Category AS Category
        ,dimensions.ArrayValue.Page AS Page
    FROM [analyticseventinputs] temp
    -- The original answer called GetElements; other answers here use GetArrayElements
    CROSS APPLY GetArrayElements(temp.[context].[custom].[dimensions]) AS dimensions
)

Then join on the unique key:

-- (select list not shown in the original answer)
FROM [analyticseventinputs] Input
LEFT JOIN ATable CategoryAction ON
    Input.internal.data.id = CategoryAction.ID AND
    CategoryAction.CategoryAction <> '' AND
    DATEDIFF(day, Input, CategoryAction) BETWEEN 0 AND 5

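The rather annoying part is the DATEDIFF requirement: joins are meant to combine two data streams, but in this case you are only joining on a unique key, so I set it to a large value of 5 days. Compared with the other solutions, the only real benefit is that it does not break when the custom parameters arrive in a different order.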

Answered 2015-07-27T16:47:51.090
5

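Most online tutorials use CROSS APPLY or OUTER APPLY, but that is not what you are looking for, because it puts each property on a separate row. To get around this, use the GetRecordPropertyValue and GetArrayElement functions as shown below. This flattens the properties into a single row.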

SELECT
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'CategoryAction') AS CategoryAction,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'SessionId') AS SessionId,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 2), 'WebsiteVersion') AS WebsiteVersion,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 3), 'PageSection') AS PageSection,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 4), 'Category') AS Category,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 5), 'Page') AS Page
INTO
  [outputstream]
FROM
  [inputstream] MySource
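
One caveat with the query above is that it reads each dimension by its position in the array, so it relies on the dimensions always arriving in the same order. To make that concrete outside Stream Analytics, here is a minimal Python sketch comparing a positional lookup with a key-based merge. This is not Stream Analytics code: `by_position` and `by_key` are hypothetical helpers, and treating an out-of-range index as a missing value is an assumption of the sketch.

```python
# Illustration only: plain Python modelling the lookup, not Stream Analytics itself.
dimensions = [
    {"CategoryAction": "click"},
    {"SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"},
    {"WebsiteVersion": "3.0"},
    {"PageSection": "FilterFind"},
    {"Category": "EventCategory1"},
    {"Page": "/page-in-question"},
]


def by_position(dims, index, name):
    """Model GetRecordPropertyValue(GetArrayElement(dims, index), name)."""
    if index >= len(dims):
        return None  # assumption: out-of-range is treated as missing
    return dims[index].get(name)


def by_key(dims):
    """Merge the single-property records into one flat row, ignoring order."""
    row = {}
    for record in dims:
        row.update(record)
    return row


# Same order as the sample: the positional lookup finds the value.
print(by_position(dimensions, 1, "SessionId"))

# Different order: the positional lookup misses, the key-based merge does not.
reordered = list(reversed(dimensions))
print(by_position(reordered, 1, "SessionId"))
print(by_key(reordered)["SessionId"])
```

If the order of your custom dimensions is stable, the positional query is the simplest option; if not, one of the key-based answers is safer.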
Answered 2015-12-17T18:36:35.640
2

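What schema do you have in SQL? Do you want a single row in SQL with all the dimensions as columns?

That may not be possible today, but after July 30 there will be more Array/Record functions in Azure Stream Analytics.

You will then be able to do something like this: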

-- Indices follow the sample JSON: 0 = CategoryAction, 2 = WebsiteVersion, 3 = PageSection
SELECT
    CASE
        WHEN GetArrayLength(A.context.custom.dimensions) > 0
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 0), 'CategoryAction')
        ELSE ''
    END AS CategoryAction,
    CASE
        WHEN GetArrayLength(A.context.custom.dimensions) > 2
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 2), 'WebsiteVersion')
        ELSE ''
    END AS WebsiteVersion,
    CASE
        WHEN GetArrayLength(A.context.custom.dimensions) > 3
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 3), 'PageSection')
        ELSE ''
    END AS PageSection
FROM [input] A

If you want a separate row for each dimension, you can use the CROSS APPLY operator.
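
To show what "a separate row for each dimension" means in practice, here is a rough Python sketch of the row shape. It is only a model, not Stream Analytics code: `cross_apply` is a hypothetical helper, the event is a trimmed copy of the sample JSON, and the output column names (`id`, `ArrayIndex`, `name`, `value`) are chosen for illustration.

```python
# Illustration only: models one output row per array element.
event = {
    "internal": {"data": {"id": "aad2627b-60c5-48e8-aa35-197cae30a0cf"}},
    "context": {
        "custom": {
            "dimensions": [
                {"CategoryAction": "click"},
                {"WebsiteVersion": "3.0"},
                {"PageSection": "FilterFind"},
            ]
        }
    },
}


def cross_apply(evt):
    """Yield one row per element of context.custom.dimensions."""
    event_id = evt["internal"]["data"]["id"]
    dims = evt["context"]["custom"]["dimensions"]
    for index, record in enumerate(dims):
        # Each record in the sample holds a single name/value pair.
        for name, value in record.items():
            yield {"id": event_id, "ArrayIndex": index, "name": name, "value": value}


for row in cross_apply(event):
    print(row)
```

Every row repeats the event id, which is what lets you join or group the rows back together later.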

Answered 2015-07-22T21:56:45.920
1

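A very convenient approach, suggested by Alex Raizman, is to aggregate over the fields you want to flatten and group by the rest of the select list, assuming that:

  • you know the set of possible objects in dimensions, and
  • you have no duplicate objects in this array, and
  • there is something that uniquely identifies your initial row (say, id)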

    SELECT
      MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'CategoryAction') AS NVARCHAR(MAX))) AS CategoryAction,
      MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'SessionId') AS NVARCHAR(MAX))) AS SessionId,
      MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'WebsiteVersion') AS NVARCHAR(MAX))) AS WebsiteVersion,
      MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'PageSection') AS NVARCHAR(MAX))) AS PageSection,
      MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Category') AS NVARCHAR(MAX))) AS Category,
      MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Page') AS NVARCHAR(MAX))) AS Page
    INTO
      [outputstream]
    FROM [inputstream] MySource
    CROSS APPLY GetArrayElements(MySource.[context].[custom].[dimensions]) d
    -- MySource.id stands for whatever uniquely identifies the row
    -- (internal.data.id in the sample JSON above)
    GROUP BY System.Timestamp, MySource.id
    

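We also group by System.Timestamp because Stream Analytics expects the grouping to define a time window in order to perform set-based operations such as counts or aggregates.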
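
If the aggregation trick is not obvious, the following Python sketch models it: explode the array into rows, group by the row id, and take the minimum of the non-null values for each column. It is a sketch under stated assumptions rather than what Stream Analytics does internally; `flatten_by_group` and `COLUMNS` are hypothetical names, and the second id (`"other-id"`) is made-up test data.

```python
from collections import defaultdict

# Illustration only: the known set of dimension names (first assumption above).
COLUMNS = ["CategoryAction", "SessionId", "WebsiteVersion"]


def flatten_by_group(exploded_rows):
    """exploded_rows: (row_id, record) pairs, one pair per array element."""
    groups = defaultdict(list)
    for row_id, record in exploded_rows:
        groups[row_id].append(record)

    result = []
    for row_id, records in groups.items():
        flat = {"id": row_id}
        for col in COLUMNS:
            # Like MIN(...), skip missing values; with no duplicates there is
            # at most one value left, so MIN simply picks it.
            values = [r[col] for r in records if r.get(col) is not None]
            flat[col] = min(values) if values else None
        result.append(flat)
    return result


exploded = [
    ("aad2627b-60c5-48e8-aa35-197cae30a0cf", {"CategoryAction": "click"}),
    ("aad2627b-60c5-48e8-aa35-197cae30a0cf", {"SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"}),
    ("aad2627b-60c5-48e8-aa35-197cae30a0cf", {"WebsiteVersion": "3.0"}),
    ("other-id", {"CategoryAction": "hover"}),
]

for row in flatten_by_group(exploded):
    print(row)
```

The aggregate is only a way to collapse several rows into one; because each dimension appears once per event, any aggregate that ignores nulls would give the same result.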

Answered 2018-09-27T09:16:33.557
0

The question is old, but this is how to get the custom dimensions into a single row. It gets ugly as the number of custom dimensions grows.

SELECT
    A.internal.data.id,
    eventFlat.ArrayValue.name AS eventName,
    A.context.operation.name AS operation,
    A.context.data.eventTime,
    a1.company,
    a2.userId,
    a3.feature,
    A.context.device,
    A.context.location
FROM [YourInputAlias] A
OUTER APPLY GetArrayElements(A.event) eventFlat
LEFT JOIN (
    SELECT
        A1.internal.data.id AS id,
        customDimensionsFlat.ArrayValue.company
    FROM [YourInputAlias] A1
    OUTER APPLY GetArrayElements(A1.context.custom.dimensions) customDimensionsFlat
    WHERE customDimensionsFlat.ArrayValue.company IS NOT NULL
) a1 ON A.internal.data.id = a1.id AND DATEDIFF(day, A, a1) BETWEEN 0 AND 5
LEFT JOIN (
    SELECT
        A2.internal.data.id AS id,
        customDimensionsFlat.ArrayValue.userid
    FROM [YourInputAlias] A2
    OUTER APPLY GetArrayElements(A2.context.custom.dimensions) customDimensionsFlat
    WHERE customDimensionsFlat.ArrayValue.userid IS NOT NULL
) a2 ON A.internal.data.id = a2.id AND DATEDIFF(day, A, a2) BETWEEN 0 AND 5
LEFT JOIN (
    SELECT
        A3.internal.data.id AS id,
        customDimensionsFlat.ArrayValue.feature
    FROM [YourInputAlias] A3
    OUTER APPLY GetArrayElements(A3.context.custom.dimensions) customDimensionsFlat
    WHERE customDimensionsFlat.ArrayValue.feature IS NOT NULL
) a3 ON A.internal.data.id = a3.id AND DATEDIFF(day, A, a3) BETWEEN 0 AND 5
Answered 2020-05-20T03:26:33.597