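According to https://docs.microsoft.com/en-us/azure/data-factory/data-factory-load-sql-data-warehouse, with 1000 DWU and PolyBase I should get 200 MBps of throughput, but I am getting only 4.66 MBps. I have added the user to the xlargerc resource class to get the best throughput from Azure SQL Data Warehouse.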

Below is the pipeline JSON.

{
    "name": "UCBPipeline-Copy",
    "properties": {
        "description": "pipeline with copy activity",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "allowPolyBase": true,
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "cloudDataMovementUnits": 4
                },
                "inputs": [
                    {
                        "name": "USBBlob_Concept"
                    }
                ],
                "outputs": [
                    {
                        "name": "AzureDW_Concept"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "AzureBlobtoSQLDW_Concept",
                "description": "Copy Activity"
            }
        ],
        "start": "2017-02-28T18:00:00Z",
        "end": "2017-03-01T19:00:00Z",
        "isPaused": false,
        "hubName": "sampledf1_hub",
        "pipelineMode": "Scheduled"
    }
}

Input dataset:

{
    "name": "AzureBlob_Concept",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "AzureZRSStorageLinkedService",
        "typeProperties": {
            "fileName": "conceptTab.txt",
            "folderPath": "source/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": "\t"
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Output dataset:

{
    "name": "AzureDW_Concept",
    "properties": {
        "published": false,
        "type": "AzureSqlDWTable",
        "linkedServiceName": "AzureSqlDWLinkedService",
        "typeProperties": {
            "tableName": "concept"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

Is anything missing from the configuration?

1 Answer

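I looked at runId "e98ac557-a507-4a6e-8833-978eff1723c3", which should belong to your Copy Activity. According to our service logs, the source file is not large enough (270 MB in your case), so service call latency dominates the run and the measured throughput looks poor. Try loading larger files to get better throughput.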
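
To make the effect of file size concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model that is not taken from the service itself: every run pays a constant overhead and then transfers at the documented peak rate. The function names are hypothetical, and the only inputs are the figures quoted in this thread (270 MB, 4.66 MBps, 200 MBps).

```python
def effective_throughput(size_mb, peak_mbps, overhead_s):
    """Observed MB/s when a fixed overhead precedes a transfer at peak rate.

    Assumed model: elapsed = overhead_s + size_mb / peak_mbps.
    """
    return size_mb / (overhead_s + size_mb / peak_mbps)


def implied_overhead(size_mb, observed_mbps, peak_mbps):
    """Fixed overhead in seconds that would explain an observed rate
    under the same assumed model."""
    return size_mb / observed_mbps - size_mb / peak_mbps


SIZE_MB = 270.0        # file size mentioned in the answer
OBSERVED_MBPS = 4.66   # rate reported in the question
PEAK_MBPS = 200.0      # documented figure cited in the question

overhead = implied_overhead(SIZE_MB, OBSERVED_MBPS, PEAK_MBPS)
print(f"implied fixed overhead: {overhead:.1f} s")

# Hypothetical larger files, holding the overhead constant (an assumption).
for size in (270, 2_700, 27_000):
    rate = effective_throughput(size, PEAK_MBPS, overhead)
    print(f"{size:>6} MB -> {rate:6.1f} MB/s")
```

Under this assumption the 270 MB run spends close to a minute on overhead and barely more than a second moving data, which is why the measured rate improves as the file grows. Real overhead is unlikely to be perfectly constant, so treat the numbers as illustrative only.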

Answered 2017-03-08T08:17:30.117