I am using Azure Data Factory to copy data from a REST API to Azure Data Lake Store. Here is the JSON of my Copy activity:

{
    "name": "CopyDataFromGraphAPI",
    "type": "Copy",
    "policy": {
        "timeout": "7.00:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false
    },
    "typeProperties": {
        "source": {
            "type": "HttpSource",
            "httpRequestTimeout": "00:30:40"
        },
        "sink": {
            "type": "AzureDataLakeStoreSink"
        },
        "enableStaging": false,
        "cloudDataMovementUnits": 0,
        "translator": {
            "type": "TabularTranslator",
            "columnMappings": "id: id, name: name, email: email, administrator: administrator"
        }
    },
    "inputs": [
        {
            "referenceName": "MembersHttpFile",
            "type": "DatasetReference"
        }
    ],
    "outputs": [
        {
            "referenceName": "MembersDataLakeSink",
            "type": "DatasetReference"
        }
    ]
}

The REST API was created by me. For testing purposes I initially returned only 2,500 rows, and my pipeline worked fine: it copied the data from the REST API call to Azure Data Lake Store.

After testing, I updated the REST API so that it now returns 125,000 rows. I tested the API in a REST client and it works fine, but the Copy activity in Azure Data Factory now fails with the following error while copying the data to Azure Data Lake Store:

{
    "errorCode": "2200",
    "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedToReadHttpFile,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to read data from http source file.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (500) Internal Server Error.,Source=System,'",
    "failureType": "UserError",
    "target": "CopyDataFromGraphAPI"
}

The sink side is Azure Data Lake Store. Is there any limit on the size of the content I can copy from a REST call to Azure Data Lake Store?

I also re-tested the pipeline with the REST API returning 2,500 rows again, and it worked fine. As soon as I updated the API call to return 125,000 rows, the pipeline started giving the same error as above.

The source dataset of my Copy activity is:

{
    "name": "MembersHttpFile",
    "properties": {
        "linkedServiceName": {
            "referenceName": "WM_GBS_LinikedService",
            "type": "LinkedServiceReference"
        },
        "type": "HttpFile",
        "structure": [
            {
                "name": "id",
                "type": "String"
            },
            {
                "name": "name",
                "type": "String"
            },
            {
                "name": "email",
                "type": "String"
            },
            {
                "name": "administrator",
                "type": "Boolean"
            }
        ],
        "typeProperties": {
            "format": {
                "type": "JsonFormat",
                "filePattern": "arrayOfObjects",
                "jsonPathDefinition": {
                    "id": "$.['id']",
                    "name": "$.['name']",
                    "email": "$.['email']",
                    "administrator": "$.['administrator']"
                }
            },
            "relativeUrl": "api/workplace/members",
            "requestMethod": "Get"
        }
    }
}

The sink dataset is:

{
    "name": "MembersDataLakeSink",
    "properties": {
        "linkedServiceName": {
            "referenceName": "DataLakeLinkService",
            "type": "LinkedServiceReference"
        },
        "type": "AzureDataLakeStoreFile",
        "structure": [
            {
                "name": "id",
                "type": "String"
            },
            {
                "name": "name",
                "type": "String"
            },
            {
                "name": "email",
                "type": "String"
            },
            {
                "name": "administrator",
                "type": "Boolean"
            }
        ],
        "typeProperties": {
            "format": {
                "type": "JsonFormat",
                "filePattern": "arrayOfObjects",
                "jsonPathDefinition": {
                    "id": "$.['id']",
                    "name": "$.['name']",
                    "email": "$.['email']",
                    "administrator": "$.['administrator']"
                }
            },
            "fileName": "WorkplaceMembers.json",
            "folderPath": "rawSources"
        }
    }
}

1 Answer


As far as I know, there is no limit on file size. I have a 10 GB CSV with millions of rows and the Data Lake doesn't care.

I can see that even though the error says it happened on the "Sink" side, the error code is UserErrorFailedToReadHttpFile, so I think the problem may be solved if you increase the httpRequestTimeout on the source. It is currently "00:30:40", and the row transfer may be getting cut off because of it. Thirty minutes is plenty of time for 2,500 rows, but 125,000 may not fit in that window.
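
For illustration, raising the timeout in the source section of your Copy activity might look like the sketch below; the "02:00:00" value is just an assumption, so pick whatever comfortably covers how long your API takes to stream the full 125,000-row response:

    "source": {
        "type": "HttpSource",
        "httpRequestTimeout": "02:00:00"
    }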

Hope this helps!

answered 2018-04-04T18:47:46.817