To process a large amount of data, we want to use Azure Data Lake Gen2 storage together with Azure Batch. Here is what I have tried:
I created a Pool, a Job, and uploaded a file into a Data Lake file system (following the Microsoft Docs samples). The Batch task fails when it tries to download the resource file from the Data Lake file system. Here is the code:
using System;
using System.Collections.Generic;
using Azure.Storage;
using Azure.Storage.Files.DataLake;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;
using Microsoft.Azure.Batch.Common;

var poolId = Guid.NewGuid().ToString(); //using poolId for fileSystem, pool, and job
var sharedKeyCredential = new StorageSharedKeyCredential(storageAccountName, storageAccountKey);
string dfsUri = "https://" + storageAccountName + ".dfs.core.windows.net";
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClient(new Uri(dfsUri), sharedKeyCredential);
//Create File System
await dataLakeServiceClient.CreateFileSystemAsync(poolId);
//Create Directory
DataLakeFileSystemClient fileSystemClient = dataLakeServiceClient.GetFileSystemClient(poolId);
await fileSystemClient.CreateDirectoryAsync("my-directory");
//Upload File To FileSystem
DataLakeDirectoryClient directoryClient = fileSystemClient.GetDirectoryClient("my-directory");
DataLakeFileClient fileClient = directoryClient.GetFileClient(fileName);
await fileClient.UploadAsync(filePath);
//Pool, Job Created (keeping JobId = poolId), Now adding task to the Job
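// A minimal sketch of the elided pool/job creation (the VM image, node agent SKU,
// VM size, and node count below are placeholders, not the exact values we use):
// var imageReference = new ImageReference(
//     publisher: "MicrosoftWindowsServer", offer: "WindowsServer", sku: "2019-datacenter", version: "latest");
// var vmConfiguration = new VirtualMachineConfiguration(imageReference, nodeAgentSkuId: "batch.node.windows amd64");
// CloudPool pool = batchClient.PoolOperations.CreatePool(poolId, "STANDARD_D2_V3", vmConfiguration, targetDedicatedComputeNodes: 1);
// await pool.CommitAsync();
// CloudJob job = batchClient.JobOperations.CreateJob(poolId, new PoolInformation { PoolId = poolId });
// await job.CommitAsync();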
using (var batchClient = BatchClient.Open(new BatchTokenCredentials(batchAccountUrl, tokenProvider)))
{
var inputFile = ResourceFile.FromUrl(fileClient.Uri.AbsoluteUri, fileName);
var task = new CloudTask(TaskId, CommandLine)
{
UserIdentity = new UserIdentity(new AutoUserSpecification(elevationLevel: ElevationLevel.Admin, scope: AutoUserScope.Task)),
ResourceFiles = new List<ResourceFile> { inputFile }, //Add resource file
OutputFiles = CreateOutputFiles(batchStorageAccount, poolId) //any *.txt file
};
batchClient.JobOperations.AddTask(poolId, task);
}
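CreateOutputFiles is a small helper of mine; it is roughly along these lines (the parameter types, the container SAS URL, and the upload condition here are placeholders, not the exact code):

private static List<OutputFile> CreateOutputFiles(CloudStorageAccount batchStorageAccount, string containerName)
{
    // Placeholder: a writable container SAS URL built from batchStorageAccount/containerName
    string containerSasUrl = "<container-url-with-sas>";
    var destination = new OutputFileDestination(new OutputFileBlobContainerDestination(containerSasUrl));
    var uploadOptions = new OutputFileUploadOptions(OutputFileUploadCondition.TaskCompletion);
    // Upload any *.txt produced by the task
    return new List<OutputFile> { new OutputFile("*.txt", destination, uploadOptions) };
}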
After adding the task I get a ResourceContainerAccessDenied error, which means the Batch task is not authorized to access the file that was uploaded to storage.
When I use a regular storage container instead, the Batch service works as expected; for storage containers, authentication is handled with a SAS token. In this case, though, I cannot figure out how to use a SAS token, or how to otherwise authorize the Batch service against the storage account so the node can download the resource file.
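For comparison, the storage-container setup that works looks roughly like this (the container name and expiry are placeholders; it needs Azure.Storage.Blobs and Azure.Storage.Sas):

var containerClient = new BlobContainerClient(
    new Uri($"https://{storageAccountName}.blob.core.windows.net/{containerName}"),
    sharedKeyCredential);
// SAS generation works here because the client was constructed with a StorageSharedKeyCredential
Uri containerSasUri = containerClient.GenerateSasUri(
    BlobContainerSasPermissions.Read | BlobContainerSasPermissions.List,
    DateTimeOffset.UtcNow.AddHours(2));
var inputFile = ResourceFile.FromStorageContainerUrl(containerSasUri.ToString());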
Any alternative to the Data Lake Gen2 file system would also be helpful.
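If it helps: I assume something like the following is what would be needed to attach a SAS to the Data Lake file URL, but I have not been able to confirm this is the right way to authorize a Batch resource file against the dfs endpoint (permissions and expiry below are guesses; DataLakeSasPermissions comes from Azure.Storage.Sas):

// Sketch only – GenerateSasUri is available because fileClient was built with a shared key credential
Uri fileSasUri = fileClient.GenerateSasUri(DataLakeSasPermissions.Read, DateTimeOffset.UtcNow.AddHours(2));
var inputFile = ResourceFile.FromUrl(fileSasUri.AbsoluteUri, fileName);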