microsoft-cognitive - FormRecognizer C# SDK with blob files - Unsupported Media Type error

Question

I am trying to use the C# FormRecognizer SDK from Azure Cognitive Services. My PDFs are stored in Azure Blob storage, and I need to extract text and tables from those PDF files using the C# SDK.

I see that the "AnalyzeWithCustomModelAsync" method takes a "Stream" as its input parameter, but it only seems to accept a "FileStream". If I pass a "MemoryStream" instead, I get the following error:

{"value":{"error":{"code":"UnsupportedMediaType","message":"For HTML form data, the multipart request must contain a document with a media type of - 'application/pdf', 'image/jpeg' or 'image/png'."}},"formatters":[],"contentTypes":[],"statusCode":415}

Is there any way I can use my blob files directly, without first saving them locally?

Regards, Madhu

score 1 · Accepted Answer

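The following snippet works by getting a reference to the blob (as a CloudBlockBlob) and then loading it into a MemoryStream. Once you have that, you can pass it to Form Recognizer for analysis.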

List<string> blobsToAnalyze = new List<string>();

// Get latest Form Recognizer training model ID
Guid aiTrainModelId = Guid.Empty;
ModelResult latestModel = await FormRecognizer.GetModelAsync(config, log);

if (latestModel != null)
    aiTrainModelId = latestModel.ModelId;

// Iterate through all blobs
foreach (string strBlob in blobsToAnalyze)
{
    CloudBlockBlob blob = blobContainer.GetBlockBlobReference(strBlob);

    using (MemoryStream ms = new MemoryStream())
    {
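        // Load blob into a MemoryStream object
        await blob.DownloadToStreamAsync(ms);

        // The download leaves the stream positioned at its end; rewind so the
        // next reader starts from the first byte
        ms.Position = 0;

        // Send to Form Recognizer to analyze
        AnalyzeResult results = await FormRecognizer.AnalyzeFormAsync(config, aiTrainModelId, ms, log);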

        searchResults = FormRecognizer.AnalyzeResults(config, tableClient, results, log);
    }
}
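
One detail that is easy to miss with this approach: writing into a MemoryStream (which is what DownloadToStreamAsync does) leaves its Position at the end, so a consumer that reads from the current position sees no data unless the stream is rewound first. Whether the FormRecognizer.AnalyzeFormAsync helper in the snippet above rewinds the stream itself is not shown, so it may be worth checking. Below is a minimal sketch of the effect using only System.IO. The StreamRewindDemo class, the CountRemainingBytes method, and the placeholder bytes are hypothetical names made up for illustration; they are not part of the Form Recognizer or Storage SDKs.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamRewindDemo
{
    // Hypothetical stand-in for any consumer (such as an analyze call) that
    // reads from the stream's current position through to the end.
    public static int CountRemainingBytes(Stream input)
    {
        using (var sink = new MemoryStream())
        {
            input.CopyTo(sink);
            return (int)sink.Length;
        }
    }

    public static void Main()
    {
        // Placeholder content; a real blob download would supply the PDF bytes.
        byte[] fakePdf = Encoding.ASCII.GetBytes("%PDF-1.4 placeholder bytes");

        using (var ms = new MemoryStream())
        {
            // Writing advances Position to the end, just as a download
            // into the stream would.
            ms.Write(fakePdf, 0, fakePdf.Length);

            // Nothing is left to read from the current position.
            int beforeRewind = CountRemainingBytes(ms);

            // Rewind, then read again: the full content is now visible.
            ms.Position = 0;
            int afterRewind = CountRemainingBytes(ms);

            Console.WriteLine($"before rewind: {beforeRewind} bytes");
            Console.WriteLine($"after rewind: {afterRewind} of {fakePdf.Length} bytes");
        }
    }
}
```

The first count is zero and the second equals the length of the written buffer. In the blob scenario, this corresponds to setting ms.Position = 0 after the download and before the stream is handed to the analyze call.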
