
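I have a DocumentDB instance with about 4,000 documents. I recently configured Azure Search to index and search it, and at first this worked well. Yesterday I updated the documents and the index fields, and added a UDF to index a complex field. Now the indexer reports that DocumentDB is returning a RequestRateTooLargeException. The documentation for that error suggests throttling the calls, but it seems that Azure Search would have to do that itself. Is there a workaround?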

1 Answer

Azure Search uses the DocumentDB client SDK, which retries internally, waiting the appropriate interval, when it encounters a RequestRateTooLarge error. However, this only works if no other clients are using the same DocumentDB collection concurrently. Check whether the collection has other concurrent users; if it does, consider adding capacity to the collection.
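To illustrate why concurrent clients defeat the SDK's internal retries, here is a minimal sketch of a retry loop that honors a server-suggested delay (DocumentDB signals throttling with HTTP 429 and a retry-after interval). `RequestRateTooLarge`, `call_with_retries`, and the attempt limit of 5 are hypothetical illustrations, not the SDK's actual API or defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RequestRateTooLarge(Exception):
    """Hypothetical stand-in for the SDK's throttling error (HTTP 429)."""

    def __init__(self, retry_after_ms: int) -> None:
        super().__init__(f"throttled; retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,  # arbitrary illustrative limit
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op, sleeping for the server-suggested delay after each throttle."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except RequestRateTooLarge as err:
            if attempt == max_attempts:
                # Out of attempts: surface the throttling error to the caller.
                raise
            sleep(err.retry_after_ms / 1000.0)
    raise AssertionError("unreachable")
```

If other clients keep consuming the collection's throughput, every attempt may be throttled, the attempts run out, and the error surfaces to the caller; in this scenario that caller is the indexer, which is why adding capacity helps.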

This can also happen if some other problem with the data prevents the DocumentDB indexer from making forward progress: the indexer then retries the same data and may hit the same problem again, much like a poison message. If you observe that a specific document (or a small number of documents) causes indexing problems, you can choose to ignore them. Here is an excerpt from documentation we're about to publish:

Tolerating occasional indexing failures

By default, an Azure Search indexer stops indexing as soon as even a single document fails to be indexed. Depending on your scenario, you can choose to tolerate some failures (for example, if you repeatedly re-index your entire data source). Azure Search provides two indexer parameters to fine-tune this behavior:

  • maxFailedItems: The number of items that can fail indexing before an indexer execution is considered a failure. Default is 0.
  • maxFailedItemsPerBatch: The number of items that can fail indexing in a single batch before an indexer execution is considered a failure. Default is 0.
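One way to read how the two limits interact is sketched below: a run fails if any single batch exceeds `maxFailedItemsPerBatch`, or if the running total exceeds `maxFailedItems`. The function `execution_failed` is hypothetical, and the exact accounting the real indexer uses (for example, whether it stops mid-batch) is an assumption, not something the excerpt specifies.

```python
from typing import Sequence


def execution_failed(
    failures_per_batch: Sequence[int],
    max_failed_items: int = 0,
    max_failed_items_per_batch: int = 0,
) -> bool:
    """Hypothetical model: True if an indexer run would be considered failed."""
    total = 0
    for failed in failures_per_batch:
        # Too many failures within one batch fails the whole run.
        if failed > max_failed_items_per_batch:
            return True
        total += failed
        # Too many failures across all batches so far also fails the run.
        if total > max_failed_items:
            return True
    return False
```

With both defaults at 0, a single failed document in any batch fails the run, matching the default behavior described above.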

You can change these values at any time by specifying one or both of these parameters when creating or updating your indexer:

    PUT https://service.search.windows.net/indexers/myindexer?api-version=[api-version]
    Content-Type: application/json
    api-key: [admin key]

    {
        "dataSourceName" : "mydatasource",
        "targetIndexName" : "myindex",
        "parameters" : { "maxFailedItems" : 10, "maxFailedItemsPerBatch" : 5 }
    }
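For readers scripting this, the sketch below builds the same PUT request with Python's standard `urllib.request`, without sending it. The service name, api-version, and admin key are caller-supplied placeholders, and `build_update_indexer_request` is a hypothetical helper, not part of any Azure SDK.

```python
import json
import urllib.request


def build_update_indexer_request(
    service: str, api_version: str, admin_key: str
) -> urllib.request.Request:
    """Build (but do not send) the update-indexer PUT request."""
    url = (
        f"https://{service}.search.windows.net/indexers/myindexer"
        f"?api-version={api_version}"
    )
    body = {
        "dataSourceName": "mydatasource",
        "targetIndexName": "myindex",
        # Tolerate up to 10 failed items overall and 5 per batch.
        "parameters": {"maxFailedItems": 10, "maxFailedItemsPerBatch": 5},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen` would actually send it.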

Even if you choose to tolerate some failures, information about which documents failed is returned by the Get Indexer Status API.
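As a sketch of how you might pull the failed document keys out of a parsed Get Indexer Status response, assuming the response carries a `lastResult` object with an `errors` list whose entries have a `key` field (field names you should verify against the REST reference for your api-version), consider this hypothetical helper:

```python
from typing import Any, Dict, List


def failed_document_keys(status: Dict[str, Any]) -> List[str]:
    """Collect document keys from the last run's errors (assumed field names)."""
    last = status.get("lastResult") or {}
    # Skip error entries that are not tied to a specific document key.
    return [e["key"] for e in (last.get("errors") or []) if e.get("key")]
```

The returned keys point at the documents worth inspecting as possible poison messages.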

Answered 2015-03-23T03:21:14.187