.net - How do I wait for Azure Search to finish indexing a document? For integration testing purposes

Question

Scenario

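I'm building a suite of automated integration tests. Each test pushes data into an Azure Search index before querying it and verifying the expected results.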

Problem

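Indexing happens asynchronously in the service, and the data is not immediately available after the indexing call returns successfully.
The tests, of course, run too fast most of the time.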

What I've tried

I've tried querying for the document until it is found:

// Wait for the indexed document to become available
while ((await _index.Documents.SearchAsync("DocumentId")).Results.SingleOrDefault() == null) { }

But oddly enough, a subsequent search query usually finds nothing:

// Works 10% of the time, even after the above loop
await _index.Documents.SearchAsync(query.Text);

An arbitrary pause works, but it isn't guaranteed to, and I'd like the tests to run as fast as possible.

Thread.Sleep(3000);

The Azure Search documentation says:

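Finally, the code in the above example delays for two seconds. Indexing happens asynchronously in your Azure Search service, so the sample application needs to wait a short time to ensure that the documents are available for searching. Delays like this are typically only necessary in demos, tests, and sample applications.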

Is there no solution that doesn't hurt test performance?

score 3 · Accepted Answer

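If your service has multiple search units, there is no way to determine when a document has been fully indexed. This is a deliberate design decision that favors indexing/query performance over strong consistency guarantees.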

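If you're running the tests against a single-unit search service, your approach (repeatedly checking for the document's existence with a query rather than a lookup) should work.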

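Note that this will not work on a free tier search service, because it is hosted on multiple shared resources and does not count as a single unit. You'll see the same brief inconsistency that you would with a dedicated multi-unit service.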

Otherwise, one possible improvement is to use retries together with a shorter sleep time.
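
The retry-with-short-sleep idea could be sketched roughly as below. This is only an illustration, not code from the Azure Search SDK: the `Poll` class and its `UntilAsync` method are hypothetical names, and the condition is passed in as a delegate so the helper makes no assumptions about how the search query is issued.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper (not part of any SDK): polls a condition at a short,
// fixed interval instead of sleeping once for a long, arbitrary time.
public static class Poll
{
    // Re-evaluates `condition` every `interval` until it returns true or
    // `timeout` has elapsed. Returns true if the condition was met and
    // false if the timeout was reached first.
    public static async Task<bool> UntilAsync(
        Func<Task<bool>> condition, TimeSpan timeout, TimeSpan interval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (await condition()) return true;
            if (stopwatch.Elapsed >= timeout) return false;
            await Task.Delay(interval);
        }
    }
}
```

In a test, the condition would wrap the same query that the assertions rely on, and the test would fail when the helper returns false. As noted above, this can only be expected to be reliable on a single-unit service.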

score 1 · Accepted Answer

The other answer by @HeatherNakama was very helpful. I'd like to add to it, but first a paraphrased summary:

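There is no way to reliably know that a document is ready to be searched on all replicas, so a spin-lock that waits until the document is found can only work against a single-replica search service. (Note: the free tier search service is not single-replica, and there is nothing you can do about that.)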

With that in mind, I've created a sample repository with Azure Search integration tests, which looks roughly like this:

private readonly ISearchIndexClient _searchIndexClient;

private void WaitForIndexing(string id)
{
    // For the free tier, or a service with multiple replicas, resort to this:
    // Thread.Sleep(2000);

    var wait = 25;

    while (wait <= 2000)
    {
        Thread.Sleep(wait);
        var result = fixture.SearchService.FilterForId(id);
        if (result.Result.Results.Count == 1) return;
        if (result.Result.Results.Count > 1) throw new Exception("Unexpected results");
        wait *= 2;
    }

    throw new Exception("Found nothing after waiting a while");
}

public async Task<DocumentSearchResult<PersonDto>> FilterForId(string id)
{
    if (string.IsNullOrWhiteSpace(id) || !Guid.TryParse(id, out var _))
    {
        throw new ArgumentException("Can only filter for guid-like strings", nameof(id));
    }

    var parameters = new SearchParameters
    {
        Top = 2, // We expect only one, but return max 2 so we can double check for errors
        Skip = 0,
        Facets = new string[] { },
        HighlightFields = new string[] { },
        Filter = $"id eq '{id}'",
        OrderBy = new[] { "search.score() desc", "registeredAtUtc desc" },
    };

    var result = await _searchIndexClient.Documents.SearchAsync<PersonDto>("*", parameters);

    if (result.Results.Count > 1)
    {
        throw new Exception($"Search filtering for id '{id}' unexpectedly returned more than 1 result. Are you sure you searched for an ID, and that it is unique?");
    }

    return result;
}

This could be used like this:

[SerializePropertyNamesAsCamelCase]
public class PersonDto
{
    [Key] [IsFilterable] [IsSearchable]
    public string Id { get; set; } = Guid.NewGuid().ToString();

    [IsSortable] [IsSearchable]
    public string Email { get; set; }

    [IsSortable]
    public DateTimeOffset? RegisteredAtUtc { get; set; }
}

[Theory]
[InlineData(0)]
[InlineData(1)]
[InlineData(2)]
[InlineData(3)]
[InlineData(5)]
[InlineData(10)]
public async Task Can_index_and_then_find_person_many_times_in_a_row(int count)
{
    await fixture.SearchService.RecreateIndex();

    for (int i = 0; i < count; i++)
    {
        var guid = Guid.NewGuid().ToString().Replace("-", "");
        var dto = new PersonDto { Email = $"{guid}@example.org" };
        await fixture.SearchService.IndexAsync(dto);

        WaitForIndexing(dto.Id);

        var searchResult = await fixture.SearchService.Search(dto.Id);

        Assert.Single(searchResult.Results, p => p.Document.Id == dto.Id);
    }
}

I've tested and confirmed that this reliably stays green on a Basic tier search service with 1 replica, and intermittently goes red on the free tier.
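
The `WaitForIndexing` method in this answer blocks on `.Result` inside an otherwise async test. A non-blocking variant with the same doubling delays might look like the sketch below. The `IndexingWait` class, its method name, and the `countMatches` delegate are hypothetical and introduced only for illustration; the delegate is assumed to return the number of documents currently matching the ID filter.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical async counterpart of WaitForIndexing (illustration only).
public static class IndexingWait
{
    // Waits 25 ms, 50 ms, 100 ms, ... (doubling up to maxDelayMs) between
    // checks, and returns as soon as exactly one matching document is seen.
    public static async Task WaitForSingleMatchAsync(
        Func<Task<int>> countMatches, int initialDelayMs = 25, int maxDelayMs = 2000)
    {
        if (initialDelayMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(initialDelayMs));

        for (var delay = initialDelayMs; delay <= maxDelayMs; delay *= 2)
        {
            await Task.Delay(delay);
            var count = await countMatches();
            if (count == 1) return;
            if (count > 1) throw new InvalidOperationException("Unexpected results");
        }

        throw new TimeoutException("Found nothing after waiting a while");
    }
}
```

Assuming the `FilterForId` method shown earlier, a test could call it as `await IndexingWait.WaitForSingleMatchAsync(async () => (await fixture.SearchService.FilterForId(dto.Id)).Results.Count);`. The same caveat applies: on the free tier or with multiple replicas, a successful check does not guarantee that later queries will see the document.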

score 0 · Accepted Answer

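If only the tests need to wait, use FluentWaitDriver or a similar component to wait inside the tests. I wouldn't pollute the application with thread delays. The Azure indexer will have some acceptable latency on the order of milliseconds, depending on the nature of your search instance.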
