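I am working on a Bing News API v7 integration. More precisely, I use the https://api.cognitive.microsoft.com/bing/v7.0/news/search API endpoint.

I have found some "unexpected" paging behavior. (The expected behavior is that every page has a constant size.)

This page explains how to page through the results.

I follow that approach. I use 30 as the page size, so the offset values are 0, 30, 60, and so on.

For example, with these parameters: query "Java 14", market "en-US", sorted by date, and offset values 0, 30, 60, 90, 120, 150 (/bing/v7.0/news/search?q=Java 14&count=30&offset=0&mkt=en-US&sortBy=date).

I get six pages of results, each containing fewer than 30 URLs.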

Page: 0 Total: 27 results
Page: 1 Total: 26 results
Page: 2 Total: 26 results
Page: 3 Total: 29 results
Page: 4 Total: 29 results
Page: 5 Total: 7 results
...

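This Stack Overflow question, "What's the expected behavior of the Bing Search API v5 when deeply paginating?", relates to Bing API v5. There, the paging values do not follow a fixed-size sequence; instead, the formula is previous result size + 1.

So, my question is: which values should I use as the offset for the second page (Page: 1)? Is it 28 or 30? For the third page (Page: 2), is the value 54 or 60?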

1 Answer

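Make a first pass against the API to determine totalEstimatedMatches. Divide totalEstimatedMatches by 25 (or whatever page size you use) to get the number of API calls to make. For example, if totalEstimatedMatches = 100, make 4 API calls, each of which should return 25 URLs. I play it safe and reduce the count by 1, but you could wrap the call in a try/catch instead. s.Count in this example will be 25. The solution is in VB.Net, but you get the idea.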

        'the secret key 
        Dim accessKey As String = "xxxxxxxxxxxxxxxxxxxxxxxxx"
        Dim endpoint As String = "https://api.cognitive.microsoft.com/bing/v7.0/news/search?"

        Dim queryString = HttpUtility.ParseQueryString(String.Empty)
        queryString("q") = search_criteria 'Uri.EscapeDataString(search_criteria)
        queryString("mkt") = market
        queryString("count") = "25"
        queryString("offset") = "0"
        queryString("freshness") = freshness
        queryString("SafeSearch") = "strict"

        ' Construct the URI of the search request
        uriQuery = endpoint & queryString.ToString

        ' Perform the Web request and get the response
        request = HttpWebRequest.Create(uriQuery)
        request.Headers.Add("Ocp-Apim-Subscription-Key", accessKey)

        response = CType(request.GetResponseAsync.Result, HttpWebResponse)
        json = (New StreamReader(response.GetResponseStream)).ReadToEnd

        'create json object
        Dim converter = New ExpandoObjectConverter()
        Dim message As Object = JsonConvert.DeserializeObject(Of ExpandoObject)(json, converter)

        'get top level object and its sub objects
        s = message.value

        Try
            totalEstimatedMatches = CInt(message.totalEstimatedMatches)
            total_available_for_processing = s.Count
        Catch ex As Exception
        End Try

        'get the number of full pages available at 25 records per page, so we page through 25 records per api call
        'use integer division (\): the / operator returns a Double that is then rounded, not truncated
        Dim page_count As Integer = totalEstimatedMatches \ 25

        'loop through each page and call the api once per page
        For p As Integer = 0 To page_count - 1

            'determine offset (p * 25 is 0 for the first page, so no special case is needed)
            queryString("count") = "25"
            queryString("offset") = (p * 25).ToString()

            ' Construct the URI of the search request
            uriQuery = endpoint & queryString.ToString

            ' Perform the Web request and get the response
            request = HttpWebRequest.Create(uriQuery)
            request.Headers.Add("Ocp-Apim-Subscription-Key", accessKey)

            response = CType(request.GetResponseAsync.Result, HttpWebResponse)
            json = (New StreamReader(response.GetResponseStream)).ReadToEnd

            'create json object
            message = JsonConvert.DeserializeObject(Of ExpandoObject)(json, converter)

            'get top level object and its sub objects
            s = message.value

            For i As Integer = 0 To s.Count - 1

                Dim myuri As Uri = New Uri(s(i).url.ToString)
                Dim vendor_domain As String = myuri.Host

                System.Diagnostics.Debug.WriteLine(icount & "," & myuri.ToString & "," & vendor_domain)
                icount = icount + 1
            Next
            System.Threading.Thread.Sleep(100)

        Next
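
As a rough sketch of the paging arithmetic described above, separated from the HTTP calls: the offsets can be computed on their own from the estimate and the page size. PagingSketch and PageOffsets are hypothetical names used only for illustration, not part of the code above or of the Bing API. Since totalEstimatedMatches is only an estimate, individual pages may still come back with fewer items than requested.

```vbnet
Imports System
Imports System.Collections.Generic

Module PagingSketch
    ' Hypothetical helper: lists the offsets to request for a given estimate.
    Function PageOffsets(totalEstimatedMatches As Integer, pageSize As Integer) As List(Of Integer)
        Dim offsets As New List(Of Integer)
        ' \ is integer division, so a partial last page is not counted
        Dim pageCount As Integer = totalEstimatedMatches \ pageSize
        For p As Integer = 0 To pageCount - 1
            offsets.Add(p * pageSize)
        Next
        Return offsets
    End Function

    Sub Main()
        ' 100 estimated matches at 25 per page gives 4 calls
        Console.WriteLine(String.Join(", ", PageOffsets(100, 25))) ' prints 0, 25, 50, 75
    End Sub
End Module
```

If the remainder should also be fetched, one option is to add one more page whenever totalEstimatedMatches Mod pageSize is greater than 0 and to stop early when a response contains no items.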
answered 2020-11-06T17:56:24.483