I deployed my website on AWS, everything over https. With Postman it takes 66 ms to download the page, but the Facebook debugger shows the following error:

Curl Timeout The request to scrape the URL timed out.
cURL Error: 28 (OPERATION_TIMEOUTED)

Using curl as described here: https://developers.facebook.com/docs/sharing/webmasters/crawler/ the result is:

curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://ubiqq.com/IngenierosCHILE/dia-de-la-ingenieria-2020"



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 99.84.191.88...
* Connected to ubiqq.com (99.84.191.88) port 443 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 594 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: ubiqq.com (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=ubiqq.com
*        start date: Fri, 01 May 2020 00:00:00 GMT
*        expire date: Tue, 01 Jun 2021 12:00:00 GMT
*        issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
*        compression: NULL
* ALPN, server accepted to use http/1.1
> GET /IngenierosCHILE/Educacion-en-ingenieria-en-tiempos-de-pandemia HTTP/1.1
> Host: ubiqq.com
> User-Agent: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
> Accept: */*
> Accept-Encoding: deflate, gzip
> Range: bytes=0-524288
> Connection: close
>
< HTTP/1.1 200 OK
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: close
< Date: Sat, 23 May 2020 18:35:29 GMT
< x-amzn-RequestId: 8d6763e8-755d-4c25-8a03-dae704aebac1
< Access-Control-Allow-Origin: *
< x-amz-apigw-id: M_3yuF5pIAMF2HA=
< Cache-Control: max-age = 86300
< X-Amzn-Trace-Id: Root=1-5ec96cde-2d079cbf298be05bf81fc01e;Sampled=0
< Access-Control-Allow-Credentials: false
< Via: 1.1 2ad0cde89ab58d454177893ae4447f50.cloudfront.net (CloudFront), 1.1 9742923607374c982a5b7e9258144eab.cloudfront.net (CloudFront)
< X-Amz-Cf-Pop: IAD89-C1
< Content-Encoding: gzip
< Vary: Accept-Encoding
< X-Cache: Hit from cloudfront
< X-Amz-Cf-Pop: IAD89-C2
< X-Amz-Cf-Id: QKbU0J_IgXlTdcGG4lMV7KftU2Y3TsdC1UQi7azGXMhiaAzDp_WfLA==
< Age: 52
<
{ [16360 bytes data]
100  223k    0  223k    0     0  2336k      0 --:--:-- --:--:-- --:--:-- 2351k
* Closing connection 0

I don't know how to fix it :/

EDIT:

I found the error, and it was tricky.

I implemented server-side rendering at the CloudFront origin, and the render takes more than 10 seconds. I created a script that makes a first call to each page: that first call takes over 10 seconds and the response gets stored in CloudFront, so the following calls are served from the edge in under 100 ms (a sketch of such a warming call is below).
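
A minimal sketch of that warming call, assuming the page URL from the curl command above. The long --max-time gives the slow first render at the origin time to finish; sending the crawler's User-Agent only matters if the origin or cache key varies on it:

# Prime the CloudFront cache: the first request waits out the slow
# SSR at the origin (>10 s), then the response is cached at the edge.
curl -s -o /dev/null --max-time 60 \
  -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" \
  "https://ubiqq.com/IngenierosCHILE/dia-de-la-ingenieria-2020"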

The problem is that Facebook's crawler reaches CloudFront through a different edge, one that doesn't have the page in its cache, so instead of serving it from the edge my first call warmed, it goes to the origin. Since the origin takes more than 10 seconds and the crawler only waits up to 10 seconds, the crawler aborts.

To fix this I either have to get the SSR under 10 seconds, or warm every edge, trying to hit the one Facebook's crawler uses (see the sketch below).
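
As a rough sketch of the second option: curl's --resolve flag pins the connection to a specific edge IP while still sending the correct Host header and SNI for ubiqq.com, so with a list of CloudFront POP addresses each edge can be warmed in turn. The IPs below are placeholders; real edge addresses would have to be collected separately, e.g. by resolving ubiqq.com through resolvers in different regions:

# Warm several CloudFront edges by connecting to each POP's IP directly.
# NOTE: the IPs here are hypothetical placeholders; actual edge addresses
# vary by region and change over time.
for ip in 99.84.191.88 203.0.113.10; do
  curl -s -o /dev/null --max-time 60 \
    --resolve ubiqq.com:443:$ip \
    "https://ubiqq.com/IngenierosCHILE/dia-de-la-ingenieria-2020"
done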
