I got this working. The documentation is not very detailed, so here are the details.
Here are my Open Graph locale tags:
<meta property="og:locale" content="en_US" />
<meta property="og:locale:alternate" content="en_US" />
<meta property="og:locale:alternate" content="fr_CA" />
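Very important: the documentation makes it seem that og:locale should always reflect the page's "default" locale. That is not the case; doing so prevents the scraper from retrieving the other languages. og:locale must reflect the current locale of the page. In other words, if the scraper (or a user) requests the fr_CA content, make sure og:locale is set to fr_CA in the response.
Specify every possible locale with og:locale:alternate. That way, whether the scraper requests en_US or fr_CA, it still knows that both exist.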
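
The rule above can be sketched on the server side as follows. This is only an illustration, not the code behind the page in this answer: the function names, `SUPPORTED_LOCALES`, and `DEFAULT_LOCALE` are hypothetical, and the request is modelled as a plain URL string plus a dict of headers. The `fb_locale` parameter and `X-Facebook-Locale` header are the ones the scraper is observed sending later in this answer.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Hypothetical configuration: the locales this page can be served in.
SUPPORTED_LOCALES = ["en_US", "fr_CA"]
DEFAULT_LOCALE = "en_US"


def resolve_locale(url, headers):
    """Pick the locale for one request: fb_locale parameter, then header, then default.

    Assumes `headers` is a plain dict keyed exactly as "X-Facebook-Locale".
    """
    query = parse_qs(urlsplit(url).query)
    candidates = query.get("fb_locale", []) + [headers.get("X-Facebook-Locale", "")]
    by_lowercase = {loc.lower(): loc for loc in SUPPORTED_LOCALES}
    for candidate in candidates:
        match = by_lowercase.get(candidate.lower())
        if match:
            return match
    return DEFAULT_LOCALE


def render_locale_tags(current):
    """og:locale is the locale being served; every supported locale is listed as an alternate."""
    lines = ['<meta property="og:locale" content="%s" />' % escape(current)]
    for loc in SUPPORTED_LOCALES:
        lines.append('<meta property="og:locale:alternate" content="%s" />' % escape(loc))
    return "\n".join(lines)


if __name__ == "__main__":
    locale = resolve_locale("/everydaybarilla/?fb_locale=fr_CA", {"X-Facebook-Locale": "fr_CA"})
    print(render_locale_tags(locale))
```

The alternates list deliberately includes the current locale as well, matching the tags shown at the top of this answer.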
This is how I asked the Facebook scraper to re-process my page:
curl -d "id=https://apps.facebook.com/everydaybarilla/&scrape=true" https://graph.facebook.com
This is the response:
{
"url": "http://apps.facebook.com/everydaybarilla/",
"type": "website",
"title": "Barilla\u2019s Every Day, Every Way Contest",
"locale": {
"locale": "en_us",
"alternate": [
"fr_ca"
]
},
"image": [
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/5.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/4.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/3.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/en-2.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/en-1.png"
}
],
"description": "Barilla Canada is whisking one lucky winner and a guest off to Italy on an 8-day Italian culinary adventure for 2 in the Barilla Every Day, Every Way Contest!",
"site_name": "Barilla\u2019s Every Day, Every Way Contest",
"updated_time": "2012-04-16T17:59:38+0000",
"id": "10150594698421968",
"application": {
"id": "317271281656427",
"name": "Barilla\u2019s Every Day, Every Way Contest",
"url": "http://www.facebook.com/apps/application.php?id=317271281656427"
}
}
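The scraper correctly returns the data for the default locale, but according to the documentation it should also scrape the alternate locales; that is not the case. It is clear from the response above that it sees the alternate locales, but it does not process them.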
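
To make that gap visible in a script, a small helper can list the alternates that a response mentions but was not scraped in. This is a sketch that assumes the response shape shown above (`locale.locale` and `locale.alternate`); the function name is hypothetical.

```python
import json


def locales_not_scraped(response):
    """Return the alternates listed in a scrape response, minus the locale it was scraped in."""
    locale_info = response.get("locale", {})
    scraped = locale_info.get("locale", "")
    return [alt for alt in locale_info.get("alternate", []) if alt != scraped]


if __name__ == "__main__":
    # Trimmed copy of the response above; only the "locale" object matters here.
    raw = '{"type": "website", "locale": {"locale": "en_us", "alternate": ["fr_ca"]}}'
    print(locales_not_scraped(json.loads(raw)))
```
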
So, this is how I specifically asked the Facebook scraper to process my page in French:
curl -d "id=https://apps.facebook.com/everydaybarilla/&scrape=true&locale=fr_CA" https://graph.facebook.com
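This time, I correctly see two requests from the scraper to my server. On the second request, both the X-Facebook-Locale header and the fb_locale URL parameter are correctly set to fr_CA. And the POST correctly returns the French response: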
{
"url": "http://apps.facebook.com/everydaybarilla/?fb_locale=fr_CA",
"type": "website",
"title": "Concours Tous les jours, de toutes les fa\u00e7ons de Barilla",
"locale": {
"locale": "fr_ca",
"alternate": [
"en_us",
"fr_ca"
]
},
"image": [
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/5.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/4.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/3.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/fr-2.png"
},
{
"url": "http://everydaybarilla.ssl.spidermarketing.ca/assets/img/thumbnails/fr-1.png"
}
],
"description": "Un heureux gagnant et son invit\u00e9(e) partiront \u00e0 destination de l\u2019Italie pour une aventure culinaire de 8 jours pour 2 personnes (valeur au d\u00e9tail approximative de 15 000 $)!",
"site_name": "Barilla\u2019s Every Day, Every Way Contest",
"updated_time": "2012-04-16T18:11:27+0000",
"id": "10150594698421968",
"application": {
"id": "317271281656427",
"name": "Barilla\u2019s Every Day, Every Way Contest",
"url": "http://www.facebook.com/apps/application.php?id=317271281656427"
}
}
Success!
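
Since the scraper only processed the locale it was explicitly asked for, each locale needs its own request. The following sketch builds the same POST bodies as the two curl calls above, one per locale. The function names are hypothetical, and the sketch only constructs the requests; it does not send them or check whether the endpoint still accepts them.

```python
from urllib.parse import urlencode
from urllib.request import Request

GRAPH_ENDPOINT = "https://graph.facebook.com"


def build_scrape_request(page_url, locale=None):
    """Build (but do not send) a re-scrape POST, optionally for one specific locale."""
    fields = {"id": page_url, "scrape": "true"}
    if locale:
        fields["locale"] = locale
    # urlencode percent-encodes the page URL; curl -d sends it as typed.
    return Request(GRAPH_ENDPOINT, data=urlencode(fields).encode("ascii"), method="POST")


def build_requests_for(page_url, locales):
    """One request per locale, in the order given."""
    return [build_scrape_request(page_url, loc) for loc in locales]


if __name__ == "__main__":
    page = "https://apps.facebook.com/everydaybarilla/"
    for req in build_requests_for(page, ["en_US", "fr_CA"]):
        # Sending is left out on purpose; urllib.request.urlopen(req) would perform the POST.
        print(req.get_method(), req.full_url, req.data.decode("ascii"))
```
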
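Of course, after all this effort, when I visit the French version of Facebook.com and post this URL, the status box is populated with... the English data. It seems Facebook's own interface is not configured to request the correct locale.
So even after all this effort, nothing seems to have been accomplished (translating my strings through the Facebook Translations app did not work either, so I suppose I should not be surprised).
Still, it does answer the question. Perhaps someone else can determine why the Facebook.com interface does not seem to request the correct locale.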