
When you share an article in a Facebook status, Facebook generates a title and an abstract and attaches an image to the shared article.

For instance, putting www.stackoverflow.com in your status will generate:

Stack Overflow https://stackoverflow.com/ This is a collaboratively edited question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

(which, btw, is not in the source code of the stackoverflow.com page)

But when trying something like an article on a news website, we get results extracted from the source code of the page (check any article on www.goal.com, for example).

Any idea about the algorithm Facebook uses for that?


1 Answer


The metadata Facebook uses to display a link is always extracted from the HTML source code.

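As @amit said, the description is present in the source code, and the title is taken from the title tag. If you check that URL in the debugger, you can see that Facebook is complaining. If you click the last link on the page (see exactly what our scraper sees for your URL), you can see the response the FB crawler got.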
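To make the title and description lookup concrete, here is a minimal sketch of pulling both out of a page's HTML with Python's standard library. It is not Facebook's actual algorithm, which isn't public in this thread. The names `LinkPreviewParser` and `extract_preview` are made up for illustration, and it assumes the description lives in a `<meta name="description">` tag.

```python
from html.parser import HTMLParser


class LinkPreviewParser(HTMLParser):
    """Collect the <title> text and the <meta name="description"> content.

    Hypothetical helper for illustration; not Facebook's real scraper.
    """

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr = dict(attrs)
            # Assumption: the description is in <meta name="description">.
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            if self.title is None:
                # Keep only the first <title> found in the document.
                self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_preview(html):
    """Return the title and description found in an HTML string."""
    parser = LinkPreviewParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "description": parser.description}
```

Calling `extract_preview(page_source)` on the HTML the crawler received gives a dict with `title` and `description`; either value is `None` when the page doesn't have the corresponding tag.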

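This source can sometimes differ from what you get in the browser (though not in this case), because some sites check the user-agent string and return a different response if it is the FB scraper (facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)).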
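One rough way to check whether a site does this is to request the same URL twice, once with the scraper's user-agent string and once with a browser-like one, and compare the bodies. The sketch below uses only `urllib` from the standard library. The browser user-agent value and the helper names (`build_request`, `fetch`, `responses_differ`) are assumptions made up for illustration.

```python
import urllib.request

# User-agent string of the FB scraper, as quoted in the answer.
FB_SCRAPER_UA = (
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
)
# Hypothetical browser-like user agent, used only for comparison.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent):
    """Build a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Return the response body as bytes (this performs a real network call)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def responses_differ(url):
    """True if the site served different bytes to the two user agents."""
    return fetch(url, FB_SCRAPER_UA) != fetch(url, BROWSER_UA)
```

A difference is only a hint: dynamic pages can change between two requests for unrelated reasons (timestamps, tokens, ads), so it's worth diffing the two bodies by hand before concluding the site special-cases the scraper.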

Answered 2012-06-04T20:54:35.487