2

当您发布到 Facebook 的链接时,它会抓取文章标题、描述和相关图像。大多数主要网站都有所需的 OG 标签,因此很容易获取此信息,但 FB 也能够处理没有它们的网站(您可以在这里尝试)。

显然,他们已经建立了一个系统,可以在没有 OG 标签的情况下获取这些信息。有谁知道有开源版本吗?

我认为它需要(按每个部分的优先顺序):

标题:

  1. 检查 og:title 标签。
  2. 检查常规元“标题”标签。
  3. 检查 h1 标签。

描述:

  1. 检查 og:description 标签。
  2. 检查常规元“描述标签”
  3. 检查具有足够内容的 div 或 p 标签以指示正文段落

图片:

  1. 检查 og:image 标签
  2. 检查超过一定尺寸(比如 100x100)的图像,并优先考虑那些首先出现的图像。

非常感谢!

4

1 回答 1

0

https://github.com/Anonyfox/node-htmlcarve

Node.js的htmlcarve模块完成了您所追求的大部分工作,以下是此页面生成的输出:

htmlcarve = require('htmlcarve');

htmlcarve.fromUrl('https://scotch.io/tutorials/using-mongoosejs-in-node-js-and-mongodb-applications', function(error, data) {
  console.log(JSON.stringify(data, null, 2));
});

这会产生:

{
  "source": {
    "html_meta": {
      "title": "Easily Develop Node.js and MongoDB Apps with Mongoose ⥠Scotch",
      "summary": "",
      "image": "/wp-content/themes/thirty/img/scotch-logo.png",
      "language": "en-US",
      "feed": "https://scotch.io/feed",
      "favicon": "https://scotch.io/wp-content/themes/thirty/img/icons/favicon-57.png",
      "author": "Chris Sevilleja"
    },
    "open_graph": {
      "title": "Easily Develop Node.js and MongoDB Apps with Mongoose",
      "summary": "",
      "image": "https://scotch.io/wp-content/uploads/2014/11/mongoosejs-node-mongodb-applications.png"
    },
    "twitter_card": {
      "title": "Easily Develop Node.js and MongoDB Apps with Mongoose",
      "summary": "",
      "author": "sevilayha"
    }
  },
  "result": {
    "title": "Easily Develop Node.js and MongoDB Apps with Mongoose",
    "summary": "",
    "image": "https://scotch.io/wp-content/uploads/2014/11/mongoosejs-node-mongodb-applications.png",
    "author": "sevilayha",
    "language": "en-US",
    "feed": "https://scotch.io/feed",
    "favicon": "https://scotch.io/wp-content/themes/thirty/img/icons/favicon-57.png"
  },
  "links": {
    "deep": "https://scotch.io/tutorials/using-mongoosejs-in-node-js-and-mongodb-applications",
    "shallow": "https://scotch.io/tutorials/using-mongoosejs-in-node-js-and-mongodb-applications",
    "base": "https://scotch.io"
  }
}

如果您已安装 Node.js,请使用

npm i -g htmlcarve

您可以直接从命令行运行它。

于 2015-02-25T12:43:14.970 回答