The HTML structure of my page is shown below. I have added all the og meta tags, but Facebook still cannot scrape any information from my site.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"  xmlns:fb="http://www.facebook.com/2008/fbml">
    <head>
            <meta http-equiv="Content-Type" content="text/html;" charset=utf-8"></meta>
            <title>My Site</title>
            <meta content="This is my title" property="og:title">
            <meta content="This is my description" property="og:description">
            <meta content="http://ia.media-imdb.com/images/rock.jpg" property="og:image">
            <meta content="<MYPAGEID>" property="fb:page_id">
            .......
    </head>
    <body>
    .....

When I enter the URL in the Facebook debugger (https://developers.facebook.com/tools/debug), I get the following message:

Scrape Information
Response Code   404

Critical Errors That Must Be Fixed
Bad Response Code   URL returned a bad HTTP response code.


Errors that must be fixed

Missing Required Property   The 'og:url' property is required, but not present.
Missing Required Property   The 'og:type' property is required, but not present.
Missing Required Property   The 'og:title' property is required, but not present.


Open Graph Warnings That Should Be Fixed
Inferred Property   The 'og:url' property should be explicitly provided, even if a value can be inferred from other tags.
Inferred Property   The 'og:title' property should be explicitly provided, even if a value can be inferred from other tags.

Why isn't Facebook reading the meta tag information? The page is publicly accessible and is not hidden behind a login or anything similar.

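Update

OK, I did some debugging, and this is what I found. I have htaccess rules set up in my directory: I am using the PHP CodeIgniter framework and have htaccess rules that remove index.php from the URL.

So when I give the Facebook debugger (https://developers.facebook.com/tools/debug) the URL without index.php, Facebook shows a 404, but when I give it the URL with index.php, it is able to parse my page.

Now, how can I get Facebook to scrape the content when the URL does not contain index.php?

Here are my htaccess rules: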

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    #Removes access to the system folder by users.
    #Additionally this will allow you to create a System.php controller,
    #previously this would not have been possible.
    #'system' can be replaced if you have renamed your system folder.
    RewriteCond %{REQUEST_URI} ^system.*
    RewriteRule ^(.*)$ /index.php?/$1 [L]

    #When your application folder isn't in the system folder
    #This snippet prevents user access to the application folder
    #Submitted by: Fabdrol
    #Rename 'application' to your applications folder name.
    RewriteCond %{REQUEST_URI} ^application.*
    RewriteRule ^(.*)$ /index.php?/$1 [L]

    #Checks to see if the user is attempting to access a valid file,
    #such as an image or css document, if this isn't true it sends the
    #request to index.php
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?/$1 [L]
</IfModule>

<IfModule !mod_rewrite.c>
    # If we don't have mod_rewrite installed, all 404's
    # can be sent to index.php, and everything works as normal.
    # Submitted by: ElliotHaughin

    ErrorDocument 404 /index.php
</IfModule>

2 Answers

Facebook's documentation has details on the Open Graph protocol and on how to include the correct meta tags so that Facebook can scrape your URL accurately.

https://developers.facebook.com/docs/opengraphprotocol/

Essentially, what you want to do is include some special og: tags in place of (or in addition to) your existing meta tags.

  <head>
    <title>Ninja Site</title>
    <meta property="og:title" content="The Ninja"/>
    <meta property="og:type" content="movie"/>
    <meta property="og:url" content="http://www.nin.ja"/>
    <meta property="og:image" content="http://nin.ja/ninja.jpg"/>
    <meta property="og:site_name" content="Ninja"/>
    <meta property="fb:admins" content="USER_ID"/>
    <meta property="og:description"
          content="Superhuman or supernatural powers were often
                   associated with the ninja. Some legends include
                   flight, invisibility and shapeshifting..."/>
    ...
  </head>

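If you have a .htaccess file that rewrites requests and makes it hard for Facebook to scrape your URL, you may be able to get away with detecting Facebook's crawler in your .htaccess and serving it the correct tags. I believe the user agent that the Facebook crawler sends looks like this: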

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

The documentation also has a section on making sure their crawler can access your site.
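
One way to see what the crawler gets back is to request the page yourself while sending that user agent. The following is a minimal Python sketch, not anything Facebook provides: the function names `build_request` and `status_for` are made up for illustration, and the user agent string is simply the one quoted above, which the real crawler may have changed since.

```python
import urllib.error
import urllib.request

# User agent string quoted above; the real crawler may send a different version.
FACEBOOK_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"


def build_request(url):
    """Build a GET request that identifies itself as the Facebook crawler."""
    return urllib.request.Request(url, headers={"User-Agent": FACEBOOK_UA})


def status_for(url, opener=urllib.request.urlopen):
    """Return the HTTP status code the server sends for `url`.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(build_request(url)) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is still what we want.
        return err.code
```

Calling `status_for` on the same page with and without `index.php` in the URL should show whether the rewritten URL is the one returning 404 to the crawler.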

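Depending on your configuration, you can test this by looking at your server's access_log. On a UNIX system running Apache, the access log is typically located at /var/log/httpd/access_log (the exact path varies by distribution).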
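
To pick the crawler's requests out of the log, something like the Python sketch below could be used. It assumes the log uses Apache's common or combined format with the user agent logged on each line; `facebook_hits` and `LOG_LINE` are names invented for this example.

```python
import re
from collections import Counter

# Matches the request and status fields of Apache's common/combined log format,
# e.g. ... "GET /welcome HTTP/1.1" 404 512 "-" "facebookexternalhit/1.1 (...)"
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def facebook_hits(lines):
    """Count (path, status) pairs for requests made by the Facebook crawler."""
    hits = Counter()
    for line in lines:
        # Only keep lines whose logged user agent mentions the crawler.
        if "facebookexternalhit" not in line:
            continue
        match = LOG_LINE.search(line)
        if match:
            hits[(match["path"], int(match["status"]))] += 1
    return hits
```

Passing an open log file, for example `facebook_hits(open("/var/log/httpd/access_log"))`, returns a counter keyed by path and status, so any 404 responses served to the crawler stand out.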

So you could use an entry similar to this in your .htaccess file:

RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit
RewriteRule ^(.*)$ ogtags.php?$1 [L,QSA]

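The [L,QSA] flags I put there state that this is the last rule to be applied to the current request (L), and that any query string given will be passed along when the URL is rewritten (QSA, Query String Append). For example, a URL such as: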

https://example.com/?id=foo&action=bar

would be passed to ogtags.php like this: ogtags.php?id=foo&action=bar. Your ogtags.php file would then generate dynamic og: meta tags based on the parameters passed.
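
The logic such a script might follow is sketched below in Python rather than PHP, purely as an illustration. Everything in it is an assumption: the `PAGES` lookup table stands in for whatever database or CodeIgniter model holds your content, and `render_og_tags` is a hypothetical name.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical lookup table standing in for a database or CodeIgniter model.
PAGES = {
    "foo": {"title": "The Ninja", "type": "movie", "image": "http://nin.ja/ninja.jpg"},
}


def render_og_tags(query_string, page_url):
    """Return og: meta tags for the page selected by the `id` parameter."""
    params = parse_qs(query_string)
    page_id = params.get("id", [""])[0]
    page = PAGES.get(page_id)
    if page is None:
        return ""  # unknown page: emit nothing rather than guess
    tags = {"og:url": page_url}
    tags.update({"og:" + key: value for key, value in page.items()})
    # Escape values so characters such as & and " stay valid inside attributes.
    return "\n".join(
        '<meta property="{}" content="{}"/>'.format(escape(name), escape(value))
        for name, value in tags.items()
    )
```

For the example URL above, `render_og_tags("id=foo&action=bar", "https://example.com/?id=foo&action=bar")` yields one `<meta>` line each for og:url, og:title, og:type and og:image.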

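Now, whenever your .htaccess file detects the Facebook user agent, it will serve it the ogtags.php file (which can contain the correct og: meta information). Be mindful of any other rules in your .htaccess and how they might affect the new rule.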

In the .htaccess entries you posted, I suggest making this new "Facebook rule" the very first rule.

Answered 2012-04-10T21:32:00.650

I ran into the same problem, namely: Bad Response Code: URL returned a bad HTTP response code.

Strangely enough, this is what fixed it. I added

    <meta property="og:locale" content="en_US" />

to my site's HEAD tag, and it worked.

Also, don't forget that in your application dashboard (where you get your App ID) you must have at least "Website with Facebook Login" enabled and the URL of your website entered. Otherwise it won't work, even if you are not using Facebook Login anywhere on your site.

Answered 2012-12-31T08:31:08.093