I am trying to get Google Sitemap Generator to work.

This is my (Zend Framework 2) project structure:

/
/...
/public/...
/public/sitemap.xml
/public/urllist.txt
/...
/temp/googlesitemapgen/
/temp/googlesitemapgen/config.xml
/temp/googlesitemapgen/sitemap_gen.py
/...

config.xml

<?xml version="1.0" encoding="UTF-8" ?>
<site
    base_url="http://foo.bar.loc"
    store_into="/var/www/bar/foo/public/sitemap.xml"
    verbose="3"
    suppress_search_engine_notify="0"
>
    <urllist path="/var/www/bar/foo/public/urllist.txt" encoding="UTF-8" />
</site>

urllist.txt

http://foo.bar.loc

When I call the generator script

user@machine:/var/www/bar/foo/temp/googlesitemapgen# python sitemap_gen.py --config=config.xml

an error occurs:

user@machine:/var/www/bar/foo/temp/googlesitemapgen# python sitemap_gen.py --config=config.xml 
sitemap_gen.py:65: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
Reading configuration file: config.xml
BaseURL is set to: http://foo.bar.loc/
Input: From URLLIST "/var/www/bar/foo/public/urllist.txt"
Opened URLLIST file: /var/www/bar/foo/public/urllist.txt
[WARNING] Discarded URL for not starting with the base_url: http://foo.bar.loc
[WARNING] No URLs were recorded, writing an empty sitemap.
Sorting and normalizing collected URLs.
Writing Sitemap file "/var/www/bar/foo/public/sitemap.xml" with 0 URLs
Notifying search engines.
[ERROR] When attempting to access our generated Sitemap at the following URL:
    http://foo.bar.loc/sitemap.xml
  we failed to read it.  Please verify the store_into path you specified in
  your configuration file is web-accessable.  Consult the FAQ for more
  information.
[WARNING] Proceeding to notify with an unverifyable URL.
Notifying: www.google.com
Notification URL: http://www.google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Ffoo.bar.loc%2Fsitemap.xml
Number of errors: 1
Number of warnings: 3

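This error is described in the "Troubleshooting" section of the documentation. But I have already checked base_url and store_into, and both are set correctly.

Why does this error occur? Am I doing something wrong? If so, what? How can I get the tool to work?

Thanks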

1 Answer

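You need a urllist.txt that contains the actual URLs. The sitemap generator will not crawl or scrape your site for you. It can examine Apache logs or reference other generated sitemaps, but it does not crawl on its own.

See my answer: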
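
The first warning in the question's log points at one more thing worth checking. The tool reports the base URL as http://foo.bar.loc/ (with a trailing slash) and then discards http://foo.bar.loc (without one). If the comparison is a plain prefix match, which is an assumption drawn from the log and not from the tool's source, the entry in urllist.txt would need the trailing slash too. The following Python 3 sketch reproduces that check on its own; `normalize_base`, `normalize_entry` and `check_urllist` are hypothetical helper names, not part of sitemap_gen.py.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base(base_url):
    """Append a trailing slash when missing, as the 'BaseURL is set to' log line suggests."""
    return base_url if base_url.endswith("/") else base_url + "/"


def normalize_entry(url):
    """Give a bare host URL an explicit root path ("/")."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def check_urllist(lines, base_url):
    """Split entries into (accepted, rejected) using a prefix match (assumed behaviour)."""
    base = normalize_base(base_url)
    accepted, rejected = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(base):
            accepted.append(line)
        else:
            rejected.append(line)
    return accepted, rejected


if __name__ == "__main__":
    base = "http://foo.bar.loc"
    entries = ["http://foo.bar.loc"]  # the single entry from the question
    print(check_urllist(entries, base))
    fixed = [normalize_entry(u) for u in entries]
    print(check_urllist(fixed, base))
```

Under that assumption, changing the line in urllist.txt to http://foo.bar.loc/ should let the entry through. The generated sitemap would still contain only that one URL until more are listed.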

https://webmasters.stackexchange.com/questions/47085/is-there-an-xml-sitemap-generator-with-command-line-interface-for-nginx-on-linux/47105#47105

I have a command string that can generate a URL list by crawling a given site.

answered 2013-04-10T13:03:05.633
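
For readers who want to stay in Python, here is a rough sketch of the same idea: follow same-site links and collect the URLs for urllist.txt. It is not the command from the linked answer. `LinkParser`, `fetch_html` and `crawl` are hypothetical names, the code targets Python 3 and uses only the standard library, and it ignores robots.txt and content types, so treat it as a starting point only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page and decode it leniently as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(base_url, fetch=fetch_html, limit=500):
    """Breadth-first crawl that keeps only URLs starting with the base URL."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    found = [base]      # discovery order, written to the list later
    seen = {base}       # fast membership test
    queue = deque([base])
    while queue:
        page = queue.popleft()
        try:
            html = fetch(page)
        except OSError:
            continue  # unreachable page: skip it, keep crawling
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            url, _fragment = urldefrag(urljoin(page, href))
            if url.startswith(base) and url not in seen and len(found) < limit:
                seen.add(url)
                found.append(url)
                queue.append(url)
    return found
```

Calling `crawl("http://foo.bar.loc")` and writing the returned list to /var/www/bar/foo/public/urllist.txt, one URL per line, would produce a file in the shape the config above expects. The `fetch` parameter exists so the function can be tried against canned pages without network access.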