
In my testkernel program I'd like to walk directory trees over a variety of protocols. What I think I want is something like os.walk, but which also works for FTP and for typical HTTP directory listings (like http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-precise/). This is in the spirit of openanything.py.

For FTP walking I found several options, including ftptool and the ftputil module, which has the advantage of being packaged in Ubuntu. I've already implemented my own very simple recursive walk of HTTP directory listings, using Beautiful Soup. But before I combine them with os.walk, I wonder whether this has been done already.

I know the semantics of HTTP walking are not well-defined like they are for file systems and FTP, so I suppose I'll have to assume that a directory is indicated by a link whose URL has a trailing slash and extends the URL of the current directory. And I'll have to be careful to avoid infinite walks. But even a subset of os.walk (e.g. only topdown) seems useful for this sort of thing.
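For illustration, the heuristic described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not an existing library: `http_walk`, `_LinkParser` and the `fetch` parameter are hypothetical names, the standard library's `html.parser` is swapped in for Beautiful Soup to keep the example self-contained, and a "directory" is simply any link that ends in a slash and extends the current directory's URL by one path segment. A set of visited URLs guards against walking the same listing twice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _fetch(url):
    # Default fetcher; pass another callable to add caching or to test offline.
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def http_walk(top, fetch=_fetch):
    """Yield (dirurl, dirnames, filenames) top-down, like a subset of os.walk."""
    if not top.endswith("/"):
        top += "/"
    seen = set()          # listings already visited, to avoid repeated walks
    stack = [top]
    while stack:
        dirurl = stack.pop()
        if dirurl in seen:
            continue
        seen.add(dirurl)
        parser = _LinkParser()
        parser.feed(fetch(dirurl))
        dirnames, filenames = [], []
        for href in parser.hrefs:
            url = urldefrag(urljoin(dirurl, href)).url
            # Assumption: only links that extend this directory's URL count.
            if url == dirurl or not url.startswith(dirurl):
                continue
            rest = url[len(dirurl):]
            if "?" in rest:
                continue  # skip column-sort links such as ?C=N;O=D
            if rest.endswith("/") and rest.count("/") == 1:
                if rest[:-1] not in dirnames:
                    dirnames.append(rest[:-1])
            elif "/" not in rest and rest not in filenames:
                filenames.append(rest)
        yield dirurl, dirnames, filenames
        # As with os.walk(topdown=True), callers may prune dirnames in place.
        for name in reversed(dirnames):
            stack.append(dirurl + name + "/")
```

Usage would look like `for dirurl, dirs, files in http_walk("http://kernel.ubuntu.com/~kernel-ppa/mainline/"): ...`. Because child URLs must strictly extend the parent URL, links to the parent directory or to other sites are never followed; a server that generates endlessly deeper paths would still need a depth limit, which this sketch leaves out.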

Has this been done? Any advice?


1 Answer


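Well, I wrote some code that actually walks a web directory and downloads the files (it could probably use some improvement, e.g. for image downloads, PDF downloads, etc.), but in any case here is the source/module: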

Recursively download files and directories from a remote source

Answered 2012-04-21T08:17:51.523