python - Python FTP download - Ignore files that already exist in the download directory

Question

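I have a script that pulls files from an FTP directory at a set interval. However, because the files are only copied rather than moved, it ends up pulling the same files over and over. What is the best way to make sure I only pull new files? I was thinking of cross-referencing the files on the FTP site against the files in my local directory, but I'm not sure how to do that. Also, how would I check not only the file name but the modification date as well? For example: random_file.txt was originally placed on the FTP site at 2:15 PM on 10/25/2012 and downloaded 5 minutes later. Then, at 11:40 AM on 10/26/2012, random_file.txt on the FTP site was replaced with a newer version. Can I download only the new files from the FTP site and/or overwrite only the outdated files in my local directory? Thanks!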

Here is my existing code:

import ftplib, os

def fetch():
    server = 'ftp.example.com'
    username = 'foo'
    password = 'bar'
    directory = '/random_directory/'
    filematch = '*.txt'
    ftp = ftplib.FTP(server)
    ftp.login(username, password)
    ftp.cwd(directory)
    for filename in ftp.nlst(filematch):
        # Raw string keeps the backslash in the Windows path intact
        fhandle = open(os.path.join(r'C:\my_directory', filename), 'wb')
        print('Getting ' + filename)
        ftp.retrbinary('RETR ' + filename, fhandle.write)
        fhandle.close()

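UPDATE: I have at least partially solved this using the links from Siddharth Toshniwal. For anyone who stumbles across this and needs it, here is my new code so far. Note that this only checks whether the file exists, not the modification date: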

for filename in ftp.nlst(filematch):
    local_path = os.path.join(r'C:\my_directory', filename)
    if not os.path.exists(local_path):
        # Only download files that are not already in the local directory
        fhandle = open(local_path, 'wb')
        print('Getting ' + filename)
        ftp.retrbinary('RETR ' + filename, fhandle.write)
        fhandle.close()
    else:
        print('File ' + filename + ' Already Exists, Skipping Download')

score 2 · Accepted Answer

I agree with the suggestion to use something like rsync rather than hacking something together in Python.

But if for whatever reason that is not feasible, the following links should help: http://code.activestate.com/recipes/327141-simple-ftp-directory-synch/ http://alexharvey.eu/code/python/get-a-files-last-modified-datetime-using-python/
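
For the modification-date part of the question, one possible approach is to ask the server for each file's timestamp with the FTP MDTM command and compare it with the local file's mtime. The following is only a minimal sketch under some assumptions: the server supports MDTM (not all do, and ftplib raises an error if the command is rejected), it reports times in UTC, and the local copy is stamped with the server's time after each download. The helper names parse_mdtm, needs_download, and fetch_new are made up for illustration and are not part of ftplib.

```python
import calendar
import ftplib
import os
import time


def parse_mdtm(response):
    """Convert an MDTM reply such as '213 20121026114000' to a Unix timestamp.

    Assumes the server reports UTC; any fractional seconds are dropped.
    """
    stamp = response.split()[-1][:14]  # strip the '213' code and '.sss' suffix
    return calendar.timegm(time.strptime(stamp, '%Y%m%d%H%M%S'))


def needs_download(local_path, remote_mtime):
    """Return True if the local file is missing or older than the remote copy."""
    if not os.path.exists(local_path):
        return True
    return os.path.getmtime(local_path) < remote_mtime


def fetch_new(ftp, local_dir, filematch='*.txt'):
    """Download only files that are new or have changed on the server.

    `ftp` is an ftplib.FTP instance that is already logged in and has
    already changed into the remote directory.
    """
    for filename in ftp.nlst(filematch):
        local_path = os.path.join(local_dir, filename)
        # Raises ftplib.error_perm if the server rejects MDTM
        remote_mtime = parse_mdtm(ftp.sendcmd('MDTM ' + filename))
        if not needs_download(local_path, remote_mtime):
            print('File ' + filename + ' is up to date, skipping download')
            continue
        print('Getting ' + filename)
        with open(local_path, 'wb') as fhandle:
            ftp.retrbinary('RETR ' + filename, fhandle.write)
        # Stamp the local copy with the server's time so the next run can compare
        os.utime(local_path, (remote_mtime, remote_mtime))
```

With the connection set up as in the question's fetch() function, this would be called as fetch_new(ftp, r'C:\my_directory') after ftp.cwd(directory). Setting the local mtime to the server's value means the comparison does not depend on the two machines' clocks agreeing.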
