Suppose I copy a file and then compare the copy with the original:
```python
import shutil, filecmp

# dummy file names, they're not important
InFile = "d:\\Some\\Path\\File.ext"
CopyFile = "d:\\Some\\other\\Path\\File_Copy.ext"

# copy the file
shutil.copyfile(InFile, CopyFile)

# compare the two files byte by byte (shallow=False)
if not filecmp.cmp(InFile, CopyFile, shallow=False):
    print("File not copied correctly")
```
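For context, the comparison with `shallow=False` boils down to a size check followed by reading both files in chunks and comparing them. The sketch below approximates that logic (CPython's `filecmp` reads 8 KiB at a time; `files_identical` is a hypothetical name, and this is not the library's exact code). The point it illustrates: these are ordinary buffered `open()` reads, so nothing here forces the operating system to fetch the data from the physical disk rather than from its file cache.

```python
import os

BUFSIZE = 8 * 1024  # chunk size; assumed to mirror filecmp's internal buffer


def files_identical(path_a, path_b, bufsize=BUFSIZE):
    """Approximate filecmp.cmp(a, b, shallow=False): size check, then chunked byte compare."""
    # Different sizes can never be identical, so skip reading entirely.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Normal buffered reads: the OS may satisfy these from its page cache.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(bufsize)
            chunk_b = fb.read(bufsize)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files exhausted at the same point
                return True
```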
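Why? That seems a bit pointless, doesn't it? After all, I've just copied the file, so it must be identical, mustn't it? Wrong! Hard drives have an acceptable error rate: very small, but not zero. The only way to be sure is to read the file back, but since the file is still in memory, how can I be sure that the system (Windows 7) actually re-reads it from the media, rather than just returning the pages it already holds in standby memory?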
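Suppose I have to write 16 TB of data to removable hard drives, and I have to be sure that none of the files on those disks is corrupt, or at least no more corrupt than the live files. Across 16 TB there may well be a few files that aren't exactly identical. I'm currently checking the files byte by byte with WinDiff; that comparison utility is slow, but at least I can be reasonably sure it is actually reading the copied files from the disk, because by then the cached pages should be long gone.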
Can anyone give an authoritative answer on which actually happens: is the file read back from the disk, or served from memory?
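What makes me suspicious is this: if I copy less data than I have RAM installed, verification is much faster than copying. It should be somewhat faster, since reading is faster than writing, but not by that much. If I copy 3 GB of files (I have 32 GB of RAM installed) and it takes a minute, verification should take around 50 seconds, and Resource Monitor should show 100% disk usage. It doesn't: verification takes less than 10 seconds and Resource Monitor barely moves. If I copy more data than I have RAM, verification takes almost as long as the copy, and Resource Monitor shows 100% disk usage, which is exactly what I'd expect! So what is going on here?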
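One way to test the suspicion is to time a plain read pass and compute its throughput. Using the figures above, verification reads both the 3 GB source and the 3 GB copy, roughly 6 GB, in under 10 seconds, which works out to about 600 MB/s or more; a rate that far exceeds what the drives can physically sustain points to the data coming from the file cache. The helper below, `read_throughput`, is a hypothetical name and a minimal sketch; it only measures, it does not bypass the cache.

```python
import time


def read_throughput(path, bufsize=1024 * 1024):
    """Read the whole file once; return (bytes_read, seconds, MiB_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small or fully cached files.
    mib_per_s = (total / 1048576.0) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mib_per_s
```

For example, calling `read_throughput(CopyFile)` right after a small copy, and again after copying more data than the installed RAM, should show whether the two cases differ in the way the Resource Monitor readings suggest.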
For reference, here is the real code, with error checking removed:
```python
import shutil, filecmp, os, sys

FromFolder = sys.argv[1]
ToFolder = sys.argv[2]
VerifyList = list()
VerifyToList = list()
BytesCopied = 0

if not os.path.exists(ToFolder):
    os.mkdir(ToFolder)

for (path, dirs, files) in os.walk(FromFolder):
    # mirror the source directory structure under ToFolder
    RelPath = os.path.relpath(path, FromFolder)
    OutPath = os.path.join(ToFolder, RelPath)
    if not os.path.exists(OutPath):
        os.mkdir(OutPath)
    for thisFile in files:
        InFile = os.path.join(path, thisFile)
        CopyFile = os.path.join(OutPath, thisFile)
        ByteSize = os.path.getsize(InFile)
        # float division so sizes like 1.5 KB aren't truncated to 1.0 KB
        if ByteSize < 1024:
            RepSize = "%d bytes" % ByteSize
        elif ByteSize < 1048576:
            RepSize = "%.1f KB" % (ByteSize / 1024.0)
        elif ByteSize < 1073741824:
            RepSize = "%.1f MB" % (ByteSize / 1048576.0)
        else:
            RepSize = "%.1f GB" % (ByteSize / 1073741824.0)
        print("copy %s > %s " % (RepSize, thisFile))
        VerifyList.append(InFile)
        VerifyToList.append(CopyFile)
        shutil.copyfile(InFile, CopyFile)

# finished copying, now verify
reVerifyList = list()
reVerifyToList = list()
for InFile, CopyFile in zip(VerifyList, VerifyToList):
    thisFile = os.path.basename(InFile)
    ByteSize = os.path.getsize(InFile)
    if ByteSize < 1024:
        RepSize = "%d bytes" % ByteSize
    elif ByteSize < 1048576:
        RepSize = "%.1f KB" % (ByteSize / 1024.0)
    elif ByteSize < 1073741824:
        RepSize = "%.1f MB" % (ByteSize / 1048576.0)
    else:
        RepSize = "%.1f GB" % (ByteSize / 1073741824.0)
    print("Verify %s > %s" % (RepSize, thisFile))
    if not filecmp.cmp(InFile, CopyFile, shallow=False):
        print("File not copied correctly " + thisFile)
        # copy, second chance
        reVerifyList.append(InFile)
        reVerifyToList.append(CopyFile)
        shutil.copyfile(InFile, CopyFile)
del VerifyList
del VerifyToList

# verify the files that were copied a second time
for InFile, CopyFile in zip(reVerifyList, reVerifyToList):
    if not filecmp.cmp(InFile, CopyFile, shallow=False):
        thisFile = os.path.basename(InFile)
        print("File failed 2nd chance " + thisFile)
```
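The size-formatting `if`/`elif` chain appears twice in the script. A small helper, here given the hypothetical name `format_size`, could replace both copies. It keeps the same thresholds but divides by floats; under Python 2, `ByteSize / 1024` is integer division, so a 1536-byte file would be reported as "1.0 KB" rather than "1.5 KB".

```python
def format_size(byte_size):
    """Return a human-readable size string using the script's 1024-based thresholds."""
    if byte_size < 1024:
        return "%d bytes" % byte_size
    elif byte_size < 1048576:
        return "%.1f KB" % (byte_size / 1024.0)
    elif byte_size < 1073741824:
        return "%.1f MB" % (byte_size / 1048576.0)
    else:
        return "%.1f GB" % (byte_size / 1073741824.0)
```

With it, each chain collapses to `RepSize = format_size(ByteSize)`.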