1

I was thinking of ways to have my laptop HDD backed up safely, and still being able to put the backup rapidly in use if needed. My method would be the following: I would buy an 2.5" HDD of the same size with USB to SATA cable and clone the internal to it, when disaster strikes, I would just have to swap the HDD in the laptop for the other one and I would be good to go again. However, I would like to avoid writing 500GB each time I want to backup my HDD, especially when I know that a fair part of it (+/- 80GB) is rarely written to, this is where the following md5sum/dd script comes to the rescue, I hope:

#!/bin/bash
block="1M"
end=50000
count=10

input="/dev/sda"

output="/dev/sdb"
output="/path/to/imagefile"


function md5compute()
{
    dd if=$1 skip=$2 bs=$block count=$count | md5sum - | awk '{ print $1 }'
}
for i in {0..$end}
do
    start=$(($i*$count))
    md5source=$(md5compute $input $start)
    md5destination=$(md5compute $output $start)
    if [ "$md5source" != "$md5destination" ]
    then
        dd if=$input of=$output skip=$start seek=$start count=$count conv=sync,noerror,notrunc
    fi
done

Now, the question part:

A) With running this, would I miss some part of the disk? Do you see some flaws?

B) Could I win some time compared to the 500GB read/write?

C) Obviously I potentially write less to the target disk. Will I improve the lifetime of that disk?

D) I was thinking of leaving count to 1, and increasing the block size.Is this good idea/bad idea?

E) Would this same script work with an image file as output?

Not being very fluent in programming, there should be plenty of room for improvement, any tips?

Thank you all...

4

1 回答 1

0

逐点回答:

  1. 运行这个,我会错过磁盘的某些部分吗?

    • 不。
  2. 你看到一些缺陷了吗?

    • 虽然单位不同。这意味着在源处进行双重读取,在目的地处写入之前进行完全读取。这将主要改善备份时间。
    • 存在差异时,MD5 匹配的可能性很小。通过使用SHA1MD256或其他更难的校验和算法来降低此概率。但这意味着两端都有更多资源。(参见维基百科上的生日问题
  3. 与 500GB 读/写相比,我能赢得一些时间吗?

    • 如果两个单位已经相同,是的,因为阅读通常比写作快。(取决于处理器,用于校验和计算:这在非常差的处理器上可能很重要)
  4. 显然,我可能会减少对目标磁盘的写入。我会提高该磁盘的寿命吗?

    • 在这种情况下,是的,但如果你只写差异,这会快很多,并真正提高你的磁盘寿命。
    • 当磁盘不同时,您重新写入整个磁盘,这效率不高!
  5. 我正在考虑将计数保留为 1,并增加块大小。这是好主意/坏主意吗?

    • 我认为这在全球范围内是个坏主意。为什么要重新发明轮子?
  6. 这个相同的脚本是否可以将图像文件用作输出?

    • 是的。

功能回答。

对于这样的工作,您可以使用rsync!使用此工具,您可以

  • 在传输过程中压缩数据
  • 通过网络复制
  • 使用 SSH 隧道(或不使用)
  • 仅传输(写入)修改的块

使用dd和_md5sum

我有时会运行一种命令:

ssh $USER@$SOURCE "dd if=$PATH/$SRCDEV |tee >(sha1sum >/dev/stderr);sleep 1" |
    tee >(sha1sum >/dev/tty) | dd of=$LOCALPATH/$LOCALDEV

这将在源主机上进行完整读取,而不是在发送到本地主机(目标)之前的 sha1sum,而不是sha1sum 以确保在写入本地设备之前进行传输。

这可能会呈现如下内容:

2998920+0 records in
2998920+0 records out
1535447040 bytes (1.4gB) copied, 81.42039 s, 18.3 MB/s
d61c645ab2c561eb10eb31f12fbd6a7e6f42bf11  -
d61c645ab2c561eb10eb31f12fbd6a7e6f42bf11  -
2998920+0 records in
2998920+0 records out
1535447040 bytes (1.4gB) copied, 81.42039 s, 18.3 MB/s
于 2013-12-31T09:03:43.600 回答