filesystems - Safe and efficient way to modify multiple files on POSIX systems?

Question

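I have been following the discussion on the "bug" in EXT4 that causes files to be zeroed in a crash if one uses the "create temp file, write temp file, rename temp file to target file" process. POSIX says that unless fsync() is called, you cannot be sure the data has been flushed to the hard disk.

Obviously doing: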

0) get the file contents (read it or make it somehow)
1) open original file and truncate it
2) write new contents
3) close file

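is not good even with fsync(), because the computer can crash during 2) or during fsync(), and you end up with a partially written file.

Usually it has been thought that this is pretty safe: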

0) get the file contents (read it or make it somehow)
1) open temp file
2) write contents to temp file
3) close temp file
4) rename temp file to original file

Unfortunately it isn't. To make it safe on EXT4 you would need to do:

0) get the file contents (read it or make it somehow)
1) open temp file
2) write contents to temp file
3) fsync()
4) close temp file
5) rename temp file to original file

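This would be safe, and after a crash you should have either the new file contents or the old ones, never zeroed or partial contents. But if the application uses lots of files, calling fsync() after every write would be slow.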
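The last listing can be sketched in C roughly as follows. This is only a minimal sketch: replace_file() and its parameters are hypothetical names made up for illustration, the temp file is assumed to live in the same directory (and therefore on the same filesystem) as the target, and an interrupted or failed write is simply treated as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `data`,
 * going through `tmp_path`. Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const char *tmp_path,
                 const char *data, size_t len)
{
    size_t off = 0;

    /* 1) open temp file */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 2) write contents to temp file, continuing after short writes */
    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        off += (size_t)n;
    }

    /* 3) fsync() while the descriptor is still open */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }

    /* 4) close temp file */
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* 5) rename temp file to original file */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A call would look like replace_file("settings.conf", "settings.conf.tmp", buf, buf_len), where both file names are placeholders. The cost discussed in this question comes from step 3, which is paid once per file.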

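So my question is: how can I efficiently modify multiple files on a system where fsync() is required to be sure that changes have been saved to disk? And I really mean modifying many files, as in thousands of files. Modifying two files and calling fsync() after each would not be too bad, but fsync() really does slow things down when modifying many files.

EDIT: changed the order of fsync() and closing the temp file to the correct one, and added emphasis on writing many, many files.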

score 3 · Accepted Answer

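The short answer is: the application layer is the wrong place to solve this. EXT4 must make sure that after I close the file, the data is written in a timely manner. As it is now, EXT4 "optimizes" this writing so that it can collect more write requests and burst them out in one go.

The problem is obvious: no matter what you do, you can't be sure that your data ends up on the disk. Calling fsync() manually only makes things worse: you basically get in the way of EXT4's optimization, slowing the whole system down.

OTOH, EXT4 has all the information it needs to make an educated guess about when data has to be written out to disk. In this case, I rename a temp file to the name of an existing file. For EXT4, this means that it must either postpone the rename (so the data of the original file stays intact after a crash) or flush at once. Since it can't postpone the rename (the next process might want to see the new data), renaming implicitly means flushing, and that flush must happen in the FS layer, not the app layer.

EXT4 might create a virtual copy of the filesystem which contains the changes while the disk is not modified (yet). But this doesn't affect the ultimate goal: an app can't know which optimizations the FS is going to make, and therefore the FS must make sure that it does its job.

This is a case where ruthless optimization has gone too far and ruined the result. Golden rule: optimization must never change the end result. If you can't maintain this, you must not optimize.

As long as Tso believes that it is more important to have a fast FS than one which behaves correctly, I suggest not upgrading to EXT4 and closing all bug reports about this as "works as designed by Tso".

[EDIT] Some more thoughts on this. You could use a database instead of files. Let's ignore the resource waste for a moment. Can anyone guarantee that the files the database uses won't become corrupted by a crash? Probably. The database can write the data and call fsync() every minute or so. But then, you could do the same: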

while true; do sync ; sleep 60 ; done

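Again, the bug in the FS prevents this from working in every case. Otherwise, people wouldn't be so bothered by this bug.

You could use a background config daemon like the Windows registry. The daemon would write all configs into one big file. It could call fsync() after writing everything out. Problem solved ... for your configs. Now you need to do the same for everything else your apps write: text documents, images, whatever. I mean, almost every Unix process creates a file. This is the basis of the whole Unix philosophy!

Clearly, this is not a viable path. So the answer remains: there is no solution on your side. Keep bothering Tso and the other FS developers until they fix their bugs.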

score 1 · Accepted Answer

My own answer would be to keep the modifications on temp files and, after finishing writing them all, do one fsync() and then rename them all.
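A rough sketch of that idea in C, under some assumptions: fsync() works on a single file descriptor, so a literal "one fsync()" cannot cover many files. The sketch below therefore writes all temp files first and only then calls fsync() on each of them, which may let the kernel batch the I/O. A single sync() call (or syncfs() on Linux) would be an alternative, but POSIX does not require sync() to wait until the writes have completed. The name replace_files_batched() and the two limits are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_BATCH 64      /* illustrative upper bound on files per batch */
#define MAX_PATH_LEN 512  /* illustrative upper bound on temp path length */

/* Build "<path>.tmp" into buf; returns 0 on success, -1 if it does not fit. */
static int tmp_name(char *buf, size_t size, const char *path)
{
    int n = snprintf(buf, size, "%s.tmp", path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: replace paths[i] with the string contents[i].
 * Returns 0 on success, -1 on failure. */
int replace_files_batched(const char *paths[], const char *contents[],
                          size_t count)
{
    int fds[MAX_BATCH];
    char tmp[MAX_PATH_LEN];
    size_t i, opened = 0;
    int rc = 0;

    if (count > MAX_BATCH)
        return -1;

    /* Phase 1: write every temp file without syncing yet.
     * A short write is treated as an error to keep the sketch small. */
    for (i = 0; i < count && rc == 0; i++) {
        size_t len = strlen(contents[i]);
        if (tmp_name(tmp, sizeof tmp, paths[i]) != 0) {
            rc = -1;
            break;
        }
        fds[i] = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fds[i] < 0) {
            rc = -1;
            break;
        }
        opened++;
        if (write(fds[i], contents[i], len) != (ssize_t)len)
            rc = -1;
    }

    /* Phase 2: flush and close every temp file that was opened. */
    for (i = 0; i < opened; i++) {
        if (rc == 0 && fsync(fds[i]) != 0)
            rc = -1;
        if (close(fds[i]) != 0)
            rc = -1;
    }

    /* Phase 3: rename only if everything was flushed; otherwise remove
     * the temp files. If a rename fails midway, earlier renames stay in
     * place, so the batch as a whole is not atomic. */
    for (i = 0; i < opened; i++) {
        if (tmp_name(tmp, sizeof tmp, paths[i]) != 0)
            continue;
        if (rc == 0) {
            if (rename(tmp, paths[i]) != 0)
                rc = -1;
        } else {
            unlink(tmp);
        }
    }
    return rc;
}
```

Each file is still replaced atomically by its own rename(), but a crash in the middle of phase 3 can leave some files new and some old. Whether deferring the fsync() calls is actually faster than syncing after every write depends on the filesystem and would need to be measured.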

score 0 · Accepted Answer

You need to swap 3 & 4 in your last listing: fsync(fd) takes the file descriptor, so it has to be called before close(). I also don't see why that would be particularly costly: you want the data written to disk by the close() anyway, so the cost should be the same between what you want to happen and what will happen with fsync().

If the cost is too high (and your system has it), fdatasync(2) avoids syncing the metadata, so it should be cheaper.

EDIT: So I wrote some extremely hacky test code:

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <string.h>

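/* Baseline: write to a temp file and rename it, with no explicit sync. */
static void testBasic()
{
    int fd;
    const char* text = "This is some text";

    /* O_CREAT requires a mode argument; 0644 is used here. */
    fd = open("temp.tmp", O_WRONLY | O_CREAT, 0644);
    write(fd,text,strlen(text));
    close(fd);
    rename("temp.tmp","temp");
}

/* Same as testBasic(), but fsync() the temp file before closing it. */
static void testFsync()
{
    int fd;
    const char* text = "This is some text";

    /* Open the same temp file that is renamed below. */
    fd = open("temp.tmp", O_WRONLY | O_CREAT, 0644);
    write(fd,text,strlen(text));
    fsync(fd);
    close(fd);
    rename("temp.tmp","temp");
}

/* Same as testFsync(), but with fdatasync(), which may skip metadata. */
static void testFdatasync()
{
    int fd;
    const char* text = "This is some text";

    /* Open the same temp file that is renamed below. */
    fd = open("temp.tmp", O_WRONLY | O_CREAT, 0644);
    write(fd,text,strlen(text));
    fdatasync(fd);
    close(fd);
    rename("temp.tmp","temp");
}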

#define ITERATIONS 10000

static void testLoop(int type)
{
    struct timeval before;
    struct timeval after;
    long seconds;
    long usec;
    int i;

    gettimeofday(&before,NULL);
    if (type == 1)
    {
        for (i = 0; i < ITERATIONS; i++)
        {
            testBasic();
        }
    }
    if (type == 2)
    {
        for (i = 0; i < ITERATIONS; i++)
        {
            testFsync();
        }
    }
    if (type == 3)
    {
        for (i = 0; i < ITERATIONS; i++)
        {
            testFdatasync();
        }
    }
    gettimeofday(&after,NULL);

    seconds = (long)(after.tv_sec - before.tv_sec);
    usec = (long)(after.tv_usec - before.tv_usec);
    if (usec < 0)
    {
        seconds--;
        usec += 1000000;
    }

    printf("%ld.%06ld\n",seconds,usec);
}

int main()
{
    testLoop(1);
    testLoop(2);
    testLoop(3);
    return 0;
}

On my laptop that produces:

0.595782
6.338329
6.116894

Which suggests that doing the fsync() is ~10 times more expensive, and that fdatasync() is slightly cheaper.

I guess the problem I see is that every application is going to think its data is important enough to fsync(), so the performance advantages of merging writes over a minute will be eliminated.

score 0 · Accepted Answer

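The problem you refer to is well researched; you should definitely read this: https://www.academia.edu/9846821/Towards_Efficient_Portable_Application-Level_Consistency

Fsync can be skipped under safe rename behavior, and directory fsync can be skipped under safe new file behavior. Both are implementation-specific and not guaranteed by POSIX.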
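For reference, the "directory fsync" mentioned here is usually done by opening the containing directory and calling fsync() on that descriptor after the rename. A minimal sketch, assuming a system (such as Linux) where a directory can be opened read-only and passed to fsync(); sync_directory() is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the filesystem to flush the directory itself,
 * so that entries created or renamed in `dir_path` survive a crash.
 * Returns 0 on success, -1 on failure. */
int sync_directory(const char *dir_path)
{
    int rc;
    int fd = open(dir_path, O_RDONLY);
    if (fd < 0)
        return -1;

    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc == 0 ? 0 : -1;
}
```

On a filesystem without the safe behaviors described above, the conservative sequence would be: fsync() the temp file, rename() it over the target, then call sync_directory() on the directory that contains the target.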
