macos-big-sur - Postgres errors on an ARM-based M1 Mac with Big Sur

Question

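Ever since I got a new ARM-based M1 MacBook Pro, I've been experiencing severe and consistent PostgreSQL issues (psql 13.1). Whether I use the Rails server or Foreman, I get errors in both my browser and terminal such as PG::InternalError: ERROR: could not read block 15 in file "base/147456/148555": Bad address, or PG::Error (invalid encoding name: unicode), or Error during failsafe response: PG::UnableToSend: no connection to the server. The strange thing is that I can often refresh the browser repeatedly to get things working (until they inevitably stop working again).

I'm aware of all the configuration challenges related to ARM-based M1 Macs, which is why I've uninstalled and reinstalled everything from Homebrew to Postgres multiple times in numerous ways (with Rosetta, without Rosetta, using arch -x86_64 brew commands, using the Postgres app instead of the Homebrew install). I've come across a few other people on random message boards who are experiencing the same issue (also on new Macs) and not having any luck, which is why I'm reluctant to believe that this is a drive corruption issue. (I've also run the Disk Utility First Aid check multiple times; it says everything is healthy, but I have no idea how reliable that is.)

I'm using thoughtbot parity to sync my development environment database with what is currently in production. When I run development restore production, I get hundreds of lines in my terminal that look like the output below (this is after the download completes, but before it goes on to create defaults, process data, sequence sets, etc.). I believe this is the source of the issue, but I'm not sure what the solution is: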

pg_restore: dropping TABLE [table name1]
pg_restore: from TOC entry 442; 1259 15829269 TABLE [table name1] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR:  table "[table name1]" does not exist
Command was: DROP TABLE "public"."[table name1]";
pg_restore: dropping TABLE [table name2]
pg_restore: from TOC entry 277; 1259 16955 TABLE [table name2] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR:  table "[table name2]" does not exist
Command was: DROP TABLE "public"."[table name2]";
pg_restore: dropping TABLE [table name3]
pg_restore: from TOC entry 463; 1259 15830702 TABLE [table name3] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR:  table "[table name3]" does not exist
Command was: DROP TABLE "public"."[table name3]";
pg_restore: dropping TABLE [table name4]
pg_restore: from TOC entry 445; 1259 15830421 TABLE [table name4] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR:  table "[table name4]" does not exist
Command was: DROP TABLE "public"."[table name4]";

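Has anyone else experienced this? Any ideas for a solution would be greatly appreciated. Thanks!

EDIT: I was able to reproduce the same issue on an older MacBook Pro (also running Big Sur), so it seems to be unrelated to the M1, but possibly related to Big Sur.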
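
For what it's worth, the DROP TABLE errors in the output above are the kind pg_restore reports when it is run with --clean against a database where those tables do not exist yet, so they are not by themselves a sign of corruption. A minimal sketch of a restore that avoids them, assuming the restore is driven by pg_restore with --clean (how parity invokes it is an assumption here; restore_dump, production.dump and myapp_development are hypothetical names):

```shell
# restore_dump DUMP_FILE TARGET_DB
# --clean drops each object before recreating it; --if-exists turns those
# drops into DROP ... IF EXISTS, so tables that are not there yet are skipped
# instead of being reported as errors.
restore_dump() {
  pg_restore --verbose --clean --if-exists --no-acl --no-owner \
    --dbname="$2" "$1"
}

# Example call (placeholder names, left commented out):
# restore_dump production.dump myapp_development
```

Any "could not read block" errors that remain after such a restore would then stand out more clearly from the harmless DROP noise.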

score 11 · Accepted Answer

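Definitive workaround:

After trying all the workarounds in the other answers, I was still getting this error occasionally, even after dumping and restoring the database, switching to M1-native postgres, running all manner of maintenance scripts, etc.

After much tinkering with postgresql.conf, this is the only thing that has reliably resolved the issue indefinitely (I have not received the error since):

In postgresql.conf, change: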

max_worker_processes = 8

to

max_worker_processes = 1

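After making this change, I have thrown every test I can at my previously error-ridden database and it has not shown the same error once. Previously, an extraction routine I run on a database of roughly 20M records would fail with the bad address error after processing 1-2 million records. Now it completes the entire process.

Obviously, reducing the number of parallel workers carries a performance penalty, but this is the only way I have found to reliably and permanently resolve this issue.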
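
The edit above can also be scripted. This is only a sketch: set_conf_value is a hypothetical helper, it assumes the setting sits on a single line of the form "name = value" (commented out or not), and it keeps a .bak copy next to the file. It avoids sed -i because the BSD sed shipped with macOS and GNU sed disagree on that flag.

```shell
# set_conf_value FILE NAME VALUE
# Rewrites an existing "NAME = ..." line in FILE (commented out or not),
# or appends one if NAME is not present. The original is kept as FILE.bak.
set_conf_value() {
  conf_file="$1"; name="$2"; value="$3"
  cp "$conf_file" "$conf_file.bak"
  if grep -Eq "^[[:space:]]*#?[[:space:]]*$name[[:space:]]*=" "$conf_file.bak"; then
    sed -E "s|^[[:space:]]*#?[[:space:]]*$name[[:space:]]*=.*|$name = $value|" \
      "$conf_file.bak" > "$conf_file"
  else
    echo "$name = $value" >> "$conf_file"
  fi
}

# Example call (needs a running server to look up the file, so commented out):
# set_conf_value "$(psql -Atc 'SHOW config_file')" max_worker_processes 1
```

max_worker_processes is only read at server start, so the server has to be restarted before the new value takes effect.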

score 2 · Accepted Answer

Is it possible that something in the Big Sur Beta 11.3 fixed this issue?

I've been having the same issues as OP since installing PostgreSQL 13 using MacPorts on my Mac mini M1 (now on PostgreSQL 13.2).

I would see could not read block errors:

1. Occasionally when running ad hoc queries
2. Always when compiling a book in R Markdown that makes several queries
3. Always when running VACUUM FULL on my main database (there's about 620 GB in the instance on this machine and the error would be thrown very quickly relative to how long a VACUUM FULL would take).

(My "fix" so far has been to point my Mac to the Ubuntu server I have running in the corner of my office, so no real problem for me.)

But I've managed to do 2 and 3 without the error since upgrading to Big Sur Beta 11.3 today (both failed immediately prior to upgrading). Is it possible that something in the OS fixed this issue?

score 2 · Accepted Answer

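UPDATE #2:

Tweaks such as the WAL buffer change extended the time between errors but did not eliminate them completely. I ended up reinstalling a fresh Apple Silicon build of Postgres using Homebrew, then doing a pg_dump of my existing database (the one hitting the errors) and restoring it to the new installation/cluster.

Here's the interesting bit: pg_restore failed to restore one of the indexes in the database, and noted it during the restore process (which otherwise completed). My hunch is that corruption or some other problem with this index was causing the Bad Address errors. So my final recommendation for this issue is to do a pg_dump, and then use pg_restore, not pg_dump, to restore the database. pg_restore seems to have flagged this problem where pg_dump did not, and it wrote to a clean database without the bad index.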
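
The dump-and-restore cycle described above might look roughly like the following. This is a sketch, not the exact commands used in this answer: dump_db and restore_db are hypothetical helpers, the database names and dump path are placeholders, and it assumes the default connection settings (or PGHOST/PGPORT) point at the old cluster for the dump and at the new one for the restore.

```shell
# dump_db SOURCE_DB DUMP_FILE
# Custom-format dump, which is the format pg_restore reads.
dump_db() {
  pg_dump --format=custom --file="$2" "$1"
}

# restore_db TARGET_DB DUMP_FILE
# Creates an empty database on the new cluster and restores into it.
# By default pg_restore keeps going after an error and reports it, which
# is how a single index that fails to rebuild gets noticed.
restore_db() {
  createdb "$1" &&
  pg_restore --verbose --no-owner --dbname="$1" "$2"
}

# Example calls (placeholder names, left commented out):
# dump_db old_db /tmp/old_db.dump
# restore_db new_db /tmp/old_db.dump
```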

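UPDATE:

I continued to experience this issue after trying several workarounds (including a full pg_dump and restore of the affected database). While some of the fixes seemed to extend the time between occurrences (particularly increasing shared buffer memory), none proved to be a permanent fix.

That said, some more digging on the postgres mailing lists revealed that this "Bad Address" error can occur in conjunction with WAL (write-ahead log) issues. As such, I have now set the following in my postgresql.conf file, significantly increasing the WAL buffer size: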

wal_buffers = 4MB

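and have not experienced the issue since (knock on wood, again).

It makes sense that this would have some effect, since by default the wal_buffers size grows in proportion to the shared buffers size (and, as noted above, increasing the shared buffers size provided temporary relief). Anyway, it is something else to try until we get definitive word on what is causing this error.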
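
To make that proportionality concrete: with wal_buffers left at its automatic setting (-1), the PostgreSQL documentation describes the size as 1/32 of shared_buffers, but not less than 64kB and not more than one WAL segment. A small sketch of that arithmetic, where auto_wal_buffers_kb is a hypothetical helper and the 16MB segment size is the usual default, assumed here:

```shell
# auto_wal_buffers_kb SHARED_BUFFERS_KB
# Prints, in kB, the size the automatic wal_buffers setting would work out to:
# shared_buffers / 32, clamped to the range 64kB .. 16MB (one WAL segment,
# assuming the default segment size).
auto_wal_buffers_kb() {
  kb=$(( $1 / 32 ))
  if [ "$kb" -lt 64 ]; then kb=64; fi
  if [ "$kb" -gt 16384 ]; then kb=16384; fi
  echo "$kb"
}

auto_wal_buffers_kb 131072   # shared_buffers = 128MB; prints 4096 (4MB)
```

Running SHOW shared_buffers; and SHOW wal_buffers; in psql shows what a given server is actually using, which is worth checking before and after setting an explicit value.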

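I was hitting this exact issue sporadically on an M1 MacBook Air: ERROR: could not read block and Bad Address, in various permutations.

I read on the postgres forums that this issue can occur in virtual machine setups. So I assume it is caused by Rosetta. Even if you are using a universal build of postgres, you are probably still using an x86 binary for some helper process (e.g. Python, in my case).

Anyway, here is what has solved the problem (so far): reindexing the database.

Note: you need to reindex from the command line, not with a SQL command. When I tried to reindex using SQL, I hit the same Bad Address error over and over, and the reindexing never completed.

When I reindexed from the command line, the process finished, and the Bad Address error has not recurred (knock on wood).

For me, it was just: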

reindexdb name_of_database

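For a 12GB database, this took 20-30 minutes. Not only am I no longer getting these errors, but the database seems snappier to boot. I just hope the problem does not come back with repeated reads, writes and index creation under Rosetta. I'm not sure why this works... maybe indexes created on M1 Macs are prone to corruption? Perhaps the indexes get corrupted on write or on access because of the Rosetta interaction?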
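
A slightly more talkative variant of the same command, as a sketch: reindex_one_db is a hypothetical wrapper and name_of_database is the same placeholder as above. With these flags a failure can be tied to a specific index rather than to the run as a whole.

```shell
# reindex_one_db DB_NAME
# --echo prints each command reindexdb sends to the server;
# --verbose reports progress for each index as it is rebuilt.
reindex_one_db() {
  reindexdb --echo --verbose --dbname="$1"
}

# Example call (placeholder name, left commented out):
# reindex_one_db name_of_database
```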

score -1 · Accepted Answer

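I restored postgresql.conf from postgresql.conf.sample (and restarted the database server), and it has been working fine since then.

To be clear, I tried both wal_buffers and max_worker_processes here and neither helped. I stumbled on this by accident, because I had tried so many things that I simply needed to go back. I did not reinitialize the whole database or anything like that, just the config file.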
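
A sketch of that reset which keeps the old file around for comparison. reset_conf_from_sample is a hypothetical helper; the paths in the example call are assumptions about a typical install, where the sample file lives in the directory printed by pg_config --sharedir and the live file is the one reported by SHOW config_file.

```shell
# reset_conf_from_sample SAMPLE_FILE LIVE_CONF
# Saves the current config as LIVE_CONF.before-reset, then overwrites it
# with the pristine sample shipped with the installation.
reset_conf_from_sample() {
  cp "$2" "$2.before-reset" &&
  cp "$1" "$2"
}

# Example call (needs a running server to look up the path, so commented out):
# reset_conf_from_sample "$(pg_config --sharedir)/postgresql.conf.sample" \
#   "$(psql -Atc 'SHOW config_file')"
```

Diffing the .before-reset copy against the new file afterwards shows which customised setting went away, which may help narrow down what was actually responsible.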
