postgresql - How do I know whether my PostgreSQL server is using the "C" locale?

Question

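I'm trying to optimize my PostgreSQL 8.3 DB tables as best I can, and I'm unsure whether I need varchar_pattern_ops on certain columns where I run a LIKE against the first N characters of a string. According to this documentation, xxx_pattern_ops is only necessary "...when the server does not use the standard 'C' locale".

Can someone explain what this means? How do I check which locale my database is using?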

score 24 · Accepted Answer

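Currently, some locale [docs] settings can only be chosen at initdb time, and LC_COLLATE, the one relevant to _pattern_ops, is among them: it is read-only at runtime and cannot be changed with SET. To see its current value, use the SHOW command.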

For example:

SHOW LC_COLLATE;

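_pattern_ops indexes are useful on columns that are searched with pattern-matching constructs such as LIKE or regular expressions. You still have to create a regular index (without _pattern_ops) for ordinary comparisons. On 8.3 that includes equality searches; from 8.4 on, a _pattern_ops index can serve equality searches too. Take all of this into account when deciding whether your tables need such an index.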

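As for what a locale is: it is a set of rules about character ordering, formatting and similar things that vary from one language/country to another. For instance, the locale fr_CA (French in Canada) might have some different sorting rules (or ways of displaying numbers and so on) than en_CA (English in Canada). The standard "C" locale is the POSIX standards-compliant default locale. In it, strings are compared byte by byte, only ASCII characters are classified as letters, and the formatting rules are mostly those of en_US (US English).

In computing, a locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a language identifier and a region identifier.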
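To make the byte-by-byte ordering of the C locale concrete, here is a minimal sketch that uses the operating system's sort command rather than PostgreSQL itself. The four sample words are made up for illustration.

```shell
# Sort four sample strings under the C locale. Comparison is by raw byte
# value, so every uppercase ASCII letter sorts before every lowercase one.
c_sorted=$(printf '%s\n' banana Apple apple Banana | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

A language-aware locale such as en_US.UTF-8 would typically interleave the upper- and lowercase words instead. That difference is why a plain btree index can only serve a LIKE 'abc%' prefix search when the collation is C.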

score 17 · Accepted Answer

psql -l

per the manual.

Example output:

                               List of databases
    Name     | Owner  | Encoding |   Collate   |    Ctype    | Access privileges
-------------+--------+----------+-------------+-------------+-------------------
 packrd      | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres    | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0   | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
             |        |          |             |             | packrd=CTc/packrd
 template1   | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
             |        |          |             |             | packrd=CTc/packrd
(5 rows)
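If there are many databases, the same check can be scripted. The sketch below assumes PostgreSQL 8.4 or later, where pg_database has a datcollate column. The helper name non_c_databases and the database "scratch" are hypothetical, and the sample input stands in for real psql output.

```shell
# Hypothetical helper: read "name|collation" lines on stdin and print the
# databases whose collation is neither C nor POSIX.
non_c_databases() {
  awk -F'|' '$2 != "C" && $2 != "POSIX" { print $1 }'
}

# Sample input shaped like the output of:
#   psql -At -c 'SELECT datname, datcollate FROM pg_database'
# ("scratch" is a made-up database used only for illustration.)
sample='packrd|en_US.UTF-8
postgres|en_US.UTF-8
scratch|C'

result=$(printf '%s\n' "$sample" | non_c_databases)
printf '%s\n' "$result"
```

Against a live server, the psql command in the comment could be piped straight into non_c_databases.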

score 6 · Accepted Answer

OK, so from my reading, it appears that this initial setting

initdb --locale=xxx

 --locale=locale
       Specifies the locale to be used in this database. This is equivalent to specifying both --lc-collate and --lc-ctype.

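basically specifies the "default" locale for all databases you create afterwards (i.e. it specifies the settings for template1, which is the default template).

Locale is different from encoding. On PostgreSQL 8.4 or later you can create a new database with a different locale, spelling out the locale and/or the encoding by hand:

 CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;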

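Basically, if you don't specify it, it uses the system default, which is almost never "C".

So if SHOW LC_COLLATE returns anything other than "C" or "POSIX", you are not using the standard C locale and you need to specify xxx_pattern_ops for your indexes. Note also that if you want to use the <, <=, > or >= operators, you need to create a second index without the xxx_pattern_ops flag (unless your database uses the standard C locale, which is rare). For = and LIKE (etc.) you don't need a second index, although equality lookups on an xxx_pattern_ops index are only supported from PostgreSQL 8.4 on. If you don't need LIKE, you probably don't need an index with xxx_pattern_ops either.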
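The rule above can be sketched as a small shell function. The name needs_pattern_ops is hypothetical, and treating every value other than the exact strings C and POSIX as "not C" (including variants such as C.UTF-8) is a deliberately conservative assumption.

```shell
# Hypothetical helper: given an lc_collate value, report whether a
# LIKE 'abc%' search needs an xxx_pattern_ops index.
needs_pattern_ops() {
  case "$1" in
    C|POSIX) echo "no" ;;   # byte-order collation: a plain btree index works
    *)       echo "yes" ;;  # any other collation: use xxx_pattern_ops
  esac
}

# Against a live server (psql -At prints only the bare value):
#   needs_pattern_ops "$(psql -Atc 'SHOW lc_collate')"
needs_pattern_ops "en_US.UTF-8"
needs_pattern_ops "C"
```

The two sample calls cover the common case of a system-default locale and the C locale.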

Even if your index is defined to collate with the "default", e.g.

CREATE INDEX my_index_name
  ON table_name
  USING btree
  (identifier COLLATE pg_catalog."default");

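that is not enough: unless the default is the "C" (or POSIX, same thing) collation, the index cannot be used for patterns like LIKE 'ABC%'. You need something like this: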

CREATE INDEX my_index_name
  ON table_name
  USING btree
  (identifier COLLATE pg_catalog."default" varchar_pattern_ops);
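Because range comparisons still need the regular index, it can be handy to generate both definitions together. This is only a sketch: the function index_ddl and the index naming scheme are hypothetical, and it prints SQL without running anything against a server.

```shell
# Hypothetical helper: print the two index definitions for a varchar column.
# Arguments: table name, column name.
index_ddl() {
  # Regular index: serves <, <=, >, >= (and = on any version).
  printf 'CREATE INDEX %s_%s_idx ON %s USING btree (%s);\n' "$1" "$2" "$1" "$2"
  # Pattern index: serves LIKE 'abc%' when the collation is not C.
  printf 'CREATE INDEX %s_%s_like_idx ON %s USING btree (%s varchar_pattern_ops);\n' "$1" "$2" "$1" "$2"
}

ddl=$(index_ddl table_name identifier)
printf '%s\n' "$ddl"
```

For a text column, text_pattern_ops would take the place of varchar_pattern_ops.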

score 2 · Accepted Answer

If you've got the option...

You could recreate the database cluster with the C locale.

You need to pass the locale to initdb when initializing your Postgres instance.

You can do this regardless of what the server's default or user's locale is.

That's a server administration command though, not a database schema designer's task. The cluster contains all the databases on the server, not just the one you're optimising.

It creates a brand new cluster, and does not migrate any of your existing databases or data. That'd be additional work.

Furthermore, if you're in a position where you can consider creating a new cluster as an option, you really should be considering using PostgreSQL 8.4 instead, which can have per-database locales, specified in the CREATE DATABASE statement.

score 2 · Accepted Answer

There is another way (assuming you want to check the settings, not modify them):

Check the file /var/lib/postgres/data/postgresql.conf, where you should find the following lines:

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'                     # locale for system error message strings
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
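The lines can also be pulled out with grep, as in the sketch below. It uses a temporary file with made-up contents in place of the real postgresql.conf, and the helper name show_lc_settings is hypothetical. Note that lc_collate and lc_ctype are not among these settings, so for the collation itself SHOW LC_COLLATE remains the check to use.

```shell
# Hypothetical helper: print the active (uncommented) lc_* settings from a
# postgresql.conf-style file.
show_lc_settings() {
  grep -E '^[[:space:]]*lc_[a-z]+[[:space:]]*=' "$1"
}

# Stand-in for the real configuration file, with illustrative contents.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'
#lc_monetary = 'C'
lc_time = 'en_US.UTF-8'
EOF

lc_lines=$(show_lc_settings "$conf")
printf '%s\n' "$lc_lines"
rm -f "$conf"
```

On a real system the argument would be the path to the cluster's own postgresql.conf, which varies between installations.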
