12

I've started working on a project where there is a fairly large table (about 82,000,000 rows) that I think is very bloated. One of the fields is defined as:

consistency character varying NOT NULL DEFAULT 'Y'::character varying

It's used as a boolean, the values should always either be ('Y'|'N').

Note: there is no check constraint, etc.

I'm trying to come up with reasons to justify changing this field. Here is what I have:

  • It's being used as a boolean, so make it that. Explicit is better than implicit.
  • It will protect against coding errors, because right now anything that can be converted to text will go in there blindly.

Here are my question(s).

  • What about size/storage? The db is UTF-8. So, I think there really isn't much of a savings in that regard. It should be 1 byte for a boolean, but also 1 byte for a 'Y' in UTF-8 (at least that's what I get when I check the length in Python). Is there any other storage overhead here that would be saved?
  • Query performance? Will Postgres get any performance gain from a WHERE clause of "= TRUE" vs. "= 'Y'"?

2 Answers

23

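PostgreSQL (unlike Oracle) has a fully fledged boolean type. Generally, a "yes/no flag" should be a boolean. That is the appropriate type to use!

What about size/storage?

Basically, a boolean column occupies 1 byte on disk, while for text or character varying (quoting the manual here) ...

the storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string

That is 2 bytes for a single simple character. So you can cut the storage for the column in half.

Actual storage is more complicated than that. There is some fixed overhead per table, page and row, there is special storage for NULL values, and some types require data alignment. The overall impact will be very limited, if it is noticeable at all.
More about how to measure actual space requirements.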
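
To check the numbers on a real table, PostgreSQL's built-in size functions can be queried directly. This is a minimal sketch, assuming the table is called tbl (the placeholder name used in this answer); the sample size of 1000 rows is an arbitrary choice:

```sql
-- Size in bytes of the stored values, grouped over a small sample of rows.
SELECT pg_column_size(consistency) AS bytes, count(*) AS sampled_rows
FROM  (SELECT consistency FROM tbl LIMIT 1000) AS sample
GROUP  BY 1;

-- Total on-disk size of the table, including indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('tbl')) AS total_size;
```

Comparing the total size before and after the conversion on a copy of the table shows the real effect, alignment padding included.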

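The UTF8 encoding makes no difference here. Basic ASCII characters are stored the same way as in other encodings such as LATIN-1: one byte each.

In your case, according to your description, you should keep the NOT NULL constraint you already seem to have, independent of the base type.

Query performance?

Performance will be slightly better with boolean in any case. Besides being a bit smaller, the logic for boolean is simpler, and varchar or text are generally also burdened with COLLATION-specific rules. But don't expect much for something this simple.

Instead of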

WHERE consistency = 'Y'

you can write:

WHERE consistency = TRUE

But, really, you can simplify to just:

WHERE consistency

No further evaluation is needed.
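
The negative case can be written just as briefly. A short sketch against the same placeholder table tbl:

```sql
-- Rows where the flag is false; same result as WHERE consistency = FALSE.
SELECT count(*)
FROM   tbl
WHERE  NOT consistency;
```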

Changing the type

Converting the table is simple:

ALTER TABLE tbl ALTER consistency TYPE boolean
USING CASE consistency WHEN 'Y' THEN TRUE ELSE FALSE END;

This CASE expression folds everything that is not TRUE ('Y') to FALSE. The NOT NULL constraint stays in place.
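
Since everything other than 'Y' silently becomes FALSE and the column has no check constraint, it may be worth looking at the values actually present before converting. The column in the question also carries a varchar default ('Y'), which PostgreSQL may refuse to convert to boolean automatically. The following is a sketch under those assumptions, not a tested migration; tbl is the placeholder table name, and the new default TRUE is assumed to be the intended equivalent of 'Y':

```sql
-- Inspect the distinct values first; anything besides 'Y' and 'N' is a surprise.
SELECT consistency, count(*) AS row_count
FROM   tbl
GROUP  BY consistency
ORDER  BY row_count DESC;

BEGIN;
-- Assumption: drop the old text default first, because it may not be
-- castable to boolean automatically.
ALTER TABLE tbl ALTER COLUMN consistency DROP DEFAULT;

-- Same conversion as above: 'Y' becomes TRUE, everything else FALSE.
ALTER TABLE tbl ALTER COLUMN consistency TYPE boolean
USING CASE consistency WHEN 'Y' THEN TRUE ELSE FALSE END;

-- Restore a default of the new type (assumed equivalent of 'Y').
ALTER TABLE tbl ALTER COLUMN consistency SET DEFAULT TRUE;
COMMIT;
```

Changing the column type rewrites the table and holds an exclusive lock while it runs, so on roughly 82 million rows it is probably best scheduled for a quiet period.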

answered 2012-10-11T00:18:54.997
2

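Switching from a single-character VARCHAR to BOOLEAN will not significantly improve either storage size or query performance. You are right that a boolean is technically cleaner when you are dealing with a binary value, but the cost of the change is probably much higher than the benefit. If you are worried about correctness, you could add a check constraint on the column instead, for example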

ALTER TABLE tablename ADD CONSTRAINT consistency CHECK (consistency IN ('Y', 'N'));
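
On a table this large, adding the constraint as written checks every existing row in one step. A possible variation, assuming a PostgreSQL version where CHECK constraints accept NOT VALID (9.2 or later); the constraint name consistency_yn is made up for this sketch:

```sql
-- Add the constraint without checking existing rows;
-- newly inserted or updated rows are checked from now on.
ALTER TABLE tablename
  ADD CONSTRAINT consistency_yn CHECK (consistency IN ('Y', 'N')) NOT VALID;

-- Check the existing rows later, as a separate step.
ALTER TABLE tablename VALIDATE CONSTRAINT consistency_yn;
```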
answered 2012-10-12T21:26:01.570