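We have been trying to shard a PostgreSQL database on Google Cloud using PostgreSQL's built-in declarative partitioning together with postgres_fdw, as described here.
We are running the following commands:
Shard 1: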
CREATE TABLE message_1 (
id SERIAL,
m_type character varying(20),
content character varying(256) NOT NULL,
is_received boolean NOT NULL,
is_seen boolean NOT NULL,
is_active boolean NOT NULL,
created_at timestamp with time zone NOT NULL,
room_no_id integer NOT NULL,
sender_id integer NOT NULL
);
CREATE TABLE message_2 (
id SERIAL,
m_type character varying(20),
content character varying(256) NOT NULL,
is_received boolean NOT NULL,
is_seen boolean NOT NULL,
is_active boolean NOT NULL,
created_at timestamp with time zone NOT NULL,
room_no_id integer NOT NULL,
sender_id integer NOT NULL
);
Shard 2:
CREATE TABLE message_3 (
id SERIAL,
m_type character varying(20),
content character varying(256) NOT NULL,
is_received boolean NOT NULL,
is_seen boolean NOT NULL,
is_active boolean NOT NULL,
created_at timestamp with time zone NOT NULL,
room_no_id integer NOT NULL,
sender_id integer NOT NULL
);
CREATE TABLE message_4 (
id SERIAL,
m_type character varying(20),
content character varying(256) NOT NULL,
is_received boolean NOT NULL,
is_seen boolean NOT NULL,
is_active boolean NOT NULL,
created_at timestamp with time zone NOT NULL,
room_no_id integer NOT NULL,
sender_id integer NOT NULL
);
Source machine:
CREATE SERVER shard_1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'shard_1_ip', dbname 'shard_1_db', port '5432');
CREATE SERVER shard_2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'shard_2_ip', dbname 'shard_2_db', port '5432');
CREATE USER MAPPING for source_user SERVER shard_1 OPTIONS (user 'shard_1_user', password 'shard_1_user_password');
CREATE USER MAPPING for source_user SERVER shard_2 OPTIONS (user 'shard_2_user', password 'shard_2_user_password');
CREATE TABLE room (
id SERIAL PRIMARY KEY,
name character varying(20) NOT NULL,
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL,
is_active boolean NOT NULL
);
insert into room (
name, created_at, updated_at, is_active
)
select
concat('Room_', floor(random() * 400000 + 1)::int, '_', floor(random() * 400000 + 1)::int),
i,
i,
TRUE
from generate_series('2019-01-01 00:00:00'::timestamp, '2019-4-30 01:00:00', '5 seconds') as s(i);
CREATE TABLE message (
id SERIAL,
m_type character varying(20),
content character varying(256) NOT NULL,
is_received boolean NOT NULL,
is_seen boolean NOT NULL,
is_active boolean NOT NULL,
created_at timestamp with time zone NOT NULL,
room_no_id integer NOT NULL,
sender_id integer NOT NULL
) PARTITION BY HASH (room_no_id);
CREATE FOREIGN TABLE message_1
PARTITION OF message
FOR VALUES WITH (MODULUS 4, REMAINDER 1)
SERVER shard_1;
CREATE FOREIGN TABLE message_2
PARTITION OF message
FOR VALUES WITH (MODULUS 4, REMAINDER 2)
SERVER shard_1;
CREATE FOREIGN TABLE message_3
PARTITION OF message
FOR VALUES WITH (MODULUS 4, REMAINDER 3)
SERVER shard_2;
CREATE FOREIGN TABLE message_4
PARTITION OF message
FOR VALUES WITH (MODULUS 4, REMAINDER 0)
SERVER shard_2;
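As a sanity check on the layout above, a catalog query along these lines should show which foreign server and which hash bound each partition of message ended up with. It is only a sketch: it assumes it is run on the source machine after the statements above, and it uses nothing beyond the standard system catalogs.

```sql
-- List every partition of "message" with its foreign server and hash bound.
SELECT c.relname                          AS partition_name,
       s.srvname                          AS foreign_server,
       pg_get_expr(c.relpartbound, c.oid) AS partition_bound
FROM pg_inherits AS i
JOIN pg_class AS c ON c.oid = i.inhrelid
-- LEFT JOINs so that a local (non-foreign) partition would still be listed.
LEFT JOIN pg_foreign_table  AS ft ON ft.ftrelid = c.oid
LEFT JOIN pg_foreign_server AS s  ON s.oid = ft.ftserver
WHERE i.inhparent = 'message'::regclass
ORDER BY c.relname;
```

With the definitions above, this should list message_1 and message_2 on shard_1 and message_3 and message_4 on shard_2, each with its MODULUS 4 remainder.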
The problem we are facing is that when we try to insert data using the following query:
insert into message (
m_type, content, is_received, is_seen, is_active, created_at, room_no_id, sender_id
)
select
'TEXT',
CASE WHEN s.i % 2 = 0 THEN 'text 1'
ELSE 'text 2'
end,
TRUE,
TRUE,
TRUE,
dr.created_at + s.i * (interval '1 hour'),
dr.id,
CASE WHEN s.i % 2 = 0 THEN split_part(dr.name, '_', 2)::int
ELSE split_part(dr.name, '_', 3)::int
end
from room as dr, generate_series(0, 10) as s(i);
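Inserting about 20 million rows takes almost 1 hour 50 minutes. When we do not shard the table, the same operation takes about 8 minutes. So sharding is roughly 14 times slower than not sharding. Are we missing anything here, or are inserts simply this slow when sharding with this approach?
As mentioned in this video, Citus seems to perform better on inserts, so it seems odd to me that sharding would actually degrade performance this much. It may well not perform as well as Citus, but why is the performance this low?
Thanks in advance!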
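For reference, the row count and the slowdown factor can be recomputed from the statements above. This is a small sketch that reuses the same generate_series arguments as the room insert and the timings quoted in this post; nothing else is assumed.

```sql
-- Rooms: one row every 5 seconds from 2019-01-01 00:00:00
-- to 2019-04-30 01:00:00, both ends inclusive.
SELECT count(*)      AS room_rows,
       count(*) * 11 AS message_rows  -- generate_series(0, 10) yields 11 values per room
FROM generate_series('2019-01-01 00:00:00'::timestamp,
                     '2019-4-30 01:00:00',
                     '5 seconds') AS s(i);

-- Slowdown: 1 hour 50 minutes = 110 minutes, against 8 minutes unsharded.
SELECT round(110.0 / 8.0, 2) AS slowdown_factor;
```

This works out to 2,057,041 rooms and 22,627,451 messages, and a slowdown factor of 13.75, consistent with the "about 20 million" and "roughly 14 times" figures.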
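One setting that may be relevant here is the batch_size option of postgres_fdw. It is available from PostgreSQL 14 onward and defaults to 1, meaning each row is sent to the remote server as its own INSERT. The sketch below assumes PostgreSQL 14 or later on the source machine and has not been measured against this workload. The value 1000 is purely illustrative.

```sql
-- Assumption: PostgreSQL 14+ on the source machine; 1000 is an illustrative value.
-- Use SET instead of ADD if batch_size is already defined on the server.
ALTER SERVER shard_1 OPTIONS (ADD batch_size '1000');
ALTER SERVER shard_2 OPTIONS (ADD batch_size '1000');

-- Inspect the options currently set on each foreign server.
SELECT srvname, srvoptions
FROM pg_foreign_server
WHERE srvname IN ('shard_1', 'shard_2');
```

Whether this closes the gap would need to be confirmed by rerunning the insert. Network round trips between the source machine and the shards are another cost that the unsharded setup does not pay.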