We have been trying to partition a PostgreSQL database on Google Cloud using PostgreSQL's built-in declarative partitioning together with postgres_fdw, as described here.

We are running the following commands:

Shard 1:

CREATE TABLE message_1 (
    id SERIAL,                                                                                        
    m_type character varying(20),
    content character varying(256) NOT NULL,
    is_received boolean NOT NULL,                                                              
    is_seen boolean NOT NULL,
    is_active boolean NOT NULL,
    created_at timestamp with time zone NOT NULL,
    room_no_id integer NOT NULL,
    sender_id integer NOT NULL
);

CREATE TABLE message_2 (
    id SERIAL,                                                                                        
    m_type character varying(20),
    content character varying(256) NOT NULL,
    is_received boolean NOT NULL,                                                              
    is_seen boolean NOT NULL,
    is_active boolean NOT NULL,
    created_at timestamp with time zone NOT NULL,
    room_no_id integer NOT NULL,
    sender_id integer NOT NULL
);

Shard 2:

CREATE TABLE message_3 (
    id SERIAL,                                                                                        
    m_type character varying(20),
    content character varying(256) NOT NULL,
    is_received boolean NOT NULL,                                                              
    is_seen boolean NOT NULL,
    is_active boolean NOT NULL,
    created_at timestamp with time zone NOT NULL,
    room_no_id integer NOT NULL,
    sender_id integer NOT NULL
);

CREATE TABLE message_4 (
    id SERIAL,                                                                                        
    m_type character varying(20),
    content character varying(256) NOT NULL,
    is_received boolean NOT NULL,                                                              
    is_seen boolean NOT NULL,
    is_active boolean NOT NULL,
    created_at timestamp with time zone NOT NULL,
    room_no_id integer NOT NULL,
    sender_id integer NOT NULL
);  

Source machine:

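-- postgres_fdw has to be installed before the servers can be created
-- (assumed here; this step is not shown in the original commands).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard_1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'shard_1_ip', dbname 'shard_1_db', port '5432');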
CREATE SERVER shard_2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'shard_2_ip', dbname 'shard_2_db', port '5432');

CREATE USER MAPPING for source_user SERVER shard_1 OPTIONS (user 'shard_1_user', password 'shard_1_user_password');
CREATE USER MAPPING for source_user SERVER shard_2 OPTIONS (user 'shard_2_user', password 'shard_2_user_password');

CREATE TABLE room (
    id SERIAL PRIMARY KEY,
    name character varying(20) NOT NULL,
    created_at timestamp with time zone NOT NULL,
    updated_at timestamp with time zone NOT NULL,
    is_active boolean NOT NULL
);

insert into room (
    name, created_at, updated_at, is_active
)
select
    concat('Room_', floor(random() * 400000 + 1)::int, '_', floor(random() * 400000 + 1)::int),
    i,
    i,
    TRUE
from generate_series('2019-01-01 00:00:00'::timestamp, '2019-4-30 01:00:00', '5 seconds') as s(i);

CREATE TABLE message (
    id SERIAL,                                                                                        
    m_type character varying(20),
    content character varying(256) NOT NULL,
    is_received boolean NOT NULL,                                                              
    is_seen boolean NOT NULL,
    is_active boolean NOT NULL,
    created_at timestamp with time zone NOT NULL,
    room_no_id integer NOT NULL,
    sender_id integer NOT NULL
) PARTITION BY HASH (room_no_id);

CREATE FOREIGN TABLE message_1
    PARTITION OF message
    FOR VALUES WITH (MODULUS 4, REMAINDER 1)
    SERVER shard_1;

CREATE FOREIGN TABLE message_2
    PARTITION OF message
    FOR VALUES WITH (MODULUS 4, REMAINDER 2)
    SERVER shard_1;

CREATE FOREIGN TABLE message_3
    PARTITION OF message
    FOR VALUES WITH (MODULUS 4, REMAINDER 3)
    SERVER shard_2;

CREATE FOREIGN TABLE message_4
    PARTITION OF message
    FOR VALUES WITH (MODULUS 4, REMAINDER 0)
    SERVER shard_2;

The problem we are facing shows up when we try to insert data using the following query:

insert into message (
    m_type, content, is_received, is_seen, is_active, created_at, room_no_id, sender_id
)
select
    'TEXT',
    CASE WHEN s.i % 2 = 0 THEN 'text 1'
        ELSE 'text 2'
    end,
    TRUE,
    TRUE,
    TRUE,
    dr.created_at + s.i * (interval '1 hour'),
    dr.id,
    CASE WHEN s.i % 2 = 0 THEN split_part(dr.name, '_', 2)::int
        ELSE split_part(dr.name, '_', 3)::int
    end
from room as dr, generate_series(0, 10) as s(i);

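Inserting about 20 million rows takes almost 1 hour 50 minutes. When the table is not sharded, the same operation takes about 8 minutes. So this is basically 14 times slower than without sharding. Are we missing something here, or are inserts simply this slow when sharding with this approach?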
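
As a quick sanity check of the figures above, here is a minimal sketch that reuses the generate_series bounds from the room insert and treats 1 hour 50 minutes as 110 minutes. The column aliases (room_rows, message_rows, slowdown_factor) are made up for illustration.

```sql
-- Expected row counts: one room per 5-second step, and 11 messages per room
-- because generate_series(0, 10) yields 11 values.
SELECT count(*)      AS room_rows,
       count(*) * 11 AS message_rows
FROM generate_series('2019-01-01 00:00:00'::timestamp,
                     '2019-04-30 01:00:00'::timestamp,
                     '5 seconds'::interval);

-- Slowdown factor: 110 minutes sharded versus 8 minutes unsharded.
SELECT round(110.0 / 8, 2) AS slowdown_factor;  -- 13.75
```

The first query should come out at roughly 2.06 million rooms and 22.6 million messages, which is consistent with "about 20 million" rows, and the second gives the "14 times slower" figure.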

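As described in this video, Citus seems to perform better on inserts, so it seems a bit odd to me that sharding would degrade performance this much. I can accept that it may not perform as well as Citus, but why is the performance this low?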

Thanks in advance!
