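I need to create a database with about 130 million records, roughly 90 GB once loaded, and I want to host it in Azure. The data will be modified once a day by a batch job. Searches run much more frequently and must return results within 2 seconds. A search combines a text search (on the address) with filters on a few numeric fields.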

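I'm trying to find the most cost-effective database that still meets these performance requirements. I tested a single-server PostgreSQL database, but performance was terrible even after scaling it up to General Purpose, Gen-5, 4 cores (20 GB RAM) with 500 GB of storage (1,500 IOPS).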
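
A rough sizing check helps explain why this configuration struggles. It uses only the figures in the question and assumes 1 GB = 10^9 bytes; whether the 90 GB includes the indexes is not stated, so the per-row figure is only an approximation:

```latex
\begin{aligned}
\text{average size per row} &\approx \frac{90 \times 10^{9}\ \text{bytes}}{1.3 \times 10^{8}\ \text{rows}} \approx 692\ \text{bytes} \\
\text{largest fraction of the data that fits in RAM} &\le \frac{20\ \text{GB}}{90\ \text{GB}} \approx 0.22
\end{aligned}
```

Under these assumptions, less than a quarter of the data can stay cached even before PostgreSQL and the operating system take their own share of the 20 GB, which suggests that searches touching rows scattered across the table will mostly be served from storage.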

Table schema:

CREATE TABLE properties(
    PropertyId bigint,
    Address text,
    Latitude double precision,
    Longitude double precision,
    Rooms int,
    BathRooms int
);

Indexes:

-- gin_trgm_ops is provided by the pg_trgm extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX address_idx ON properties USING GIN (Address gin_trgm_ops);
CREATE INDEX propertyid_idx ON properties(PropertyId);
CREATE INDEX latitude_idx ON properties(Latitude);
CREATE INDEX longitude_idx ON properties(Longitude);
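
The address_idx index is a GIN index over pg_trgm trigrams. Roughly speaking, for a pattern like '%3365%' the index returns the rows whose address contains every trigram of the pattern (such as '336' and '365'), and PostgreSQL then rechecks the real LIKE condition against each fetched heap row; that is what the "Recheck Cond" and "Rows Removed by Index Recheck" lines in the plan report. The sketch below uses pg_trgm's show_trgm() function to illustrate why that recheck is necessary. The address '336 Elm St, Apt 365' is a made-up example, not data from the question.

```sql
-- pg_trgm provides gin_trgm_ops (used by address_idx) and show_trgm().
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigrams of the string '3365'; they include '336' and '365'.
SELECT show_trgm('3365');

-- Hypothetical address: it contains the trigrams '336' and '365',
-- but not the substring '3365', so the LIKE check rejects it.
SELECT show_trgm('336 Elm St, Apt 365') @> ARRAY['336', '365']::text[] AS has_both_trigrams,
       '336 Elm St, Apt 365' LIKE '%3365%' AS matches_like;
```

On PostgreSQL with pg_trgm installed, the last query should return true for has_both_trigrams and false for matches_like: the trigram index alone cannot rule such a row out, so it has to be read from the heap and rechecked. In the plan, address_idx returned 12,973 candidate rows for '%3365%', and the 1,123 rows removed by the recheck are likely false positives of this kind.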

Sample query:

-- Literal values taken from the EXPLAIN ANALYZE output below
select * from properties
where Latitude between 38 and 39
and Longitude between -90.5 and -90
and Address like '%3365%';

EXPLAIN ANALYZE output:

Bitmap Heap Scan on properties  (cost=34256.04..34901.54 rows=10 width=561) (actual time=24664.562..32007.752 rows=35 loops=1)
  Recheck Cond: ((Address ~~ '%3365%'::text) AND (Longitude >= '-90.5'::double precision) AND (Longitude <= '-90'::double precision))
  Rows Removed by Index Recheck: 1123
  Filter: ((propertylatitude >= '38'::double precision) AND (propertylatitude <= '39'::double precision))
  Rows Removed by Filter: 64
  Heap Blocks: exact=1213
  Buffers: shared hit=181 read=6478
  I/O Timings: read=31160.388
  ->  BitmapAnd  (cost=34256.04..34256.04 rows=161 width=0) (actual time=24660.058..24660.059 rows=0 loops=1)
        Buffers: shared hit=169 read=5277
        I/O Timings: read=23836.224
        ->  Bitmap Index Scan on address_idx  (cost=0.00..135.75 rows=12233 width=0) (actual time=6892.077..6892.077 rows=12973 loops=1)
              Index Cond: (Address ~~ '%3365%'::text)
              Buffers: shared hit=168 read=321
              I/O Timings: read=6815.544
        ->  Bitmap Index Scan on longitude_idx  (cost=0.00..34120.04 rows=1627147 width=0) (actual time=17763.265..17763.265 rows=1812752 loops=1)
              Index Cond: ((Longitude >= '-90.5'::double precision) AND (Longitude <= '-90'::double precision))
              Buffers: shared hit=1 read=4956
              I/O Timings: read=17020.681
Planning Time: 0.267 ms
Execution Time: 32008.085 ms
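
Comparing the plan's I/O Timings with its Execution Time gives a rough picture of where the 32 seconds go. The back-of-the-envelope breakdown below assumes the top-level "I/O Timings: read=31160.388" figure is the total time spent waiting on the 6,478 blocks read from storage ("shared ... read=6478"):

```latex
\begin{aligned}
\text{share of runtime spent on reads} &= \frac{31160.388\ \text{ms}}{32008.085\ \text{ms}} \approx 0.974 \\
\text{average time per block read} &= \frac{31160.388\ \text{ms}}{6478} \approx 4.81\ \text{ms} \\
\text{implied read rate if reads are serial} &= \frac{1000\ \text{ms/s}}{4.81\ \text{ms}} \approx 208\ \text{reads/s} \\
\text{ratio to the 2 s target} &= \frac{32008.085\ \text{ms}}{2000\ \text{ms}} \approx 16
\end{aligned}
```

Read this way, about 97% of the runtime is storage wait, and the longitude_idx scan alone walks 1,812,752 index entries to intersect with 12,973 address candidates. If the reads were issued one at a time, roughly 208 reads per second would sit far below the provisioned 1,500 IOPS, which hints that per-read latency rather than the IOPS limit dominates. This is an interpretation of the reported numbers, not an additional measurement.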

Any suggestions?

I don't know how your database is structured or what queries you're running, so take my answer with a grain of salt until you provide more context.

Elasticsearch isn't exactly a database, but it could prove useful for your use case (if I understand the problem you're dealing with correctly).

answered 2021-09-10T14:41:52.897