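I'm trying to migrate a system from Postgres 8.3 to 9.1. A query that runs every day on the 8.3 server takes about 10 minutes and under 5 GB of RAM. The exact same query on the 9.1 database eats up all the space (200 GB+) in the default tablespace before dying, usually after an hour or two.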
The query:
INSERT INTO warehouse.stream_user_facts(
    time_id,
    country,
    content_manager_id,
    preview,
    stream_type_id,
    stream_source_key_id,
    user_is_registered,
    user_id,
    gender,
    age,
    stream_count,
    stream_costs,
    user_recency_days
)
SELECT
    facts.*,  -- the 12 columns of main_facts fill the first 12 target columns by position
    date - last_stream AS user_recency_days
FROM main_facts AS facts
LEFT JOIN warehouse.times USING (time_id)
LEFT JOIN last_streams USING (user_id);
The view it uses:
CREATE TEMP VIEW main_facts AS
SELECT
    time_id,
    -- uncorrelated scalar subquery: evaluated only once per execution
    CASE WHEN stream_type_id = (SELECT stream_type_id FROM warehouse.stream_types WHERE name = 'SUBSCRIPTION')
        THEN subscription_country
        ELSE track_streams_extra.country
    END AS report_country,
    content_manager_id,
    preview,
    stream_type_id,
    stream_source_key_id,
    COALESCE(register_date <= reporting_date, false) AS user_is_registered,
    user_id,
    CASE WHEN register_date <= reporting_date
        THEN gender
    END AS gender_at_stream,
    CASE WHEN register_date <= reporting_date
        THEN EXTRACT(YEAR FROM age(reporting_date, dob))
    END AS age_at_stream,
    COUNT(1) AS stream_count,
    nullable_sum(cost) AS stream_costs
FROM warehouse.track_streams_extra
LEFT JOIN users USING (user_id)
LEFT JOIN user_registration USING (user_id)
-- range join: a row matches every time_period whose [start, past_end) contains its reporting_date
JOIN time_period ON (reporting_date >= start AND reporting_date < past_end)
GROUP BY time_id, report_country, content_manager_id, preview, stream_type_id, stream_source_key_id, user_is_registered, user_id, gender_at_stream, age_at_stream;
I can't see anything wrong with it, but as I said, it works on 8.3 and not on 9.1. Is there some fundamental change that could cause this?
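Edit: added the EXPLAIN output. *Edit: noted which version the EXPLAIN comes from.*
EXPLAIN on 9.1. I can't get EXPLAIN ANALYZE there because the query crashes and never returns; I'll post the 8.3 EXPLAIN ANALYZE shortly.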
Insert on stream_user_facts (cost=148914916373.07..321324730521.96 rows=1362701321853 width=152)
  -> Hash Left Join (cost=148914916373.07..321324730521.96 rows=1362701321853 width=152)
        Hash Cond: (track_streams.user_id = last_streams.user_id)
        -> Hash Left Join (cost=148914916319.42..246024946085.50 rows=140484672356 width=148)
              Hash Cond: (time_period.time_id = times.time_id)
              -> GroupAggregate (cost=148914916300.90..242688435098.53 rows=140484672356 width=95)
                    InitPlan 1 (returns $0)
                      -> Seq Scan on stream_types (cost=0.00..1.06 rows=1 width=2)
                            Filter: (name = 'SUBSCRIPTION'::text)
                    -> Sort (cost=148914916299.84..149266127980.73 rows=140484672356 width=95)
                          Sort Key: time_period.time_id, (CASE WHEN (CASE WHEN ((track_streams.stream_source_key_id = ANY ('{8,16}'::integer[])) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2012-01-01 00:00:00'::timestamp without time zone)) THEN 1 WHEN ((track_streams.flags & 2::bigint) <> 0) THEN 2 WHEN (track_streams.play_source = ANY ('{5,6,7,8}'::integer[])) THEN 3 WHEN CASE WHEN (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2011-06-01 00:00:00'::timestamp without time zone) THEN (COALESCE(track_streams.playlist_id, 0::bigint) > 0) ELSE (track_streams.playlist_id IS NOT NULL) END THEN 4 ELSE 5 END = $0) THEN (archive.subscriptions.country)::character varying ELSE track_streams.country END), t.user_id, track_streams.preview, (CASE WHEN ((track_streams.stream_source_key_id = ANY ('{8,16}'::integer[])) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2012-01-01 00:00:00'::timestamp without time zone)) THEN 1 WHEN ((track_streams.flags & 2::bigint) <> 0) THEN 2 WHEN (track_streams.play_source = ANY ('{5,6,7,8}'::integer[])) THEN 3 WHEN CASE WHEN (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2011-06-01 00:00:00'::timestamp without time zone) THEN (COALESCE(track_streams.playlist_id, 0::bigint) > 0) ELSE (track_streams.playlist_id IS NOT NULL) END THEN 4 ELSE 5 END), track_streams.stream_source_key_id, (COALESCE((user_registration.register_date <= CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END), false)), track_streams.user_id, (CASE WHEN (user_registration.register_date <= CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END) THEN users.gender ELSE NULL::character varying END), (CASE WHEN (user_registration.register_date <= CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END) THEN date_part('year'::text, age(CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END, (users.dob)::timestamp without time zone)) ELSE NULL::double precision END)
                          -> Nested Loop (cost=40230598.46..58079790334.65 rows=140484672356 width=95)
                                Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= time_period.start) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < time_period.past_end))
                                -> Hash Left Join (cost=40230598.46..481074638.44 rows=775682240 width=87)
                                      Hash Cond: (track_streams.user_id = user_registration.user_id)
                                      -> Hash Left Join (cost=40173344.70..448709880.28 rows=775682240 width=79)
                                            Hash Cond: (track_streams.user_id = users.user_id)
                                            -> Hash Left Join (cost=37218851.64..347178159.42 rows=775682240 width=73)
                                                  Hash Cond: ((t.user_id = c.content_manager_id) AND ((track_streams.country)::text = (c.country)::text))
                                                  Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= c.first_valid_day) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < c.one_past_last_valid_day))
                                                  -> Hash Left Join (cost=37218834.16..279295937.13 rows=775682240 width=67)
                                                        Hash Cond: (t.user_id = fallback.content_manager_id)
                                                        Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= fallback.first_valid_day) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < fallback.one_past_last_valid_day))
                                                        -> Hash Join (cost=37218817.98..155389716.85 rows=775682240 width=61)
                                                              Hash Cond: (track_streams.track_id = t.track_id)
                                                              -> Hash Right Join (cost=36223425.43..114392317.09 rows=775682240 width=61)
                                                                    Hash Cond: (archive.subscriptions.user_id = track_streams.user_id)
                                                                    Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= archive.subscriptions.created_at) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < COALESCE(archive.subscriptions.created_at, 'infinity'::timestamp without time zone)))
                                                                    -> Merge Left Join (cost=28078.03..28672.47 rows=26353 width=27)
                                                                          Merge Cond: ((archive.subscriptions.user_id = archive.subscriptions.user_id) AND ((((count(b.subscription_id)) + 1)) = (count(b.subscription_id))))
                                                                          -> Sort (cost=14110.51..14176.40 rows=26353 width=27)
                                                                                Sort Key: archive.subscriptions.user_id, (((count(b.subscription_id)) + 1))
                                                                                -> Hash Join (cost=9851.19..11541.95 rows=26353 width=27)
                                                                                      Hash Cond: (a.subscription_id = archive.subscriptions.subscription_id)
                                                                                      -> GroupAggregate (cost=8419.24..8880.42 rows=26353 width=16)
                                                                                            -> Sort (cost=8419.24..8485.13 rows=26353 width=16)
                                                                                                  Sort Key: a.subscription_id
                                                                                                  -> Hash Left Join (cost=1405.94..6032.69 rows=26353 width=16)
                                                                                                        Hash Cond: (a.user_id = b.user_id)
                                                                                                        Join Filter: (b.created_at < a.created_at)
                                                                                                        -> Seq Scan on subscriptions a (cost=0.00..921.53 rows=26353 width=24)
                                                                                                        -> Hash (cost=921.53..921.53 rows=26353 width=24)
                                                                                                              -> Seq Scan on subscriptions b (cost=0.00..921.53 rows=26353 width=24)
                                                                                      -> Hash (cost=921.53..921.53 rows=26353 width=27)
                                                                                            -> Seq Scan on subscriptions (cost=0.00..921.53 rows=26353 width=27)
                                                                          -> Materialize (cost=13967.51..14099.28 rows=26353 width=24)
                                                                                -> Sort (cost=13967.51..14033.40 rows=26353 width=24)
                                                                                      Sort Key: archive.subscriptions.user_id, (count(b.subscription_id))
                                                                                      -> Hash Join (cost=9825.19..11489.95 rows=26353 width=24)
                                                                                            Hash Cond: (a.subscription_id = archive.subscriptions.subscription_id)
                                                                                            -> GroupAggregate (cost=8419.24..8880.42 rows=26353 width=16)
                                                                                                  -> Sort (cost=8419.24..8485.13 rows=26353 width=16)
                                                                                                        Sort Key: a.subscription_id
                                                                                                        -> Hash Left Join (cost=1405.94..6032.69 rows=26353 width=16)
                                                                                                              Hash Cond: (a.user_id = b.user_id)
                                                                                                              Join Filter: (b.created_at < a.created_at)
                                                                                                              -> Seq Scan on subscriptions a (cost=0.00..921.53 rows=26353 width=24)
                                                                                                              -> Hash (cost=921.53..921.53 rows=26353 width=24)
                                                                                                                    -> Seq Scan on subscriptions b (cost=0.00..921.53 rows=26353 width=24)
                                                                                            -> Hash (cost=921.53..921.53 rows=26353 width=24)
                                                                                                  -> Seq Scan on subscriptions (cost=0.00..921.53 rows=26353 width=24)
                                                                    -> Hash (cost=18166794.40..18166794.40 rows=775682240 width=58)
                                                                          -> Seq Scan on track_streams (cost=0.00..18166794.40 rows=775682240 width=58)
                                                              -> Hash (cost=758689.47..758689.47 rows=13617047 width=16)
                                                                    -> Seq Scan on tracks t (cost=0.00..758689.47 rows=13617047 width=16)
                                                        -> Hash (cost=9.99..9.99 rows=495 width=30)
                                                              -> Seq Scan on streaming_costs fallback (cost=0.00..9.99 rows=495 width=30)
                                                                    Filter: (country IS NULL)
                                                  -> Hash (cost=9.99..9.99 rows=499 width=33)
                                                        -> Seq Scan on streaming_costs c (cost=0.00..9.99 rows=499 width=33)
                                            -> Hash (cost=1728633.36..1728633.36 rows=70521336 width=14)
                                                  -> Seq Scan on users (cost=0.00..1728633.36 rows=70521336 width=14)
                                      -> Hash (cost=30163.34..30163.34 rows=1558434 width=16)
                                            -> Seq Scan on user_registration (cost=0.00..30163.34 rows=1558434 width=16)
                                -> Materialize (cost=0.00..34.45 rows=1630 width=20)
                                      -> Seq Scan on time_period (cost=0.00..26.30 rows=1630 width=20)
              -> Hash (cost=10.45..10.45 rows=645 width=12)
                    -> Seq Scan on times (cost=0.00..10.45 rows=645 width=12)
        -> Hash (cost=29.40..29.40 rows=1940 width=12)
              -> Seq Scan on last_streams (cost=0.00..29.40 rows=1940 width=12)
(80 rows)
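The row estimates in the 9.1 plan can be reproduced by hand, which at least shows where the enormous numbers come from. This worked check rests on two assumptions inferred from the figures rather than read from the plan:

- the planner applies its default selectivity of 1/3 to each of the two inequality conditions in the Nested Loop's Join Filter (against time_period.start and time_period.past_end);
- it assumes a default of 200 distinct user_id values on each side of the join with last_streams, giving an equality-join selectivity of 1/200.

```latex
$$
\begin{aligned}
\text{Nested Loop rows} &= 775\,682\,240 \times 1\,630 \times \tfrac{1}{3} \times \tfrac{1}{3}
  = \frac{1\,264\,362\,051\,200}{9} \approx 140\,484\,672\,356 \\
\text{top Hash Left Join rows} &= 140\,484\,672\,356 \times \frac{1\,940}{200}
  = 140\,484\,672\,356 \times 9.7 \approx 1\,362\,701\,321\,853
\end{aligned}
$$
```

Both results match the plan exactly. The Hash Left Join on times leaves the count unchanged. So the roughly 1.36 trillion rows estimated for the Insert come from the range join against time_period (775,682,240 × 1,630 / 9), multiplied by a factor of 9.7 from last_streams.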
Edit: added the server configuration. 8.3:
name | current_setting
---------------------------+------------------------------------------------------------------------------------------------
version | PostgreSQL 8.3.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
autovacuum | off
checkpoint_segments | 6
client_encoding | UTF8
constraint_exclusion | on
default_statistics_target | 100
effective_cache_size | 512MB
external_pid_file | /var/run/postgresql/8.3-main.pid
lc_collate | en_GB.UTF-8
lc_ctype | en_GB.UTF-8
listen_addresses | *
log_line_prefix | %t
log_lock_waits | on
maintenance_work_mem | 2GB
max_connections | 50
max_fsm_pages | 3000000
max_stack_depth | 2MB
port | 5432
search_path | "$user", archive, clone, live, public
server_encoding | UTF8
shared_buffers | 1536MB
ssl | on
TimeZone | GB
unix_socket_directory | /var/run/postgresql
wal_buffers | 8MB
work_mem | 512MB
9.1
name | current_setting
------------------------------+-------------------------------------------------------------------------------------------------------
version | PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
archive_command | test ! -f /storage/wal/data/%f && cp %p /storage/wal/data/%f
archive_mode | on
archive_timeout | 5min
autovacuum_freeze_max_age | 1000000000
autovacuum_vacuum_cost_delay | 20ms
checkpoint_completion_target | 0.5
checkpoint_segments | 64
checkpoint_timeout | 1min
checkpoint_warning | 0
client_encoding | UTF8
effective_cache_size | 18GB
effective_io_concurrency | 6
external_pid_file | /var/run/postgresql/9.1-main.pid
lc_collate | en_GB.UTF8
lc_ctype | en_GB.UTF8
listen_addresses | 127.0.0.1,10.10.10.2,10.10.10.1,10.0.0.225
log_destination | stderr
log_directory | /var/log/postgresql
log_filename | postgresql-%Y-%m-%d.log
log_line_prefix | %m [%u@%r:%d]
log_min_duration_statement | 100ms
log_min_error_statement | info
log_min_messages | info
logging_collector | on
maintenance_work_mem | 2GB
max_connections | 100
max_stack_depth | 2MB
max_wal_senders | 5
port | 5432
search_path | "$user", archive, clone, live
server_encoding | UTF8
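For anyone who wants to repeat the comparison, a listing in the same name | current_setting shape can be pulled from each server with a query along these lines. It is a sketch and not necessarily the query used for the listings above. Because it filters only on source, it may also return a few rows (such as file locations) that are not shown there.

```sql
-- Server version first, then every setting whose value does not come from the built-in default.
-- current_setting() and pg_settings.source exist on both 8.3 and 9.1.
SELECT 'version'::text AS name, version() AS current_setting
UNION ALL
SELECT name, current_setting(name)
FROM pg_settings
WHERE source <> 'default';
```

Running it on both servers and diffing the two outputs makes settings that differ (work_mem, shared_buffers and so on) easy to spot.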