Quick size estimation?
I'm working on FILLFACTOR tuning, so I'm trying to figure out how to calculate the average row size in Postgres. I used this thread as a starting point:
https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-row
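Not surprisingly, the most accurate methods take a long time, and I'm wondering if there is a way to get a reasonably accurate estimate quickly. And, as far as FILLFACTOR is concerned, what is the best measure? It seems that index and TOAST sizes don't enter into it.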
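So far I've tried:
- A multi-result function based on Erwin Brandstetter's detailed example in the thread cited above, named table_get_info here. Slow, but detailed and accurate.
- AVG(pg_column_size(table_name.*)), also from that thread, implemented here as table_get_row_size_estimate. Slow, but not as slow.
- avg(length(table_name::text)) with TABLESAMPLE, implemented here as table_get_row_length_estimate. Variable speed... does the accuracy depend on the sample/luck?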
Sample query
I know this is inefficient, which is fine for this test. I'm just trying to get some results to compare.
SELECT relname,
       (select bytes_per_row
          from table_get_info ('data',relname)
         where metric = 'core_relation_size'
       ) as core_relation_size,
       (select bytes_per_row
          from table_get_info ('data',relname)
         where metric = 'live_rows_in_text_representation'
       ) as live_rows_in_text,
       (select * from table_get_row_size_estimate('data',relname)
       ) as table_get_row_size,
       (select * from table_get_row_length_estimate('data',relname)
       ) as table_get_row_length
  FROM pg_stat_user_tables
 WHERE relname IN (
       'activity',
       'analytic_productivity',
       'analytic_scan',
       'analytic_sterilizer_load',
       'analytic_sterilizer_loadinv',
       'analytic_work',
       'assembly',
       'data_file_info',
       'inv',
       'item',
       'print_job',
       'q_event')
 order by 1;
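Results
All values are per-row averages: bytes for core_relation_size and table_get_row_size, characters for live_rows_in_text and table_get_row_length.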
relname                      core_relation_size  live_rows_in_text  table_get_row_size  table_get_row_length
activity                                    199                321                 177                   322
analytic_productivity                       364                553                 329                   554
analytic_scan                               275                401                 258                   402
analytic_sterilizer_load                    220                379                 208                   380
analytic_sterilizer_loadinv                 366                603                 324                   603
analytic_work                               407                662                 359                   662
assembly                                    284                466                 263                   466
data_file_info                           36,864             26,382               7,215                23,722
inv                                         324                486                 281                   487
item                                        653                966                 572                   967
print_job                                   223                309                 208                   304
q_event                                     349                611                 320                   612
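The results for live_rows_in_text and table_get_row_length are very similar, since they do much the same thing. This is slow because Postgres has to examine many or all of the rows. The estimate (right-most column) uses TABLESAMPLE, but it is still slow.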
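One possible reason the sampled version is not faster: according to the PostgreSQL documentation, the BERNOULLI method scans the whole table and keeps or discards individual rows, while the SYSTEM method picks whole blocks and is significantly faster, at the cost of a less random sample when similar rows are stored close together. A minimal sketch of the block-level variant, assuming the table data.activity from the results above (not timed against these tables):

```sql
-- Block-level sampling: reads roughly 8% of the table's pages rather than every page.
SELECT avg(length(t::text)) AS avg_text_length
FROM data.activity AS t TABLESAMPLE SYSTEM (8);

-- REPEATABLE fixes the seed, so repeated runs on an unchanged table
-- use the same sample and can be compared with each other.
SELECT avg(length(t::text)) AS avg_text_length
FROM data.activity AS t TABLESAMPLE SYSTEM (8) REPEATABLE (42);
```

On small tables a low SYSTEM percentage may select very few blocks, so the estimate could swing more than with BERNOULLI.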
Is there a fast alternative that is good enough for FILLFACTOR estimation? And, if not, which measure makes the most sense for estimating FILLFACTOR?
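For comparison, a scan-free estimate can be sketched from the counts that VACUUM and ANALYZE store in pg_class. This is only an assumption about what might be "good enough": reltuples and relpages are estimates, and the bytes-per-row figure includes dead tuples and free space inside the pages, much like the core_relation_size column above. The schema name data and the three table names are taken from the sample query.

```sql
-- Reads only catalog entries; the table itself is not scanned.
SELECT c.relname,
       c.reltuples::bigint AS estimated_rows,   -- can be -1 or 0 if the table was never analyzed
       c.relpages          AS heap_pages,
       -- pg_relation_size() measures the main fork only: no TOAST, indexes, FSM or VM.
       round(pg_relation_size(c.oid) / NULLIF(c.reltuples, 0)::numeric) AS heap_bytes_per_row,
       round(c.reltuples::numeric / NULLIF(c.relpages, 0), 1)           AS rows_per_page
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = 'data'
  AND c.relkind = 'r'
  AND c.relname IN ('activity', 'item', 'print_job')
ORDER BY c.relname;
```

The rows_per_page column is the figure that a FILLFACTOR setting changes most directly, since the setting limits how full each heap page is packed on insert.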
I've included the code for each of the functions used below.
table_get_info
This raises the question of what to check when tuning FILLFACTOR.
CREATE OR REPLACE FUNCTION dba.table_get_info(schema_name_in text, table_name_in text)
RETURNS TABLE (
metric text,
bytes int8,
bytes_pretty text,
bytes_per_row int8
)
LANGUAGE plpgsql AS
$BODY$
DECLARE
v_schema_name text := quote_ident(schema_name_in);
v_table_name text := quote_ident(table_name_in);
-- Based on Erwin Brandstetter's answer:
-- https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-row
BEGIN
RAISE NOTICE 'Table: %.%: ', v_schema_name, v_table_name;
RETURN QUERY EXECUTE
'SELECT l.metric,
l.nr AS bytes
, CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
, CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM (
SELECT min(tableoid) AS tbl
, count(*) AS ct
, sum(length(t::text)) AS txt_len -- length in characters
FROM ' || v_schema_name || '.' || v_table_name || ' t
) x
CROSS JOIN LATERAL (
VALUES
(true , ''core_relation_size'' , pg_relation_size(tbl))
, (true , ''visibility_map'' , pg_relation_size(tbl, ''vm''))
, (true , ''free_space_map'' , pg_relation_size(tbl, ''fsm''))
, (true , ''table_size_incl_toast'' , pg_table_size(tbl))
, (true , ''indexes_size'' , pg_indexes_size(tbl))
, (true , ''total_size_incl_toast_and_indexes'', pg_total_relation_size(tbl))
, (true , ''live_rows_in_text_representation'' , txt_len)
, (false, ''row_count'' , ct)
, (false, ''live_tuples'' , pg_stat_get_live_tuples(tbl))
, (false, ''dead_tuples'' , pg_stat_get_dead_tuples(tbl))
) l(is_size, metric, nr)';
-- No USING clause: the query text has no $1/$2 placeholders; the quoted identifiers are concatenated in above.
END
$BODY$;
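On the question of what to check: a FILLFACTOR below 100 reserves space in each heap page so that an updated row version can stay on the same page, which is what makes HOT updates possible. So, alongside row size, one thing that may be worth looking at is how many updates each table receives and how many of them are already HOT. A rough sketch, assuming the data schema from the sample query and that the cumulative statistics have been collecting for a representative period:

```sql
SELECT s.relname,
       c.reloptions,        -- contains fillfactor=NN if one was set; NULL means defaults (100 for tables)
       s.n_tup_upd,         -- rows updated since the statistics were last reset
       s.n_tup_hot_upd,     -- of those, updates that stayed on the same page (HOT)
       round(100.0 * s.n_tup_hot_upd / NULLIF(s.n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
WHERE s.schemaname = 'data'
ORDER BY s.n_tup_upd DESC;
```

Tables with few or no updates are unlikely to benefit from a lower FILLFACTOR, whatever their row size.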
table_get_row_size_estimate
I tried TABLESAMPLE here, and it didn't cause an error. It didn't speed anything up, though.
CREATE FUNCTION dba.table_get_row_size_estimate(schema_name_in text, table_name_in text)
RETURNS int8
LANGUAGE plpgsql AS
$BODY$
DECLARE
    v_row_size_estimate numeric;
BEGIN
    RAISE NOTICE 'Table: %.%', quote_ident(schema_name_in), quote_ident(table_name_in);
    -- Pattern: SELECT avg(pg_column_size(t.*)) FROM schema_name.table_name AS t;
    -- %I quotes each identifier. The FROM clause is schema-qualified so that the
    -- query does not depend on the schema being on the search_path.
    EXECUTE format(
        'SELECT avg(pg_column_size(t.*)) FROM %I.%I AS t',
        schema_name_in, table_name_in)
    INTO v_row_size_estimate;
    RETURN v_row_size_estimate;  -- rounded to int8 on return; NULL for an empty table
END
$BODY$;
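A cheaper approximation of a similar quantity can be read from the planner statistics that ANALYZE already maintains. This is a sketch under assumptions, not an equivalent of pg_column_size: it sums the per-column avg_width values for one table (data.activity is used as the example), so it leaves out the tuple header and alignment padding, and it is only as fresh as the last ANALYZE.

```sql
-- Sum of the average column widths recorded by ANALYZE; no table scan.
SELECT s.schemaname,
       s.tablename,
       sum(s.avg_width) AS avg_data_bytes_per_row
FROM pg_stats AS s
WHERE s.schemaname = 'data'
  AND s.tablename  = 'activity'
  AND NOT s.inherited            -- ignore statistics that include child tables
GROUP BY s.schemaname, s.tablename;
```

Comparing this figure with the table_get_row_size column for a few tables would show whether the gap is stable enough to correct for.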
table_get_row_length_estimate
I'm trying this one as a way to make use of TABLESAMPLE. I figure 8% is a good default for a reasonable estimate.
DROP FUNCTION IF EXISTS dba.table_get_row_length_estimate(text, text, int);
CREATE FUNCTION dba.table_get_row_length_estimate(
schema_name_in text,
table_name_in text,
sample_percentage_in int default 8)
RETURNS int8
LANGUAGE plpgsql AS
$BODY$
DECLARE
    v_row_length_estimate numeric;
BEGIN
    RAISE NOTICE 'Inputs: %.%: (%)', quote_ident(schema_name_in), quote_ident(table_name_in), sample_percentage_in;
    /*
    Pattern: select avg(length(t::text)) from data.activity as t tablesample bernoulli (8)
    */
    -- %I quotes the identifiers; %s is safe for the percentage because the parameter is typed int.
    -- The FROM clause is schema-qualified so the query does not depend on the search_path.
    EXECUTE format(
        'SELECT avg(length(t::text)) FROM %I.%I AS t TABLESAMPLE BERNOULLI (%s)',
        schema_name_in, table_name_in, sample_percentage_in)
    INTO v_row_length_estimate;
    RETURN v_row_length_estimate;  -- rounded to int8 on return; NULL if the sample is empty
END
$BODY$;
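To get a feel for how much the result depends on the sample, the function can be called with several percentages side by side. A small usage sketch, assuming the function above is installed in the dba schema and using data.activity as the example table; the three percentages are arbitrary:

```sql
-- One row per sample percentage; each call also emits the function's NOTICE.
SELECT s.pct AS sample_percentage,
       dba.table_get_row_length_estimate('data', 'activity', s.pct) AS avg_row_length
FROM (VALUES (1), (8), (50)) AS s(pct);
```

Running this a few times shows the spread at each percentage, which is one way to judge whether 8% is a reasonable default for a given table.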