Fast way to estimate row size?

I'm looking into FILLFACTOR tuning, so I'm trying to work out how to calculate the average row size in Postgres. I used this thread as a starting point:

https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-row

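Unsurprisingly, the most accurate methods take a long time. Is there a way to get a reasonably accurate estimate quickly? And, as far as FILLFACTOR is concerned, what is the best measure? It seems that index and TOAST sizes don't enter into it.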
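To see how index and TOAST storage sit apart from the heap, a table's footprint can be split with the built-in size functions. This is a minimal sketch, assuming the data.activity table from the test below and assuming that only the main heap fork matters for the table-level FILLFACTOR (indexes have their own fillfactor setting):

```sql
-- Split one table's storage into heap, "everything else in the table", and indexes.
-- pg_relation_size()  : main fork of the heap only
-- pg_table_size()     : heap + free space map + visibility map + TOAST
-- pg_indexes_size()   : all indexes attached to the table
SELECT c.relname,
       pg_relation_size(c.oid)                         AS heap_bytes,
       pg_table_size(c.oid) - pg_relation_size(c.oid)  AS toast_fsm_vm_bytes,
       pg_indexes_size(c.oid)                          AS index_bytes
FROM   pg_class c
WHERE  c.oid = 'data.activity'::regclass;
```

If toast_fsm_vm_bytes is large relative to heap_bytes, per-row figures based on the text representation will overstate what actually lives in the heap pages.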

So far I have tried:

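  • A multi-result function based on Erwin Brandstetter's detailed example in the thread linked above, named table_get_info here. Slow, but detailed and accurate.

  • AVG(pg_column_size(table_name.*)), also from that thread, implemented here as table_get_row_size_estimate. Slow, but not as slow.

  • avg(length(table_name::text)) with TABLESAMPLE, implemented here as table_get_row_length_estimate. Speed varies... accuracy depends on the sample/luck?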
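Stripped of the function wrappers, the last two approaches come down to queries like the following. This is a sketch against data.activity; the alias t and the output column names are only for illustration:

```sql
-- Second approach: average size in bytes of the whole-row value (reads every row).
SELECT avg(pg_column_size(t.*)) AS avg_row_bytes
FROM   data.activity AS t;

-- Third approach: average length in characters of the row's text representation,
-- computed over a random sample of roughly 8 percent of the rows.
SELECT avg(length(t::text)) AS avg_row_chars
FROM   data.activity AS t TABLESAMPLE BERNOULLI (8);
```

Note that the two are not in the same unit: the first is bytes, the second is characters of text output, which is why the text-based columns in the results run higher.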

Sample query

I know this is inefficient, which is fine for this test. I just wanted some results to compare.

SELECT relname,

    (select bytes_per_row
         from table_get_info ('data',relname)
        where metric = 'core_relation_size'
      ) as core_relation_size,

     (select bytes_per_row
         from table_get_info ('data',relname)
        where metric = 'live_rows_in_text_representation'
      ) as live_rows_in_text,

     (select * from table_get_row_size_estimate('data',relname)
     ) as table_get_row_size,

     (select * from table_get_row_length_estimate('data',relname)
     ) as table_get_row_length

FROM pg_stat_user_tables
WHERE relname IN  (
    'activity',
    'analytic_productivity',
    'analytic_scan',
    'analytic_sterilizer_load',
    'analytic_sterilizer_loadinv',
    'analytic_work',
    'assembly',
    'data_file_info',
    'inv',
    'item',
    'print_job',
    'q_event')

order by 1;

Results

relname          core_relation_size  live_rows_in_text table_get_row_size   table_get_row_length
activity                        199                321                177                    322
analytic_productivity           364                553                329                    554
analytic_scan                   275                401                258                    402
analytic_sterilizer_load        220                379                208                    380
analytic_sterilizer_loadinv     366                603                324                    603
analytic_work                   407                662                359                    662
assembly                        284                466                263                    466
data_file_info               36,864             26,382              7,215                 23,722
inv                             324                486                281                    487
item                            653                966                572                    967
print_job                       223                309                208                    304
q_event                         349                611                320                    612

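The results for live_rows_in_text and table_get_row_length are very similar, since they do roughly the same thing. This is slow because Postgres has to examine many or all of the rows. The estimate (rightmost column) uses TABLESAMPLE, but it is still slow.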
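One possible reason the sampled estimate is not faster: per the PostgreSQL documentation, the BERNOULLI method scans the whole table and keeps or discards individual rows, whereas SYSTEM selects whole blocks and is usually much faster at small percentages, at the cost of a more clustered sample. A sketch of the block-level variant, assuming data.activity; the 1 percent sample and the seed 42 are arbitrary choices:

```sql
-- Block-level sampling: only the chosen pages are read.
-- REPEATABLE makes the sample stable between runs as long as the table is unchanged.
SELECT avg(length(t::text))      AS avg_row_chars,
       avg(pg_column_size(t.*))  AS avg_row_bytes,
       count(*)                  AS sampled_rows
FROM   data.activity AS t TABLESAMPLE SYSTEM (1) REPEATABLE (42);
```

On small tables a low SYSTEM percentage may select no blocks at all, in which case both averages come back NULL and sampled_rows is 0.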

Is there a fast alternative that is good enough for FILLFACTOR estimation? And, if not, which measure makes the most sense for estimating FILLFACTOR?
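One candidate for a fast alternative is to avoid reading table rows at all and use the statistics that VACUUM and ANALYZE already maintain. This is a rough sketch, assuming those statistics are reasonably fresh. The first query divides the main-fork size by the estimated row count in pg_class, so like core_relation_size above it includes page overhead, free space, and dead tuples. The second sums the per-column avg_width from pg_stats, which leaves out the tuple header and alignment padding:

```sql
-- Heap bytes per row from catalog estimates (no table scan).
-- reltuples is an estimate and can be 0 or negative for tables never analyzed,
-- so the division is guarded.
SELECT c.relname,
       c.reltuples::bigint      AS estimated_rows,
       pg_relation_size(c.oid)  AS heap_bytes,
       CASE WHEN c.reltuples > 0
            THEN round(pg_relation_size(c.oid) / c.reltuples)
       END                      AS heap_bytes_per_row
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  n.nspname = 'data'
AND    c.relkind = 'r'
AND    c.relname IN ('activity', 'item')
ORDER  BY c.relname;

-- Average column data width per row from the planner statistics.
SELECT s.tablename,
       sum(s.avg_width) AS avg_data_bytes_per_row
FROM   pg_stats s
WHERE  s.schemaname = 'data'
AND    s.tablename IN ('activity', 'item')
AND    NOT s.inherited
GROUP  BY s.tablename
ORDER  BY s.tablename;
```

Neither figure is exact, but comparing them against the slower measurements for a few tables would show whether they are close enough for this purpose.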

I've included the code for each of the functions used below.

table_get_info

This one raises the question of what to check when tuning FILLFACTOR.

CREATE OR REPLACE FUNCTION dba.table_get_info(schema_name_in text, table_name_in text)
  RETURNS TABLE (
    metric         text,
    bytes          int8,
    bytes_pretty   text,
    bytes_per_row  int8
)

LANGUAGE plpgsql AS

$BODY$

DECLARE
v_schema_name  text := quote_ident(schema_name_in);
v_table_name   text := quote_ident(table_name_in);

-- Erwin Brandstetter
-- https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-ROW

BEGIN

RAISE NOTICE 'Table: %.%: ', v_schema_name,  v_table_name;

RETURN QUERY EXECUTE

'SELECT l.metric,
        l.nr AS bytes
     , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
     , CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM  (
   SELECT min(tableoid)        AS tbl
        , count(*)             AS ct
        , sum(length(t::text)) AS txt_len  -- length in characters
   FROM   ' ||  v_schema_name || '.' || v_table_name ||  ' t
   ) x
CROSS  JOIN LATERAL (
   VALUES
     (true , ''core_relation_size''               , pg_relation_size(tbl))
   , (true , ''visibility_map''                   , pg_relation_size(tbl, ''vm''))
   , (true , ''free_space_map''                   , pg_relation_size(tbl, ''fsm''))
   , (true , ''table_size_incl_toast''            , pg_table_size(tbl))
   , (true , ''indexes_size''                     , pg_indexes_size(tbl))
   , (true , ''total_size_incl_toast_and_indexes'', pg_total_relation_size(tbl))
   , (true , ''live_rows_in_text_representation'' , txt_len)
   , (false, ''row_count''                        , ct)
   , (false, ''live_tuples''                      , pg_stat_get_live_tuples(tbl))
   , (false, ''dead_tuples''                      , pg_stat_get_dead_tuples(tbl))
   ) l(is_size, metric, nr)'

    USING v_schema_name, v_table_name;

END
$BODY$;

table_get_row_size_estimate

I tried TABLESAMPLE in here, and it didn't raise an error. It didn't speed anything up, though.

CREATE FUNCTION dba.table_get_row_size_estimate(schema_name_in text, table_name_in text)
  RETURNS int8

LANGUAGE plpgsql AS

$BODY$

DECLARE
v_schema_name  text := quote_ident(schema_name_in);
v_table_name   text := quote_ident(table_name_in);

v_row_size_estimate real := 0;

BEGIN

RAISE NOTICE 'Table: %.%: ', v_schema_name,  v_table_name;

-- Equivalent to: SELECT avg(pg_column_size(t.*)) FROM schema_name.table_name AS t;
-- %I quotes each identifier; the FROM clause is schema-qualified so the
-- lookup does not depend on search_path.
EXECUTE
  format('SELECT avg(pg_column_size(t.*)) FROM %I.%I AS t',
         schema_name_in, table_name_in)
  INTO v_row_size_estimate;
RETURN v_row_size_estimate;

END
$BODY$;

table_get_row_length_estimate

I'm trying this one to get access to TABLESAMPLE. I figure 8% is a good default for a reasonable estimate.

DROP FUNCTION IF EXISTS dba.table_get_row_length_estimate(text, text, int);

CREATE FUNCTION dba.table_get_row_length_estimate(
     schema_name_in       text,
     table_name_in        text,
     sample_percentage_in int default 8)

  RETURNS int8

LANGUAGE plpgsql AS

$BODY$

DECLARE
v_schema_name  text := quote_ident(schema_name_in);
v_table_name   text := quote_ident(table_name_in);

v_row_length_estimate real := 0;

BEGIN

RAISE NOTICE 'Inputs: %.%: (%) ', v_schema_name,  v_table_name, sample_percentage_in;

/*
select avg(length(activity::text)) from data.activity tablesample bernoulli(8)
*/

-- %I quotes each identifier and schema-qualifies the table; %s inserts the
-- integer sample percentage.
EXECUTE format(
  'SELECT avg(length(t::text))
     FROM %I.%I AS t
     TABLESAMPLE bernoulli(%s)',
  schema_name_in, table_name_in, sample_percentage_in)
  INTO v_row_length_estimate;

RETURN v_row_length_estimate;

END
$BODY$;
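For reference, the function above could be called as follows, assuming it lives in the dba schema as created and that data.activity exists. The explicit 2 percent sample is an arbitrary value for comparison:

```sql
-- Default sample of 8 percent.
SELECT dba.table_get_row_length_estimate('data', 'activity');

-- Explicit, smaller sample to trade accuracy for speed.
SELECT dba.table_get_row_length_estimate('data', 'activity', 2);
```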