
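The task of my master's thesis is to show that combining TimescaleDB and PostGIS improves PostgreSQL performance for OSM data. I have prepared a dataset with OSM data for Europe (a CSV file with 100M rows). When I copy this data into a plain PostgreSQL database, the ingest rate is about 200k rows/sec. When I copy it into a TimescaleDB hypertable, the ingest rate is below 100k rows/sec. This result is unexpected, and my question is: why does this happen? Is there something I need to configure? Maybe the problem is the uneven distribution of the OSM timestamps, which range from 2006 to 2019.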
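
One way to probe the timestamp hypothesis is to count how many hypertable chunks a batch of rows would fall into. The following is a minimal sketch, not a measurement of what TimescaleDB actually does: it assumes a 7-day chunk interval (which, as far as I know, is the documented default for hypertables) and approximates chunk boundaries as fixed buckets counted from the Unix epoch. The names `chunk_index`, `chunks_touched` and `parse` are hypothetical helpers introduced only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 7-day chunks, aligned to the Unix epoch (an approximation).
CHUNK_INTERVAL = timedelta(days=7)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse(text):
    """Parse an osm_timestamp string such as '2019-08-20 02:22:35' as UTC."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def chunk_index(ts, interval=CHUNK_INTERVAL):
    """Index of the fixed-width time bucket that a timestamp falls into."""
    return (ts - EPOCH) // interval


def chunks_touched(timestamps, interval=CHUNK_INTERVAL):
    """Number of distinct buckets hit by a collection of timestamps."""
    return len({chunk_index(ts, interval) for ts in timestamps})


# A few osm_timestamp values taken from the sample rows in this question.
sample = [
    parse("2019-08-20 02:22:35"),
    parse("2019-08-05 15:46:38"),
    parse("2014-04-22 19:36:43"),
    parse("2018-04-21 21:08:35"),
    parse("2017-12-10 17:43:50"),
    parse("2017-08-25 12:30:33"),
]
print("distinct 7-day buckets in sample:", chunks_touched(sample))
```

If consecutive rows in the CSV file jump between many buckets, each batch of 10000 rows may be spread over many chunks. Running `chunks_touched` over the timestamps of one batch, or sorting the CSV by `osm_timestamp` before loading, would be one way to check whether this matters here.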

When I save it in a plain PostgreSQL table:

   osm_timestamp    |                        way                         
---------------------+----------------------------------------------------
 2019-08-20 02:22:35 | 0101000020110F0000F0076BFEFFB162C14485197AF1B65341
 2019-08-05 15:46:38 | 0101000020110F00002BFC9A016E864AC17DB392F223375241
 2019-08-05 15:46:38 | 0101000020110F0000142668FD5A804AC14841650D62375241
 2014-04-22 19:36:43 | 0101000020110F0000A265A7382E7F4AC113BDE36F99375241
 2014-04-22 19:36:43 | 0101000020110F0000C91A02369D7E4AC1D7D24B7197375241
 2018-04-21 21:08:35 | 0101000020110F00003FCDEEF0747E4AC151E880038E375241
 2014-04-22 19:36:43 | 0101000020110F0000C3186511957E4AC19620025B92375241
 2017-12-10 17:43:50 | 0101000020110F0000B24BD8C58E7E4AC153B6CA5192375241
 2014-04-22 19:36:43 | 0101000020110F000014D08064937E4AC1C131DECE95375241
 2017-08-25 12:30:33 | 0101000020110F0000249BF33F977E4AC14AA0211597375241
 2014-04-22 19:36:43 | 0101000020110F0000EC629803907E4AC1DAC3FF3098375241
 2018-04-21 21:08:36 | 0101000020110F000043C2E8A5787E4AC18A7F52A18F375241

When I save it in a TimescaleDB table:

   osm_timestamp    |                        way                         
---------------------+----------------------------------------------------
 2019-08-20 02:22:35 | 0101000020110F0000F0076BFEFFB162C14485197AF1B65341
 2019-08-19 19:25:36 | 0101000020110F0000BA461AE38D7548C159769C60C3C75141
 2019-08-19 19:25:36 | 0101000020110F0000D8062171F57148C1081AC67C7BC65141
 2019-08-19 19:25:36 | 0101000020110F00000A3CD250F37148C13CB433AB7AC65141
 2019-08-19 19:25:36 | 0101000020110F0000E6C794D0F27148C1E4B157257CC65141
 2019-08-19 19:25:36 | 0101000020110F0000EB32A406717048C1D6F39FB772C65141
 2019-08-19 16:32:34 | 0101000020110F000066CAFEFFEE6048C18DD0C86240C15141
 2019-08-19 16:32:34 | 0101000020110F000058C74E3ADA6048C1244D22AC63C15141
 2019-08-19 16:32:34 | 0101000020110F00004ABED3D8C36048C14FEF45345FC15141
 2019-08-19 10:45:35 | 0101000020110F00005FBA75B7DE5E48C1FB21EF296DC15141
 2019-08-19 19:25:36 | 0101000020110F00000DF0FD868B7948C1EEA03CEE28C95141
 2019-08-19 19:25:36 | 0101000020110F000092EF4F0EE87548C1F7598342B4CB5141
 2019-08-19 19:25:36 | 0101000020110F0000B75DC2F2E67548C1C06DA855B4CB5141
 2019-08-20 18:41:46 | 0101000020110F0000E674D391CC5148C168E4DE3147C25141
 2019-08-20 18:02:29 | 0101000020110F0000FCE227F30C5148C1164B566039C25141
 2019-08-20 18:41:46 | 0101000020110F00007FA03258515148C1C88FDDB08AC25141
 2019-08-20 18:41:46 | 0101000020110F000094A2CFC1165148C15EA45CCAAAC25141
 2019-08-20 18:41:46 | 0101000020110F00004720D019315148C17DEEAD09B3C25141

Performance when saving to plain PostgreSQL:

Stipe@Mile:~/go/bin$ ./timescaledb-parallel-copy --connection "host=localhost user=postgres sslmode=disable password=postgresifra54" --db-name timescale2 --table timescale2 --batch-size 10000 --truncate --log-batches --file /home/Stipe/DISKC/europe-point.csv | tee /home/Stipe/DISKC/postgis.txt
[BATCH] took 43.292909ms, batch size 10000, row rate 230984.709297/sec
[BATCH] took 35.496966ms, batch size 10000, row rate 281714.217491/sec
[BATCH] took 37.104837ms, batch size 10000, row rate 269506.641412/sec
[BATCH] took 36.998932ms, batch size 10000, row rate 270278.071810/sec
[BATCH] took 39.105424ms, batch size 10000, row rate 255719.002049/sec
[BATCH] took 38.659405ms, batch size 10000, row rate 258669.268190/sec
[BATCH] took 35.184652ms, batch size 10000, row rate 284214.833218/sec
[BATCH] took 40.266376ms, batch size 10000, row rate 248346.163558/sec
[BATCH] took 36.179696ms, batch size 10000, row rate 276398.121200/sec

Performance when saving to a TimescaleDB hypertable:

Stipe@Mile:~/go/bin$ ./timescaledb-parallel-copy --connection "host=localhost user=postgres sslmode=disable password=postgresifra54" --db-name timescale --table timescale2 --batch-size 10000 --truncate --log-batches --file /home/Stipe/DISKC/europe-point.csv | tee /home/Stipe/DISKC/postgis.txt
[BATCH] took 6.979696947s, batch size 10000, row rate 1432.726962/sec
[BATCH] took 1.439723348s, batch size 10000, row rate 6945.778864/sec
[BATCH] took 1.27673852s, batch size 10000, row rate 7832.457346/sec
[BATCH] took 619.745584ms, batch size 10000, row rate 16135.653497/sec
[BATCH] took 378.107768ms, batch size 10000, row rate 26447.486263/sec
[BATCH] took 350.852359ms, batch size 10000, row rate 28502.017283/sec
[BATCH] took 194.37932ms, batch size 10000, row rate 51445.801951/sec
[BATCH] took 269.47735ms, batch size 10000, row rate 37108.870189/sec
[BATCH] took 206.672165ms, batch size 10000, row rate 48385.809478/sec
[BATCH] took 232.124194ms, batch size 10000, row rate 43080.386528/sec
[BATCH] took 169.58852ms, batch size 10000, row rate 58966.255499/sec
[BATCH] took 350.809657ms, batch size 10000, row rate 28505.486666/sec
[BATCH] took 117.911529ms, batch size 10000, row rate 84809.348881/sec
[BATCH] took 172.228338ms, batch size 10000, row rate 58062.454275/sec
[BATCH] took 121.701297ms, batch size 10000, row rate 82168.392996/sec
[BATCH] took 173.654201ms, batch size 10000, row rate 57585.707356/sec
[BATCH] took 154.958872ms, batch size 10000, row rate 64533.252410/sec
[BATCH] took 111.999767ms, batch size 10000, row rate 89285.900032/sec
[BATCH] took 176.024805ms, batch size 10000, row rate 56810.175134/sec
[BATCH] took 143.048944ms, batch size 10000, row rate 69906.143453/sec
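
To compare the two runs on more than a visual scan of the logs, the `[BATCH]` lines can be summarized. This is a minimal sketch, assuming the log format shown above with durations printed in `ms` or `s` only, and assuming the batches were processed one after another so that their durations can be added up. The names `parse_batches` and `summarize` are hypothetical helpers, not part of `timescaledb-parallel-copy`.

```python
import re
from statistics import median

# Matches lines such as:
# [BATCH] took 43.292909ms, batch size 10000, row rate 230984.709297/sec
BATCH_RE = re.compile(
    r"\[BATCH\] took (?P<took>[\d.]+)(?P<unit>ms|s), batch size (?P<size>\d+)"
)


def parse_batches(lines):
    """Return (seconds, batch_size) for every [BATCH] line; skip other lines."""
    batches = []
    for line in lines:
        match = BATCH_RE.search(line)
        if match is None:
            continue
        seconds = float(match.group("took"))
        if match.group("unit") == "ms":
            seconds /= 1000.0
        batches.append((seconds, int(match.group("size"))))
    return batches


def summarize(batches, skip=0):
    """Summarize row rates, optionally skipping the first `skip` batches."""
    kept = batches[skip:]
    rates = [size / seconds for seconds, size in kept]
    total_rows = sum(size for _, size in kept)
    total_seconds = sum(seconds for seconds, _ in kept)
    return {
        "batches": len(kept),
        "median_rate": median(rates),
        "overall_rate": total_rows / total_seconds,
    }


# Two lines copied from the logs above, one from each run.
demo = [
    "[BATCH] took 43.292909ms, batch size 10000, row rate 230984.709297/sec",
    "[BATCH] took 6.979696947s, batch size 10000, row rate 1432.726962/sec",
]
print(summarize(parse_batches(demo)))
```

The `skip` parameter is there because the first hypertable batches are much slower than the later ones; summarizing with and without them shows how much of the gap comes from the start of the load. The full log can be passed in by opening the file written by `tee` and handing the file object to `parse_batches`.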

1 Answer


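Please see the following link for tips on improving your ingest performance. Also check your base PostgreSQL configuration, especially the memory settings.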

https://blog.timescale.com/blog/13-tips-to-improve-postgresql-insert-performance/

Answered 2021-10-13T06:38:35.560