
I'm trying to insert ~56,249,000 items into BerkeleyDB JE. I ran DbCacheSize to get some statistics about my database:

java -jar je-5.0.34.jar  DbCacheSize -records 56248699 -key 8 -data 20 

=== Environment Cache Overhead ===

3,155,957 minimum bytes

To account for JE daemon operation and record locks,
a significantly larger amount is needed in practice.

=== Database Cache Size ===

 Minimum Bytes    Maximum Bytes   Description
---------------  ---------------  -----------
  1,287,110,736    1,614,375,504  Internal nodes only
  4,330,861,264    4,658,126,032  Internal nodes and leaf nodes

=== Internal Node Usage by Btree Level ===

 Minimum Bytes    Maximum Bytes      Nodes    Level
---------------  ---------------  ----------  -----
  1,269,072,064    1,592,660,160     632,008    1
     17,837,712       21,473,424       7,101    2
        198,448          238,896          79    3
          2,512            3,024           1    4

Two years ago I asked the question Optimizing a BerkeleyDB JE Database, but I'm still not sure how I should configure my environment based on these statistics.

While loading the data I will be the only user accessing the database: should I use transactions?

My environment is currently opened as follows:

EnvironmentConfig cfg = (...);
cfg.setTransactional(true);
cfg.setAllowCreate(true);
cfg.setReadOnly(false);
cfg.setCachePercent(80);
cfg.setConfigParam(EnvironmentConfig.LOG_FILE_MAX, "250000000");

The database (opened with its own DatabaseConfig):

DatabaseConfig dbCfg = new DatabaseConfig();
dbCfg.setAllowCreate(true);
dbCfg.setTransactional(true);
dbCfg.setReadOnly(false);

I read/insert the items as follows:

Transaction txn = env.beginTransaction(null, null);
// open the database with transaction 'txn'
Database db = env.openDatabase(txn, (...), dbCfg);

Transaction txn2 = this.getEnvironment().beginTransaction(null, null);
long record_id = 0L;
while ((item = readNextItem(input)) != null)
    {
    (...)
    ++record_id;

    db.put(...); // insert record_id/item into db
    /* every 100,000 items, commit and create a new transaction.
       I found it was the only way to avoid an OutOfMemoryError. */
    if (record_id % 100000 == 0)
        {
        txn2.commit();
        System.gc();
        txn2 = this.getEnvironment().beginTransaction(null, null);
        }
    }

txn2.commit();
txn.commit();

But things get slower and slower. I'm running the program from Eclipse without setting any JVM options.

100000 / 56248699 ( 0.2 %).  13694.9 records/seconds.  Time remaining:68.3 m Disk Usage: 23.4 Mb. Expect Disk Usage: 12.8 Gb Free Memory : 318.5 Mb.
200000 / 56248699 ( 0.4 %).  16680.6 records/seconds.  Time remaining:56.0 m Disk Usage: 49.5 Mb. Expect Disk Usage: 13.6 Gb Free Memory : 338.3 Mb.
(...)
6600000 / 56248699 (11.7 %).  9658.2 records/seconds.  Time remaining:85.7 m Disk Usage: 2.9 Gb. Expect Disk Usage: 24.6 Gb Free Memory : 165.0 Mb.
6700000 / 56248699 (11.9 %).  9474.5 records/seconds.  Time remaining:87.2 m Disk Usage: 2.9 Gb. Expect Disk Usage: 24.7 Gb Free Memory : 164.8 Mb.
6800000 / 56248699 (12.1 %).  9322.6 records/seconds.  Time remaining:88.4 m Disk Usage: 3.0 Gb. Expect Disk Usage: 24.8 Gb Free Memory : 164.8 Mb.
(Ctrl-C... abort...)

How can I make this faster?

Update:

MemTotal:        4021708 kB
MemFree:          253580 kB
Buffers:           89360 kB
Cached:          1389272 kB
SwapCached:           56 kB
Active:          2228712 kB
Inactive:        1449096 kB
Active(anon):    1793592 kB
Inactive(anon):   596852 kB
Active(file):     435120 kB
Inactive(file):   852244 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       3174028 kB
HighFree:          57412 kB
LowTotal:         847680 kB
LowFree:          196168 kB
SwapTotal:       4085756 kB
SwapFree:        4068224 kB
Dirty:             16320 kB
Writeback:             0 kB
AnonPages:       2199056 kB
Mapped:           111280 kB
Shmem:            191272 kB
Slab:              58664 kB
SReclaimable:      41448 kB
SUnreclaim:        17216 kB
KernelStack:        3792 kB
PageTables:        11328 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6096608 kB
Committed_AS:    5069728 kB
VmallocTotal:     122880 kB
VmallocUsed:       18476 kB
VmallocChunk:      81572 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10232 kB
DirectMap2M:      903168 kB

Update 2:

Max. Heap Size (Estimated): 872.94M
Ergonomics Machine Class: server
Using VM: Java HotSpot(TM) Server VM

Update 3:

Using Jerven's suggestions, I get the following performance:

    (...)
    6800000 / 56248699 (12.1 %).  13144.8 records/seconds.  Time remaining:62.7 m Disk Usage: 1.8 Gb. Expect Disk Usage: 14.6 Gb Free Memory : 95.5 Mb.
    (...)

Compared with my previous result:

6800000 / 56248699 (12.1 %).  9322.6 records/seconds.  Time remaining:88.4 m Disk Usage: 3.0 Gb. Expect Disk Usage: 24.8 Gb Free Memory : 164.8 Mb.

1 Answer


First, I would remove your explicit calls to System.gc(). If you noticed that they helped performance, consider switching to a different GC algorithm; for example, G1GC performs better when the bdb/je cache usage is consistently close to 70% of the available heap.
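For example (a sketch only; the jar name and heap size are placeholders, not values taken from this question), the loader could be launched with G1 enabled explicitly:

java -XX:+UseG1GC -Xmx3g -jar myLoader.jar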

Second, at some point the B+ tree index updates cost n log n, which will reduce insertion throughput over time.

Not using transactions will be faster, especially if you can restart the import from scratch when it fails.
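A minimal sketch of such a non-transactional setup (the environment path and database name are placeholders):

import java.io.File;
import com.sleepycat.je.*;

EnvironmentConfig envCfg = new EnvironmentConfig();
envCfg.setAllowCreate(true);
envCfg.setTransactional(false);  // no transactions during the bulk load

Environment env = new Environment(new File("/path/to/env"), envCfg);

DatabaseConfig dbCfg = new DatabaseConfig();
dbCfg.setAllowCreate(true);
dbCfg.setTransactional(false);

Database db = env.openDatabase(null, "mydb", dbCfg);  // null transaction

// in the load loop: db.put(null, key, data); with no commit bookkeeping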

Just remember to do an environment.sync() and a checkpoint at the end. While running this import, you may want to disable the BDB/je checkpointer and the BDB/je cleaner (GC) threads:

config.setConfigParam(EnvironmentConfig.ENV_RUN_CLEANER, "false");
config.setConfigParam(EnvironmentConfig.ENV_RUN_CHECKPOINTER, "false");
config.setConfigParam(EnvironmentConfig.ENV_RUN_IN_COMPRESSOR, "false");

After loading, you should call a method like this:

public void checkpointAndSync()
    throws ObjectStoreException
{
    env.sync();
    CheckpointConfig force = new CheckpointConfig();
    force.setForce(true);
    try
    {
        env.checkpoint(force);
    }
    catch (DatabaseException e)
    {
        log.error("Can not checkpoint db " + path.getAbsolutePath(), e);
        throw new ObjectStoreException(e);
    }
}

You might also consider turning on key prefixing.
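In JE, key prefixing is a per-database setting on DatabaseConfig; reusing a dbCfg like the one above:

dbCfg.setKeyPrefixing(true);  // store shared key prefixes once per B-tree node, saving cache space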

For the rest, your internal-node cache size should be at least 1.6 GB, which means a heap larger than 2 GB to start with.
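For example (the figures are illustrative assumptions, not tuned values), you could launch the JVM with -Xmx3g and pin the JE cache to a fixed size instead of a percentage, reusing the envCfg from the sketch above:

envCfg.setCacheSize(1700L * 1024 * 1024);  // ~1.7 GB cache, room for the internal nodes reported above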

You can also consider merging records. For example, if your keys naturally increment, you can store 16 values under one key. But if you find this approach interesting, you might start by increasing the B-tree fanout setting.
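In JE the fanout is the maximum number of entries per B-tree node, configured per database (the default is 128); a sketch with a hypothetical value:

dbCfg.setNodeMaxEntries(512);  // raise the B-tree fanout from the default 128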

Answered on 2013-02-25T13:24:10.240