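I'm having some trouble with BerkeleyDB. I have multiple instances of the same code pointed at a single repository of database files. Everything runs fine for anywhere from 5 to 32 hours, and then it suddenly deadlocks. The command prompt halts right before a db_get, db_put, or cursor-creation call executes. So I'm just asking what the correct way to handle these calls is. Here is my general layout: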
This is how the environment and databases are created:
```perl
use BerkeleyDB;

my $env = BerkeleyDB::Env->new(
    -Home  => "$dbFolder\\",
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
) or die "cannot open environment: $BerkeleyDB::Error\n";

my $unsortedHash = BerkeleyDB::Hash->new(
    -Filename => "$dbFolder/Unsorted.db",
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "couldn't create: $!, $BerkeleyDB::Error.\n";
```
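A single instance of this code runs, visits a site, and saves URLs for another instance to parse. (I set the flags so that every database is locked whenever one of them is locked.)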
```perl
# Take the CDS write lock once for the whole batch of inserts
$lk = $unsortedHash->cds_lock();
while (@urlsToAdd) {
    my $currUrl = shift @urlsToAdd;
    $unsortedHash->db_put($currUrl, '0');   # '0' = not busy yet
}
$lk->cds_unlock();
```
Periodically it checks whether a certain number of items are sitting in Unsorted:
```perl
$refer    = $unsortedHash->db_stat();
$elements = $refer->{'hash_ndata'};   # number of key/data pairs in Unsorted
```
Before adding any element to any database, it first checks all of the databases to see whether the element already exists:
```perl
if ($unsortedHash->db_get($search, $value) == 0) {
    $value = "1:$value";
} elsif ($badHash->db_get($search, $value) == 0) {
    $value = "2:$value";
....
```
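The `if`/`elsif` chain above checks the databases one at a time in a fixed order. The same lookup can be written as a loop over an ordered list of handles, with the prefix taken from each database's position in the list. This is only a minimal sketch. It assumes that the elided branches continue the `"3:"`, `"4:"`, ... numbering. `MockHash` and `find_in_dbs` are hypothetical names. `MockHash` imitates only the `db_get($key, $value)` convention (return 0 on success and write the value into the second argument), so the sketch runs without BerkeleyDB.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a BerkeleyDB::Hash handle. It only
# imitates the db_get($key, $value) convention: return 0 and write the value
# into the caller's $value (via @_ aliasing) when the key exists, otherwise
# return a non-zero status and leave $value untouched.
package MockHash;

sub new {
    my ($class, %data) = @_;
    my $self = { %data };
    return bless $self, $class;
}

sub db_get {
    my ($self, $key) = @_;
    return 1 unless exists $self->{$key};
    $_[2] = $self->{$key};
    return 0;
}

package main;

# Check each handle in order and prefix the value with the handle's 1-based
# position, mirroring the "1:", "2:", ... prefixes in the if/elsif chain.
# Returns undef when the key is not in any database.
sub find_in_dbs {
    my ($search, @dbs) = @_;
    for my $i (0 .. $#dbs) {
        my $value;
        if ($dbs[$i]->db_get($search, $value) == 0) {
            my $n = $i + 1;
            return "$n:$value";
        }
    }
    return;
}

my $unsortedHash = MockHash->new('http://a.example/' => '0');
my $badHash      = MockHash->new('http://b.example/' => '0');

print find_in_dbs('http://b.example/', $unsortedHash, $badHash), "\n";   # prints 2:0
```

With real handles, the list would be `$unsortedHash, $badHash, ...` in the same order as the original chain, so adding another database means adding one element to the list.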
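The next piece of code comes after that, and many instances of it run in parallel. First it fetches the next item in Unsorted that doesn't have the busy value '1'. It then sets that item's value to busy ('1'), processes it, and finally moves the entry to another database entirely (the entry is deleted from Unsorted and stored in the other database):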
```perl
my $pageUrl = '';
my $busy    = '1';
my $curs;
my $lk = $unsortedHash->cds_lock();   # lock, change status to 1, unlock

########## GET AN ELEMENT FROM THE UNSORTED HASH #######
while (1) {
    $busy = '1';
    $curs = $unsortedHash->db_cursor();
    # Walk the cursor until an entry whose value is not busy is found
    while ($busy) {
        $curs->c_get($pageUrl, $busy, DB_NEXT);
        print "$pageUrl:$busy:\n";
        if ($pageUrl eq '') {
            $busy = 0;
        }
    }
    $curs->c_close();
    $curs = undef;

    if ($pageUrl eq '') {
        # Nothing to claim: release the lock while sleeping, then re-acquire it
        print "Database empty. Sleeping...\n";
        $lk->cds_unlock();
        sleep(30);
        $lk = $unsortedHash->cds_lock();
    } else {
        last;
    }
}

####### MAKE THE ELEMENT 'BUSY' AND DOWNLOAD IT
$unsortedHash->db_put($pageUrl, '1');
$lk->cds_unlock();
$lk = undef;
```
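In the scan loop above, the inner `while ($busy)` ends only when it finds an entry with a false value or when `$pageUrl` is still `''`. The return value of `c_get` is never checked. Suppose the cursor reaches the end of the data while every entry is busy, and suppose `c_get` leaves its arguments unchanged when it returns `DB_NOTFOUND` (an assumption worth confirming against the BerkeleyDB module documentation). In that case the loop could keep calling `c_get` while the CDS write lock is held.

A variant that stops on the status code can be sketched as follows. `MockCursor` and `find_free` are hypothetical names. `MockCursor` imitates only the `c_get($key, $value, ...)` convention (0 on success, non-zero once the data is exhausted), so the loop can be run without a database.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a BerkeleyDB cursor. c_get() ignores
# any flags argument and always steps forward, like DB_NEXT. It returns 0 and
# writes the key/value into the caller's variables (via @_ aliasing) on
# success, or a non-zero status at the end without touching them.
package MockCursor;

sub new {
    my ($class, @pairs) = @_;          # each pair is [ $key, $value ]
    my $self = { pairs => \@pairs, pos => 0 };
    return bless $self, $class;
}

sub c_get {
    my ($self) = @_;
    return 1 if $self->{pos} >= @{ $self->{pairs} };   # stand-in for DB_NOTFOUND
    my ($key, $value) = @{ $self->{pairs}[ $self->{pos}++ ] };
    $_[1] = $key;
    $_[2] = $value;
    return 0;
}

sub c_close { return 0 }

package main;

# Return the first key whose value is false ('0' is false in Perl, matching
# the original while ($busy) test), or undef once the cursor is exhausted.
sub find_free {
    my ($curs) = @_;
    my ($key, $busy) = ('', '1');
    # With the real module this would be $curs->c_get($key, $busy, DB_NEXT)
    while ($curs->c_get($key, $busy) == 0) {
        return $key unless $busy;
    }
    return;   # every entry is busy, or the database is empty
}

my $curs = MockCursor->new([ 'http://a.example/', '1' ],
                           [ 'http://b.example/', '0' ]);
my $url = find_free($curs);
$curs->c_close();
print defined $url ? "claim $url\n" : "nothing free\n";   # prints "claim http://b.example/"
```

With the real module, the loop condition would compare `c_get`'s return value against 0, since any non-zero value (such as `DB_NOTFOUND`) means no record was returned. The "all busy" case and the "empty database" case then both end the scan the same way.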
Everywhere else, whenever I call db_put or db_del on any database, the call is wrapped in a lock like this:
```perl
print "\n\nBad.\n\n";
$lk = $badHash->cds_lock();
$badHash->db_put($pageUrl, '0');
$unsortedHash->db_del($pageUrl);
$lk->cds_unlock();
$lk = undef;
```
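My db_get calls, however, are free-floating with no locking, because I assumed reads don't need a lock.

I've gone over this code a million times and the algorithm is airtight. So I'm just wondering whether I'm implementing any part of this incorrectly, using the wrong locks, and so on. Or is there a better way to prevent deadlocks (or even diagnose them) with BerkeleyDB and Strawberry Perl?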
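UPDATE: To be more specific, the problem occurs on a Windows 2003 server (1.5 GB RAM; I'm not sure whether that matters). I can run the whole setup fine on my Windows 7 machine (4 GB RAM). I also started printing out lock statistics as follows.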
I added this parameter to the environment creation:
```perl
-MsgFile => "$dbFolder/lockData.txt"
```
Then I call this every 60 seconds:
```perl
my $status = $env->lock_stat_print();
print "Status:$status:\n";
```
The status always comes back as 0, i.e. success. Here is the last statistics report:
```
29 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
5 Number of lock modes
1000 Maximum number of locks possible
1000 Maximum number of lockers possible
1000 Maximum number of lock objects possible
40 Number of lock object partitions
24 Number of current locks
42 Maximum number of locks at any one time
5 Maximum number of locks in any one bucket
0 Maximum number of locks stolen by for an empty partition
0 Maximum number of locks stolen for any one partition
29 Number of current lockers
29 Maximum number of lockers at any one time
6 Number of current lock objects
13 Maximum number of lock objects at any one time
1 Maximum number of lock objects in any one bucket
0 Maximum number of objects stolen by for an empty partition
0 Maximum number of objects stolen for any one partition
3121958 Total number of locks requested
3121926 Total number of locks released
0 Total number of locks upgraded
24 Total number of locks downgraded
9310 Lock requests not available due to conflicts, for which we waited
0 Lock requests not available due to conflicts, for which we did not wait
8 Number of deadlocks
1000000 Lock timeout value
0 Number of locks that have timed out
1000000 Transaction timeout value
0 Number of transactions that have timed out
792KB The size of the lock region
59 The number of partition locks that required waiting (0%)
46 The maximum number of times any partition lock was waited for (0%)
0 The number of object queue operations that required waiting (0%)
27 The number of locker allocations that required waiting (0%)
0 The number of region locks that required waiting (0%)
1 Maximum hash bucket length
```
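The report above is a list of "value description" lines, so one way to track the `Number of deadlocks` counter between the 60-second snapshots is to parse the message file. The following is a minimal sketch. It assumes every statistic is written as one line of that form, and it skips any header lines. `parse_lock_stats` and `read_lock_stats` are hypothetical helpers, not part of the BerkeleyDB module.

```perl
use strict;
use warnings;

# Parse "<value> <description>" lines, as in the report above, into a hash
# keyed by description. Only lines whose first token starts with a digit are
# kept, so headers or separators in the file are skipped. Values such as
# "0x7fffffff" or "792KB" stay as raw strings. Later lines overwrite earlier
# ones, so a file holding many appended reports yields the newest values.
sub parse_lock_stats {
    my @lines = @_;
    my %stat;
    for my $line (@lines) {
        next unless $line =~ /^\s*(\d\S*)\s+(.+?)\s*$/;
        $stat{$2} = $1;
    }
    return \%stat;
}

# Read a whole message file (e.g. "$dbFolder/lockData.txt") and parse it.
sub read_lock_stats {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return parse_lock_stats(@lines);
}

# Example using three lines copied from the report above.
my $stat = parse_lock_stats(
    "8 Number of deadlocks\n",
    "9310 Lock requests not available due to conflicts, for which we waited\n",
    "24 Number of current locks\n",
);
printf "deadlocks=%d waits=%d\n",
    $stat->{'Number of deadlocks'},
    $stat->{'Lock requests not available due to conflicts, for which we waited'};
# prints: deadlocks=8 waits=9310
```

With the real file, `read_lock_stats("$dbFolder/lockData.txt")->{'Number of deadlocks'}` would give the newest count. That value could be compared with the one from the previous 60-second check to timestamp when each new deadlock is recorded.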
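This is the line that worries me:

```
8 Number of deadlocks
```

How do these deadlocks happen, and how are they being resolved? (Every part of the code is still running.) What exactly counts as a deadlock in this situation?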