sql - Handling a large, long-running transaction with Perl DBI

Question

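I have a large transaction that involves fetching a large amount of data from database A, manipulating it, and then inserting the manipulated data into database B. I only have SELECT privileges on database A, but in database B I can create tables, insert, update, and so on.

The manipulation and insertion part is written in Perl and is already used to load data into database B from other data sources, so all that's needed is to fetch the necessary data from database A and use it to initialize the Perl classes.

How can I go about doing this so that, if any error occurs during manipulation or insertion (database disconnection, problems with class initialization because of invalid values, hard disk failure, etc.), I can easily backtrack and pick up from where the error occurred? Doing the transaction in one go doesn't seem like a good option, because the large amount of data from database A means the manipulation and insertion into database B would take at least a day or two.

The data from database A can be split into about 1000 groups by a unique key, with each key covering 1000 rows. One way I think I could do this is to write a script that commits per group, which means I have to keep track of which groups have already been inserted into database B. The only ways I can think of to track that progress are a log file or a table in database B. A second way I think would work is to dump all the fields needed to load the classes for manipulation and insertion into a flat file, then read the file to initialize the classes and insert into database B. This also means I'd have to do some logging, but if an error occurs it should be narrowed down to the exact line in the flat file. The script would look something like this: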

use strict;
use warnings;
use DBI;

#connect to database A
my $dbh = DBI->connect('dbi:Oracle:my_db', $user, $password, { RaiseError => 1, AutoCommit => 0 });

#statement to get data based on group unique key
my $sth = $dbh->prepare($my_sql);

my @groups; #I have a list of this already

open my $fh, '>>', 'my_logfile' or die "can't open logfile $!";

eval {
    foreach my $g (@groups){
        #subroutine to check if group has already been processed, either from log file or from database table
        next if is_processed($g);

        $sth->execute($g);
        my $data = $sth->fetchall_arrayref;

        #manipulate $data, then use it to load perl classes for insertion into database B
        #.
        #.
        #.
        #log the group only once it has been fully processed ($g is only in scope inside the loop)
        print $fh "$g\n";
    }
};
if ($@){
   $dbh->rollback;
   die "something wrong...rollback: $@";
}

So if any errors do occur, I can just run this script again and it should skip the groups or rows that have been processed and continue.
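The script above calls is_processed without defining it. A minimal sketch of the log-file variant could look like the following, assuming the log holds one group key per line, as the print statement in the script suggests. The helper load_processed is a hypothetical name introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read the log file and return a hash ref whose keys
# are the group keys recorded so far (one key per line, blank lines skipped).
sub load_processed {
    my ($logfile) = @_;
    my %seen;
    return \%seen unless -e $logfile;    # first run: nothing processed yet
    open my $in, '<', $logfile or die "can't read $logfile: $!";
    while (my $line = <$in>) {
        chomp $line;
        $seen{$line} = 1 if length $line;
    }
    close $in;
    return \%seen;
}

# Read the log once at startup instead of rescanning it for every group.
my $processed = load_processed('my_logfile');

sub is_processed {
    my ($g) = @_;
    return exists $processed->{$g};
}
```

This only gives a safe restart if a group's key is written after its insert into database B has been committed. If progress is tracked in a table in database B instead, the progress row can be written in the same transaction as the group's data.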

Both of these methods are just variations on the same theme, and both require going back to wherever I've been tracking my progress (in a table or file), skipping the groups that have been committed to database B, and processing the remaining data.
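For the flat-file variant, a rough sketch of resuming from a given line might look like this. It assumes a tab-separated dump with one row per line, which is an assumption and not something fixed above. load_from_dump and the $load_row callback are hypothetical names standing in for the existing class-loading code.

```perl
use strict;
use warnings;

# Hypothetical loader: pass every dump line after $start_line to $load_row.
# Returns the number of the last line read; dies with the failing line
# number if $load_row throws.
sub load_from_dump {
    my ($dump_fh, $start_line, $load_row) = @_;
    my $line_no = 0;
    while (my $line = <$dump_fh>) {
        $line_no++;
        next if $line_no <= $start_line;      # already loaded in an earlier run
        chomp $line;
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        eval { $load_row->(\@fields); 1 }
            or die "dump line $line_no failed: $@";
    }
    return $line_no;
}
```

The starting line would come from whatever was last recorded as committed, so the error message and the restart point both refer to the same line numbering.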

I'm sure there's a better way of doing this but am struggling to think of other solutions. Is there another way of handling large transactions between databases that require data manipulation between getting data out from one and inserting into another? The process doesn't need to be all in Perl, as long as I can reuse the perl classes for manipulating and inserting the data into the database.

score 2 · Accepted Answer

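I'm sorry to say this, but I don't really see how you could solve this by taking a shortcut. To me it sounds like you have already considered the most reasonable approaches:

Save your state at each step in some temporary table/file (I would look at "perldoc -f tie" or sqlite)
Handle errors properly, with TryCatch.pm, eval, or whatever you prefer
Log your errors properly, i.e. in a structured log that you can read back
Add some "resume" flag to your script that reads the previous log and data and retries

This is probably along the lines of what you were already thinking, but as I said, I don't think there is one generic "right" way to handle your problem.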
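The four points above could be combined roughly as follows. This is only a sketch under assumptions: state is kept in a hash tied to SDBM_File (one of several backends tie can use, chosen here because it ships with Perl), errors are caught with eval, the log is one JSON object per line, and --resume keeps the previous state instead of starting over. run_groups, the file names load_state and load.log, and the placeholder worker are all hypothetical.

```perl
use strict;
use warnings;
use Fcntl;          # O_RDWR, O_CREAT
use SDBM_File;      # core module; stores the tied hash in files on disk
use Getopt::Long;
use JSON::PP;       # core module

# Run $worker on each group inside eval, record the outcome in %$state,
# and append one JSON line per attempted group to $log_fh.
# Returns the number of groups that failed.
sub run_groups {
    my ($groups, $state, $log_fh, $worker) = @_;
    my $json   = JSON::PP->new->canonical;
    my $failed = 0;
    for my $g (@$groups) {
        next if ($state->{$g} // '') eq 'done';    # finished in an earlier run
        my $ok  = eval { $worker->($g); 1 };
        my $err = $ok ? undef : ($@ || 'unknown error');
        $state->{$g} = $ok ? 'done' : 'failed';    # failed groups are retried next run
        $failed++ unless $ok;
        print {$log_fh} $json->encode({
            time   => time,
            group  => $g,
            status => $state->{$g},
            (defined $err ? (error => "$err") : ()),
        }), "\n";
    }
    return $failed;
}

my $resume = 0;
GetOptions('resume' => \$resume) or die "usage: $0 [--resume] GROUP...\n";

my $state_file = 'load_state';    # hypothetical path; SDBM adds .pag and .dir
unlink "$state_file.pag", "$state_file.dir" unless $resume;    # fresh run forgets old state

tie my %state, 'SDBM_File', $state_file, O_RDWR | O_CREAT, 0666
    or die "can't tie $state_file: $!";
open my $log_fh, '>>', 'load.log' or die "can't open load.log: $!";

# Placeholder worker: the real one would fetch the group from database A,
# build the classes, insert into database B and commit.
my $failed = run_groups(\@ARGV, \%state, $log_fh, sub { my ($g) = @_; return });
warn "$failed group(s) failed, rerun with --resume\n" if $failed;

untie %state;
close $log_fh;
```

Running the script again with --resume skips every group marked done and retries the ones marked failed. SDBM_File limits the size of each key/value pair, which is fine for short status strings but would not suit storing the data itself. sqlite via DBI would be the heavier alternative mentioned above.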
