
I need to develop a server application (in C#) that will read rows from a simple table (in SQL Server 2005 or 2008), do some work, such as calling a web service, and then update the rows with the resulting status (success, error).

It looks simple enough, but things get harder when I add the following application requirements:

  • Multiple application instances must run simultaneously, for load balancing and fault tolerance purposes. Typically the application will be deployed on two or more servers and will access the same database table at the same time. Each table row must be processed only once, so a common synchronization/locking mechanism has to be used across the multiple application instances.

  • While one application instance is processing a set of rows, the other application instances should not have to wait for it to finish before reading another set of rows waiting to be processed.

  • If an application instance crashes, no manual intervention should be needed on the table rows it was processing (such as removing the temporary state used to apply an application-level lock on the rows the crashed instance was working on).

  • Rows should be processed in a queue-like fashion, i.e. the oldest rows should be processed first.

Although these requirements don't look overly complex, I'm having some trouble coming up with a solution.

I've seen suggestions about locking hints such as XLOCK, UPDLOCK, ROWLOCK, READPAST, etc., but I haven't found any combination of locking hints that lets me implement these requirements.

Thanks for your help.

Regards,

Nuno Guerreiro


3 Answers


This is the typical table-as-queue pattern, as described in Using tables as Queues. You would use the Pending Queue, and the dequeue transaction should also schedule a retry after a reasonable timeout. Holding locks for the duration of the web call is simply not feasible. On success, you then delete the pending item.

You also need to be able to dequeue in batches; dequeuing one item at a time is too slow once you get into serious load (hundreds and thousands of operations per second). So, taking the Pending Queue example from the linked article:

create table PendingQueue (
  id int not null identity(1,1),
  DueTime datetime not null,
  Payload varbinary(max),
  constraint pk_pending_id nonclustered primary key(id));

create clustered index cdxPendingQueue on PendingQueue (DueTime);
go

create procedure usp_enqueuePending
  @dueTime datetime,
  @payload varbinary(max)
as
  set nocount on;
  insert into PendingQueue (DueTime, Payload)
    values (@dueTime, @payload);
go

create procedure usp_dequeuePending
  @batchsize int = 100,
  @retryseconds int = 600
as
  set nocount on;
  declare @now datetime;
  set @now = getutcdate();
  with cte as (
    select top(@batchsize) 
      id,
      DueTime,
      Payload
    from PendingQueue with (rowlock, readpast)
    where DueTime < @now
    order by DueTime)
  update cte
    set DueTime = dateadd(second, @retryseconds, DueTime)
    output deleted.Payload, deleted.id;
go

After successful processing you delete the item from the queue, using its id. On failure or crash, it will automatically be retried after 10 minutes. One thing you have to internalize is that as long as HTTP does not offer transactional semantics, you can never do this with 100% consistent semantics (e.g. guarantee that no item is ever processed twice). You can achieve a very high margin of fault tolerance, but the system can always crash right after the HTTP call succeeds and before the database update, and that will cause the same item to be retried, because you cannot distinguish this case from a crash that happened before the HTTP call.
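The completion step itself is not shown in the excerpt above. A minimal sketch of it might look like this; the procedure name usp_completePending is mine, not from the article, and it works against the PendingQueue table defined earlier:

create procedure usp_completePending
  @id int
as
  set nocount on;
  -- Remove the item once the web service call has succeeded.
  -- If the row was already deleted by a concurrent retry, this is simply a no-op.
  delete from PendingQueue
  where id = @id;
go

Each application instance then loops: call usp_dequeuePending, make the web service call for every returned id/payload pair, and call usp_completePending for each item that succeeded; anything that fails, or whose worker crashes, simply becomes due again after @retryseconds.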

Answered 2012-07-04T20:12:02.407

I initially suggested SQL Server Service Broker for this. However, after some research it turns out this is probably not the best way of handling the problem.

What you're left with is the table architecture you've asked for. However, as you've been finding, it is unlikely that you will be able to come up with a solution that meets all the given criteria, due to the great complexity of locking, transactions, and the pressures placed on such a scheme by high concurrency and high transactions per second.

Note: I am currently researching this issue and will get back to you with more later. The following script was my attempt to meet the given requirements. However, it suffers from frequent deadlocks and processes items out of order. Please stay tuned, and in the meantime consider a destructive reads method (DELETE with OUTPUT or OUTPUT INTO).

SET XACT_ABORT ON; -- blow up the whole tran on any errors
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN

UPDATE X
SET X.StatusID = 2 -- in process
OUTPUT Inserted.*
FROM (
   SELECT TOP 1 * FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
   WHERE StatusID = 1 -- ready
   ORDER BY QueuedDate, QueueID -- in case of items with the same date
) X;

-- Do work in application, holding open the tran.

DELETE dbo.QueueTable WHERE QueueID = @QueueID; -- @QueueID is supplied by the application, taken from the recordset that was OUTPUT above

COMMIT TRAN;

In the case of several/many rows being locked at once by a single client, there is a possibility of lock escalation converting those row locks into a full table lock (SQL Server escalates straight to the table level, or to the partition level in 2008), so be aware of that. Also, holding long-running transactions that maintain locks is normally a big no-no. It may work in this special usage case, but I fear that high tps by multiple clients will make the system break down. Note that normally, the only processes querying your queue table should be those doing queue work. Any processes doing reporting should use READ UNCOMMITTED or WITH NOLOCK to avoid interfering with the queue in any way.
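The destructive-read approach mentioned earlier might look roughly like this. It is only a sketch, reusing the QueueTable columns from the script above; it claims and removes the oldest ready row in one atomic statement, so no transaction has to stay open during the web service call (the trade-off is that a crash mid-call loses the row unless the application re-inserts it):

-- Destructive read: atomically claim and remove the oldest ready row.
WITH q AS (
   SELECT TOP 1 *
   FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
   WHERE StatusID = 1 -- ready
   ORDER BY QueuedDate, QueueID
)
DELETE FROM q
OUTPUT Deleted.*; -- the client keeps the deleted row and processes it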

What is the implication of rows being processed out of order? If an application instance crashes while another instance is successfully completing rows, the crashed instance's rows will inevitably be completed late, so strict processing order will be violated in that case no matter what.

If the transaction/locking method above is not to your satisfaction, another way to handle your application crashing would be to give your instances names, then set up a monitor process that periodically checks whether those named instances are running. When a named instance starts up, it would always reset any unprocessed rows tagged with its instance identifier (something as simple as "instance A" and "instance B" would work). In addition, whenever the monitor process finds that one of the instances is not running, it would reset the rows for that missing instance so any other instance can pick them up. There would be a small lag between crash and recovery, but with proper architecture it could be quite reasonable.
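The reset step in that scheme can be a single statement. Here is a sketch; the InstanceName column and the 'instance A' value are hypothetical additions on top of the QueueTable used above:

-- Run by an instance at startup, or by the monitor when it detects a dead instance:
-- release any rows that instance had claimed so the others can pick them up.
DECLARE @InstanceName nvarchar(50);
SET @InstanceName = N'instance A'; -- hypothetical instance identifier

UPDATE dbo.QueueTable
SET StatusID = 1,       -- back to ready
    InstanceName = NULL
WHERE StatusID = 2      -- in process
  AND InstanceName = @InstanceName;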


Answered 2012-07-04T18:08:19.767

You cannot do this with SQL transactions (or rather, you cannot rely on transactions as your main component here). Actually you can, but you shouldn't. Transactions are not meant to be used this way, for long-held locks, and you shouldn't abuse them like that.

Holding a transaction open for a long time (retrieve rows, call the web service, come back and make some updates) is simply not good. And there is no optimistic locking isolation level that will let you do what you want.

Using ROWLOCK is not a good idea either, because it is just that: a hint. It is subject to lock escalation and can be converted into a table lock.

May I suggest a single point of entry to your database? I think this fits a pub/sub design, so only one component reads/updates these records:

  1. It reads batches of messages (enough for all your other instances to consume): 1000, 10000, whatever you see fit. It makes those batches available to the other (concurrent) components through some kind of queue. I won't say MSMQ :) (it would be the second time I recommend it today, but it would also fit your case very well).
  2. It marks the messages as in progress or something similar (see the sketch after this list).
  3. Your consumers are all bound, transactionally, to the inbound queue and do their thing.
  4. When they are done, after the web service call, they put the message into an outbound queue.
  5. The central component picks them up and, within a distributed transaction, performs the update on the database (if that fails, the message stays in the queue). Since it is the only one performing that operation, you won't have any concurrency issues. At least not on the database.
  6. Meanwhile, it can read the next pending batch, and so on.
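A sketch of step 2 in T-SQL, assuming the central component works against a table shaped like the QueueTable from the previous answer (the Payload column, the status values, and the batch size are assumptions):

-- Single reader: claim the oldest batch of ready messages and mark them as
-- in progress before pushing them onto the inbound queue (e.g. MSMQ).
-- No READPAST/locking gymnastics are needed, because only this component reads.
DECLARE @batchsize int;
SET @batchsize = 1000;

WITH batch AS (
    SELECT TOP (@batchsize) *
    FROM dbo.QueueTable
    WHERE StatusID = 1            -- ready
    ORDER BY QueuedDate, QueueID  -- oldest first
)
UPDATE batch
SET StatusID = 2                  -- in progress
OUTPUT Inserted.QueueID, Inserted.Payload;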
Answered 2012-07-04T18:00:21.160