0

语境

我们有一些应用程序读取和写入 SQL Server。这些应用程序在启动时从中央 Hashicorp 保管库获取其凭据,该保管库在应用程序启动时创建凭据。

问题

通常(每周 3-5 次)我们会遇到死锁,而且罪魁祸首总是相同的——一些无害的应用程序正在运行一个查询(甚至是一个简单的单表选择语句),并且 Hashicorp 正在运行 ALTER ROLE 语句来添加一些dbwriter/reader 角色的另一个应用程序的新凭据。

角色创建 SQL 如下所示:

USE MASTER;

CREATE LOGIN [{{name}}] WITH PASSWORD = N'{{password}}';

USE SomeDb;

CREATE USER [{{name}}] FOR LOGIN [{{name}}];

EXEC sp_addrolemember db_datareader, [{{name}}];
EXEC sp_addrolemember db_datawriter, [{{name}}];

问题似乎出现在由ALTER ROLE拨打的电话中sp_addrolemember

似乎该ALTER ROLE语句获得了Sch-m(架构修改)锁定PERMISSIONS,然后DATABASE_PRINCIPAL。任何查询(例如 a SELECT)都需要Sch-S在这些上获取 Schema Stability ( ) 锁,这可能导致死锁(例如,查询先锁定DATABASE_PRINCIPAL,先ALTER锁定PERMISSIONS)。

我调查的内容:

  • 我非常沮丧地无法针对开发 DBMS 重新创建它 - 我尝试运行 python 脚本以在查询数据时不断创建凭据。

  • 我找不到任何关于如何预先获取这些锁的文档(例如,如果用户创建代码同时获取了两个锁/等待它们都被释放)

所以我会喜欢关于这个的任何想法(或者为什么它可能不可重现!)。

已经提出的一种解决方案是切换到 GRANT 而不是使用角色,显然这可能不需要模式修改锁。但是,我不确定这是否真的如此,并且我不想在没有保证它们会改善情况的情况下将这些更改投入生产。

以下是来自 ignite 查看器的数据:

幸存者查询:

/* InputBuf */
EXEC sp_addrolemember db_datareader, [v-rcon_approle-svc_****_rw-WOqDPce4L742J1mpoMfM-1639090813]

/* Frame 1  procname=adhoc, line=1 */
alter role [db_datareader] add member [v-rcon_approle-svc_****_rw-WOqDPce4L742J1mpoMfM-1639090813]

/* Frame 2  procname=mssqlsystemresource.sys.sp_addrolemember, line=47 */
exec (@stmtR

/* Frame 3  procname=adhoc, line=1 */
EXEC sp_addrolemember db_datareader, [v-rcon_approle-svc_****_rw-WOqDPce4L742J1mpoMfM-1639090813

受害者查询

/* Frame 1  procname=adhoc, line=1 */
SELECT **** ...`

这是xdl文件:

<deadlock><victim-list><victimProcess id="process16929ec3088"/></victim-list><process-list><process id="process16929ec3088" taskpriority="0" logused="0" waitresource="METADATA: database_id = 1 PERMISSIONS(class = 100, major_id = 0), lockPartitionId = 11" waittime="2640" ownerId="5731154681" transactionname="Load Permission Object Cache" lasttranstarted="2021-12-10T10:00:13.853" XDES="0x1748f223be8" lockMode="Sch-S" schedulerid="12" kpid="9904" status="suspended" spid="122" sbid="0" ecid="0" priority="0" trancount="0" lastbatchstarted="2021-12-10T10:00:13.853" lastbatchcompleted="2021-12-10T09:51:34.830" lastattention="1900-01-01T00:00:00.830" hostname="***" hostpid="15179" loginname="v-rcon_approle-svc_mars_rts_ro-72LuqPkS958rLBVFBUM8-1639083781" isolationlevel="read committed (2)" xactid="5731154673" currentdb="31" currentdbname="*****" lockTimeout="4294967295"><executionStack><frame procname="adhoc" line="1" stmtstart="-1" sqlhandle="0x01001f00804d1e15607b4e1e7f01000000000000000000000000000000000000000000000000000000000000">
SELECT ***  </frame></executionStack><inputbuf>
SELECT ***   </inputbuf></process><process id="process196eff9f468" taskpriority="0" logused="0" waitresource="METADATA: database_id = 1 PERMISSIONS(class = 100, major_id = 0), lockPartitionId = 11" waittime="3047" ownerId="5731154594" transactionname="Load Permission Object Cache" lasttranstarted="2021-12-10T10:00:13.450" XDES="0x174e1e9fbe8" lockMode="Sch-S" schedulerid="12" kpid="14048" status="suspended" spid="118" sbid="0" ecid="0" priority="0" trancount="0" lastbatchstarted="2021-12-10T09:55:58.547" lastbatchcompleted="2021-12-10T09:55:58.547" lastattention="1900-01-01T00:00:00.547" clientapp=".Net SqlClient Data Provider" hostname="***" hostpid="4904" loginname="****\****" isolationlevel="read committed (2)" xactid="0" currentdb="1" currentdbname="master" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056"><executionStack/><inputbuf>
****   </inputbuf></process><process id="process18934ab7848" taskpriority="0" logused="1392" waitresource="METADATA: database_id = 31 DATABASE_PRINCIPAL(principal_id = 16390), lockPartitionId = 11" waittime="404" ownerId="5731153668" transactionname="user_transaction" lasttranstarted="2021-12-10T10:00:13.310" XDES="0x181e7460040" lockMode="Sch-M" schedulerid="5" kpid="17184" status="suspended" spid="135" sbid="0" ecid="0" priority="0" trancount="3" lastbatchstarted="2021-12-10T10:00:14.053" lastbatchcompleted="2021-12-10T10:00:14.053" lastattention="1900-01-01T00:00:00.053" clientapp="vault" hostname="****" hostpid="0" loginname="****\_HCVault_SQL_****" isolationlevel="read committed (2)" xactid="5731153668" currentdb="31" currentdbname="*****" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056"><executionStack><frame procname="adhoc" line="1" sqlhandle="0x01001f004dd61113a069b4a77501000000000000000000000000000000000000000000000000000000000000">
alter role [db_datareader] add member [*****]    </frame><frame procname="mssqlsystemresource.sys.sp_addrolemember" line="47" stmtstart="2544" stmtend="2568" sqlhandle="0x0300ff7f9a42b4dd67361d01acad000001000000000000000000000000000000000000000000000000000000">
exec (@stmtR    </frame><frame procname="adhoc" line="1" stmtend="200" sqlhandle="0x01001f006d2168174069b4a77501000000000000000000000000000000000000000000000000000000000000">
EXEC sp_addrolemember db_datareader, [****    </frame></executionStack><inputbuf>
EXEC sp_addrolemember db_datareader, [****]   </inputbuf></process></process-list><resource-list><metadatalock subresource="PERMISSIONS" classid="class = 100, major_id = 0" dbid="1" lockPartition="11" id="lock184944dc100" mode="Sch-M"><owner-list><owner id="process196eff9f468" mode="Sch-S" requestType="wait"/></owner-list><waiter-list><waiter id="process16929ec3088" mode="Sch-S" requestType="wait"/></waiter-list></metadatalock><metadatalock subresource="PERMISSIONS" classid="class = 100, major_id = 0" dbid="1" lockPartition="11" id="lock184944dc100" mode="Sch-M"><owner-list><owner id="process18934ab7848" mode="Sch-M"/></owner-list><waiter-list><waiter id="process196eff9f468" mode="Sch-S" requestType="wait"/></waiter-list></metadatalock><metadatalock subresource="DATABASE_PRINCIPAL" classid="principal_id = 16390" dbid="31" lockPartition="11" id="lock1760380e580" mode="Sch-S"><owner-list><owner id="process16929ec3088" mode="Sch-S"/></owner-list><waiter-list><waiter id="process18934ab7848" mode="Sch-M" requestType="wait"/></waiter-list></metadatalock></resource-list></deadlock>
4

1 回答 1

0

一个有点不令人满意的解决方案:我们将最常用的查询从 ALTER ROLE... 切换到 GRANT 并且死锁停止发生(我们每天看到 20-30 个,所以很容易看到它们已经停止)。

很明显,ALTER ROLE 中的某些行为比简单的授权创建了更高级别的锁定。

于 2021-12-14T23:21:11.090 回答