sql - Refactoring a T-SQL view that uses row_number() to return rows with a unique column value

Question

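I have a SQL view that I use to retrieve data. Let's say it is a large list of products, which are linked to the customers who bought them. The view should return only one row per product, no matter how many customers it is linked to. I'm using the row_number function to achieve this. (This example is simplified; the general situation is a query that should return only one row for each unique value of some column X. Which row is returned does not matter.)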

CREATE VIEW productView AS
SELECT * FROM 
    (SELECT 
        Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
        customer.Id
        -- various other columns
    FROM products
    LEFT OUTER JOIN customer ON customer.productId = products.Id
    -- various other joins
    ) as temp
WHERE temp.product_numbering = 1

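Now let's say the total number of rows in this view is about 1 million, and running select * from productView takes 10 seconds. A query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query is evaluated as this: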

SELECT * FROM 
    (SELECT 
        Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
        customer.Id
        -- various other columns (including productID)
    FROM products
    LEFT OUTER JOIN customer ON customer.productId = products.Id
    -- various other joins
    ) as temp
WHERE temp.product_numbering = 1 and temp.productID = 10

I think this causes the inner subquery to be evaluated in full every time. Ideally, I would like to use something like the following:

SELECT 
    Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
    customer.Id
    -- various other columns
FROM products
    LEFT OUTER JOIN customer ON customer.productId = products.Id
    -- various other joins
WHERE product_numbering = 1

But this doesn't seem to be allowed. Is there any way to do something similar?

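EDIT -

After a lot of experimentation, I believe the actual problem I'm having is how to force a join to return exactly 1 row. I tried using OUTER APPLY, as shown below. Some sample code: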

CREATE TABLE Products (id int not null PRIMARY KEY)
CREATE TABLE Customers (
        id int not null PRIMARY KEY,
        productId int not null,
        value varchar(20) NOT NULL)

declare @count int = 1
while @count <= 150000
begin
        insert into Customers (id, productID, value)
        values (@count,@count/2, 'Value ' + cast(@count/2 as varchar))      
        insert into Products (id) 
        values (@count)
        SET @count = @count + 1
end

CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC)

Using the sample set above, the following "get everything" query

select * from Products
outer apply (select top 1 * 
            from Customers
            where Products.id = Customers.productID) Customers

takes roughly 1000 ms to run. Adding an explicit condition:

select * from Products
outer apply (select top 1 * 
            from Customers
            where Products.id = Customers.productID) Customers
where Customers.value = 'Value 45872'

takes the same amount of time. 1000 ms is already too much for a fairly simple query, and it scales the wrong way (upwards) when other similar joins are added.

score 3 · Accepted Answer

Try the following approach, which uses a common table expression (CTE). With the test data you provided, it returns a specific ProductId in less than a second.

create view ProductTest as 

with cte as (
select 
    row_number() over (partition by p.id order by p.id) as RN, 
    c.*
from 
    Products p
    inner join Customers c
        on  p.id = c.productid
)

select * 
from cte
where RN = 1
go

select * from ProductTest where ProductId = 25

score 2 · Accepted Answer

What if you did something like this:

SELECT ...
FROM products
OUTER APPLY (SELECT TOP 1 * from customer where customerid = products.buyerid) as customer
...

Then the filter on productId should help. Without the filter, however, it may be worse.
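To make the elided query above concrete, here is a minimal sketch against the Products and Customers sample tables from the question. It is an illustration under assumptions, not the answerer's exact query: the column aliases are hypothetical, and the ORDER BY inside the APPLY is added only so that TOP 1 is deterministic (the question says any row will do). The point it shows is that the filter is placed on the outer table's key, so the APPLY should only need to run for the matching product.

```sql
-- Sketch against the question's sample tables (Products, Customers).
-- Assumption: it does not matter which customer row is returned, so
-- ORDER BY cu.id exists only to make TOP 1 deterministic.
SELECT p.id AS productId,      -- hypothetical alias
       c.id AS customerId,     -- hypothetical alias
       c.value
FROM Products AS p
OUTER APPLY (
    SELECT TOP 1 cu.id, cu.value
    FROM Customers AS cu
    WHERE cu.productId = p.id
    ORDER BY cu.id
) AS c
WHERE p.id = 25;   -- filter on the outer table's key, not on the applied columns
```

By contrast, a filter on a column produced by the APPLY (such as Customers.value in the question's second query) can only be checked after the APPLY has been evaluated for each product, which is consistent with the unchanged timing reported there.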

score 1 · Accepted Answer

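The problem is that your data model is flawed. You should have three tables:

Customers (customerId, ...)
Products (productId, ...)
ProductSales (customerId, productId)

In addition, the sales table should probably be split into a one-to-many pair (Sales and SalesDetails). Unless you fix your data model, you will only be chasing red-herring problems in circles around your own tail. If the system is not your design, fix it. If the boss won't let you fix it, fix it anyway. If you can't fix it, fix it. There is no easy way out of the bad data model you have been presented with.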
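As a rough sketch of the three-table layout described above, the DDL might look like the following. The column types, the foreign keys, and the composite primary key on ProductSales are assumptions made for illustration; the answer only names the tables and their key columns, and the suggested Sales/SalesDetails split is not shown.

```sql
-- Hypothetical DDL for the three tables named in the answer.
-- Types and constraints are assumptions, not part of the original answer.
CREATE TABLE Customers (
    customerId int NOT NULL PRIMARY KEY
    -- other customer columns elided in the answer
);

CREATE TABLE Products (
    productId int NOT NULL PRIMARY KEY
    -- other product columns elided in the answer
);

CREATE TABLE ProductSales (
    customerId int NOT NULL REFERENCES Customers (customerId),
    productId  int NOT NULL REFERENCES Products (productId),
    -- Assumption: at most one row per (product, customer) pair.
    PRIMARY KEY (productId, customerId)
);

-- One row per product, picking an arbitrary (here: the lowest) customer id.
SELECT productId, MIN(customerId) AS anyCustomerId
FROM ProductSales
GROUP BY productId;
```

With the link table keyed on productId first, a "one row per product" query becomes a grouped lookup on ProductSales rather than a de-duplication over the joined result.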

score 0 · Accepted Answer

If you really don't care which customer is brought back, this may be fast enough:

select p1.*, c1.*
FROM products p1
Left Join (
        -- one row per product: the highest customer id linked to it
        select p2.id, max(c2.id) max_customer_id
        From products p2
        Join customer c2 on
        c2.productID = p2.id
        group by p2.id
) product_max_customer on
product_max_customer.id = p1.id
Left join customer c1 on
c1.id = product_max_customer.max_customer_id
;
