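Let me start by saying that I am a manager and it has been a while since I did this kind of work, as you will see. But for various reasons I have to cover some SQL programming until I get more staff. And yes, I'll say up front that I'm an incompetent idiot at this.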

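What I have is a very, very long SQL statement with a whole bunch of selects from various tables, plus various subqueries. The query is about 400 lines long. It worked fine until I tried to add a certain subquery. That subquery returns the wrong value. When I break the subquery into a few shorter test queries for troubleshooting, they return the correct values. Combined, they don't work. I'm sure it must have something to do with the way I'm joining.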

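I am trying to get a total revenue amount that is stored in two tables: one table holds the current values, the other holds the historical values. The values are at the customer level, and the customer table is one-to-many with each of the other two tables. The two revenue tables have the same structure and no records in common; one is a history archive of the other. All I want to do is sum the revenue values across the two tables at the customer level.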

Here is the subquery that does not work. It should be the sum of current_revenue and historical_revenue:

select c.id1, c.id2,
(select (sum(oe.revenue1)+sum(oe.revenue2)+sum(h.revenue1)+sum(h.revenue2))*.01 
     from order_entry oe, order_history h
     where c.id1 = oe.id1
     and c.id2 = oe.id2
     and c.id1 = h.id1
     and c.id2 = h.id2
     and oe.order_type in ('01','02','03','04')
     and oe.order_status = 'CLOSED'
     and h.order_type in ('01','02','03','04')
     and h.order_status = 'CLOSED') as total_revenue
from customer c
where c.id1 = '1234'
and c.id2 = '5678'
--query incorrectly returns $4460      
--this query is adding the $1500 in twice (see below)

Here are the two test queries that work. They are identical except for the table name:

select c.id1, c.id2,
(select (sum(oe.revenue1)+sum(oe.revenue2))*.01
     from order_entry oe
     where c.id1 = oe.id1
     and c.id2 = oe.id2
     and oe.order_type in ('01','02','03','04')
     and oe.order_status = 'CLOSED') as current_revenue
from customer c
where c.id1 = '1234'
and c.id2 = '5678'
--query correctly returns $1460


select c.id1, c.id2,
(select (sum(h.revenue1)+sum(h.revenue2))*.01
     from order_history h
     where c.id1 = h.id1
     and c.id2 = h.id2
     and h.order_type in ('01','02','03','04')
     and h.order_status = 'CLOSED') as historical_revenue
from customer c
where c.id1 = '1234'
and c.id2 = '5678'
--query correctly returns $1500

/*
these will be subqueries in another query which needs to return
total revenue = current_revenue + historical_revenue = 1460 + 1500 = 2960
*/

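Can anyone tell me why the combined subquery doesn't work? Once again, I freely admit my stupidity. I'm sure I'll feel like an idiot afterwards, but I just need some help. Thanks.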

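Edit: sample table creates and inserts. The tables are poorly designed. And huge. Hence the sample. Also note that the SQL statement I'm building is large because I'm pulling about 10MM records in a select as a data feed, which turned out to be faster than breaking it up and running updates. There is no sensible partitioning that would allow creating multiple tables that could eventually be combined with a union. I tried various approaches, but the giant select turned out to be the fastest. As you may have noticed, I'm also not very good at SQL transformations, including optimizer hints.

Thanks for the help, Clockwork-Muse... I'll test your solution soon. Also, there is no dedicated reporting tool available.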

create table customer (id1 varchar2(4),id2 varchar2(4), 
first_name varchar2(30),last_name varchar2(30));

insert into customer values ('1234','5678','DAVID','HOOVER');
insert into customer values ('0676','3724','JOHN','BOWER');
insert into customer values ('7281','1766','ANNA','VALENZUELA');
insert into customer values ('1458','0076','MARK','JACKSON');
insert into customer values ('0003','9783','JESSICA','BURNETT');

create table order_entry (id1 varchar2(4),id2 varchar2(4),
order_no number,order_type varchar2(2),order_status varchar2(10), 
revenue1 number(10),revenue2 number(10));

insert into order_entry values ('1234','5678',238347,'02','CLOSED',1220,0);
insert into order_entry values ('1234','5678',238347,'02','CLOSED',0,240);
insert into order_entry values ('1234','5678',238529,'05','CANCEL',500,700);
insert into order_entry values ('1234','5678',238529,'04','PENDING',871,0);
insert into order_entry values ('0003','9783',198293,'33','CLOSED',870,50);
insert into order_entry values ('0676','3724',219972,'02','CLOSED',375,0);
insert into order_entry values ('0676','3724',219972,'02','PENDING',175,59);
insert into order_entry values ('7281','1766',248221,'04','PENDING',0,999);
insert into order_entry values ('1458','0076',218578,'04','CLOSED',0,99);
insert into order_entry values ('1458','0076',218578,'02','CLOSED',399,0);


create table order_history (id1 varchar2(4),id2 varchar2(4),
order_no number,order_type varchar2(2),order_status varchar2(10), 
revenue1 number(10),revenue2 number(10));

insert into order_history values ('1234','5678',192832,'01','CLOSED',750,0);
insert into order_history values ('1234','5678',192991,'02','CLOSED',0,750);
insert into order_history values ('0003','9783',138982,'01','CLOSED',299,0);
insert into order_history values ('0676','3724',112729,'01','CLOSED',350,0);
insert into order_history values ('1458','0076',185573,'01','CANCEL',1299,199);
2 Answers

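First, you should explicitly qualify your joins rather than using the implicit-join (comma-separated FROM clause) syntax. That by itself won't actually solve your problem, but it will probably make future work easier, especially since anything other than a "normal" inner join gets harder and a bit strange in that syntax.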
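For illustration, here is a sketch of the question's combined subquery rewritten with explicit JOIN ... ON syntax. It is untested and only shows the syntax change: it is logically equivalent to the comma-separated version, so it still returns the same inflated total.

```sql
-- Sketch only: same logic as the question's failing subquery,
-- written with an explicit join. The result is still wrong;
-- the point here is only the syntax.
select c.id1, c.id2,
       (select (sum(oe.revenue1) + sum(oe.revenue2)
              + sum(h.revenue1)  + sum(h.revenue2)) * .01
          from order_entry oe
          join order_history h
            on h.id1 = oe.id1
           and h.id2 = oe.id2
         where oe.id1 = c.id1
           and oe.id2 = c.id2
           and oe.order_type in ('01','02','03','04')
           and oe.order_status = 'CLOSED'
           and h.order_type in ('01','02','03','04')
           and h.order_status = 'CLOSED') as total_revenue
  from customer c
 where c.id1 = '1234'
   and c.id2 = '5678';
```

Written this way, it is easier to see that the only join condition between the two revenue tables is the customer key, so nothing ties a particular order_entry row to a particular order_history row.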

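As @Nikola mentioned, the problem is that you're getting "duplicate" rows. You have two solutions:

  1. Add conditions to the join until there are no more duplicate rows (note that this can be difficult or impossible if the unique information in the tables doesn't match up!)
  2. Perform the aggregation before the join, guaranteeing a single row to join.

Depending on a large number of factors, either option may perform better or worse.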
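To see the "duplicate" rows directly, one option is to run the join without any aggregation for a single customer. This is a diagnostic sketch against the sample tables from the question; the column aliases are made up for readability.

```sql
-- Diagnostic sketch: list the joined rows before they are summed.
-- Every qualifying order_entry row is paired with every qualifying
-- order_history row for the same customer.
select oe.order_no as entry_order_no,
       oe.revenue1 as entry_revenue1,
       oe.revenue2 as entry_revenue2,
       h.order_no  as history_order_no,
       h.revenue1  as history_revenue1,
       h.revenue2  as history_revenue2
  from order_entry oe
  join order_history h
    on h.id1 = oe.id1
   and h.id2 = oe.id2
 where oe.id1 = '1234'
   and oe.id2 = '5678'
   and oe.order_type in ('01','02','03','04')
   and oe.order_status = 'CLOSED'
   and h.order_type in ('01','02','03','04')
   and h.order_status = 'CLOSED';
```

With the sample inserts from the question, customer 1234/5678 has two qualifying rows in each table, so this returns four rows and each revenue value appears twice. Summing over that result counts every amount once per matching row in the other table, which is why the combined total comes out too high.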

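Without more information about your data, it's impossible to say whether additional conditions could be added to make the rows properly "unique" (given that it probably has something to do with the order_type column, I'm not sure it's possible). So, here's a pre-aggregated version (untested):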

SELECT c.id1, c.id2,
       (current_revenue.revenue + historical_revenue.revenue) * .01 AS total_revenue
FROM Customer c
-- Oracle does not accept AS before a table alias, so the derived tables are aliased without it
JOIN (SELECT id1, id2, SUM(revenue1 + revenue2) AS revenue
      FROM Order_Entry
      WHERE order_type in ('01', '02', '03', '04')
      AND order_status = 'CLOSED'
      GROUP BY id1, id2) current_revenue
ON current_revenue.id1 = c.id1
   AND current_revenue.id2 = c.id2
JOIN (SELECT id1, id2, SUM(revenue1 + revenue2) AS revenue
      FROM Order_History
      WHERE order_type in ('01', '02', '03', '04')
      AND order_status = 'CLOSED'
      GROUP BY id1, id2) historical_revenue
ON historical_revenue.id1 = c.id1
   AND historical_revenue.id2 = c.id2
WHERE c.id1 = '1234'
      AND c.id2 = '5678'

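Note that I'm not sure whether Oracle is smart enough to apply the customer id restriction before performing the aggregation; that is, the RDBMS might perform the aggregation over the entire table rather than just that customer's rows. There are a couple of ways to handle this possibility: either move the subqueries into the SELECT clause, or add the customer id selection to the subselects.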
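As a sketch of the second option, the customer id restriction can be repeated inside each subselect so that the aggregation only sees that customer's rows. This is untested, and the short aliases cur and hist are placeholders chosen for this example.

```sql
-- Sketch: pre-aggregate each table for one customer only,
-- then join the two single-row results to the customer.
select c.id1, c.id2,
       (cur.revenue + hist.revenue) * .01 as total_revenue
  from customer c
  join (select id1, id2, sum(revenue1 + revenue2) as revenue
          from order_entry
         where id1 = '1234'
           and id2 = '5678'
           and order_type in ('01','02','03','04')
           and order_status = 'CLOSED'
         group by id1, id2) cur
    on cur.id1 = c.id1
   and cur.id2 = c.id2
  join (select id1, id2, sum(revenue1 + revenue2) as revenue
          from order_history
         where id1 = '1234'
           and id2 = '5678'
           and order_type in ('01','02','03','04')
           and order_status = 'CLOSED'
         group by id1, id2) hist
    on hist.id1 = c.id1
   and hist.id2 = c.id2
 where c.id1 = '1234'
   and c.id2 = '5678';
```

Because these are inner joins, a customer with no qualifying rows in one of the two tables drops out of the result entirely. Whether that is acceptable depends on the data feed; outer joins would be one way around it.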

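...Also, 400 lines is very long. Are you sure you wouldn't be better off splitting it up somehow, or investing in something like a dedicated reporting tool?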

answered 2013-08-14T00:54:17.287

The simplest solution: if you have already computed the 2 correct values, just add them together:

SQLFiddle for testing

select c.id1, c.id2,
  (
    (select (sum(oe.revenue1)+sum(oe.revenue2)) * 0.01  -- current_revenue
         from order_entry oe
         where c.id1 = oe.id1
         and c.id2 = oe.id2
         and oe.order_type in ('01','02','03','04')
         and oe.order_status = 'CLOSED'
    ) 
    +
    (select (sum(h.revenue1)+sum(h.revenue2))*.01    -- historical_revenue
         from order_history h
         where c.id1 = h.id1
         and c.id2 = h.id2
         and h.order_type in ('01','02','03','04')
         and h.order_status = 'CLOSED'
    )
  ) as total_revenue  
from customer c
where c.id1 = '1234'
and c.id2 = '5678'

Of course, given the lack of data, optimal performance can't be guaranteed, but it just works.
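One thing to watch for with this approach: SUM over zero rows returns NULL, and NULL plus a number is NULL, so a customer with qualifying rows in only one of the two tables would get a NULL total. A minimal sketch of one way to guard against that, assuming a missing side should count as zero, is to wrap each scalar subquery in NVL:

```sql
-- Sketch: treat a missing current or historical total as 0.
-- In the sample data, customer 1458/0076 has CLOSED rows only in order_entry.
select c.id1, c.id2,
       (  nvl((select sum(oe.revenue1) + sum(oe.revenue2)
                 from order_entry oe
                where oe.id1 = c.id1
                  and oe.id2 = c.id2
                  and oe.order_type in ('01','02','03','04')
                  and oe.order_status = 'CLOSED'), 0)
        + nvl((select sum(h.revenue1) + sum(h.revenue2)
                 from order_history h
                where h.id1 = c.id1
                  and h.id2 = c.id2
                  and h.order_type in ('01','02','03','04')
                  and h.order_status = 'CLOSED'), 0)
       ) * .01 as total_revenue
  from customer c
 where c.id1 = '1458'
   and c.id2 = '0076';
```

Whether zero is the right default is a business decision; if a NULL total is meaningful for the feed, the NVL calls can simply be left out.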

answered 2013-08-14T06:50:35.333