There are many hotels with different bed capacities. I need to know, for any given day, how many beds are occupied in each hotel.

Sample data:

 HOTEL      CHECK-IN     CHECK-OUT
   A       29.05.2010   30.05.2010
   A       28.05.2010   30.05.2010
   A       27.05.2010   29.05.2010
   B       18.08.2010   19.08.2010
   B       16.08.2010   20.08.2010
   B       15.08.2010   17.08.2010

Intermediate result:

HOTEL      DAY          OCCUPIED_BEDS
  A     27.05.2010           1      
  A     28.05.2010           2
  A     29.05.2010           3
  A     30.05.2010           2
  B     15.08.2010           1
  B     16.08.2010           2
  B     17.08.2010           2
  B     18.08.2010           2
  B     19.08.2010           2
  B     20.08.2010           1

Final result:

 HOTEL     MAX_OCCUPATION  
   A            3
   B            2

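A similar question has been asked before. I was thinking of generating the list of dates between the two dates (as Tom Kyte shows) and then using group by. The problem is that my table is fairly large, and I wonder whether there is a cheaper way to accomplish this task.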

3 Answers

I don't think there's a better approach than the one you outlined in the question. Create your days table (or generate one on the fly). I personally like to have one lying around, updated once a year.

Someone who understands analytic functions will probably be able to do this without an inner/outer query, but as the inner grouping is a subset of the outer, it doesn't make much difference.

Select
  i.Hotel,
  Max(i.OccupiedBeds)
From (
  Select
    s.Hotel,
    d.DayID,
    Count(*) As OccupiedBeds
  From
    SampleData s
      Inner Join
    Days d
      -- The + 1 makes the check-out day count as occupied.
      -- Depending on business rules you might not need it: I wouldn't count
      -- occupancy on the day I check out, and if you don't either, get rid of it.
      On d.DayID >= s.CheckIn And d.DayID < s.CheckOut + 1 
  Group By
    s.Hotel, 
    d.DayID
  ) i
Group By
  i.Hotel

After a bit of playing I couldn't get an analytic function version to work without an inner query:

If speed really is a problem with this, you could consider maintaining an intermediate table with triggers on the main table.

http://sqlfiddle.com/#!4/e58e7/24

answered 2012-11-16T20:37:59.807

Create a temp table containing the dates you are interested in:

create table #dates (dat datetime)
insert into #dates (dat) values ('20121116')
insert into #dates (dat) values ('20121115')
insert into #dates (dat) values ('20121114')
insert into #dates (dat) values ('20121113')

Get the intermediate result by joining the bookings to the dates, so that each day of a booking "generates" one row:

SELECT Hotel, d.dat, COUNT(*) AS OCCUPIED_BEDS FROM bookings b
INNER JOIN #dates d ON d.dat BETWEEN b.checkin AND b.checkout
GROUP BY Hotel, d.dat

Finally, take the max over that intermediate result:

SELECT Hotel, Max(OCCUPIED_BEDS) FROM IntermediateResult GROUP BY Hotel
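
As a minimal sketch, the two steps can be combined into one statement by using the intermediate query as a derived table. The example below is runnable Python that drives SQLite purely for illustration (the answer itself uses SQL Server style temp tables). The recursive `dates` CTE, the helper `bounds` CTE, the ISO-formatted date strings and the `occupied_beds` / `max_occupation` aliases are assumptions made for this sketch, not part of the original answer.

```python
import sqlite3

# In-memory database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (hotel TEXT, checkin TEXT, checkout TEXT);
    INSERT INTO bookings VALUES
        ('A', '2010-05-29', '2010-05-30'),
        ('A', '2010-05-28', '2010-05-30'),
        ('A', '2010-05-27', '2010-05-29'),
        ('B', '2010-08-18', '2010-08-19'),
        ('B', '2010-08-16', '2010-08-20'),
        ('B', '2010-08-15', '2010-08-17');
""")

query = """
WITH RECURSIVE
bounds(first_day, last_day) AS (
    SELECT MIN(checkin), MAX(checkout) FROM bookings
),
dates(dat) AS (
    -- Hypothetical stand-in for the #dates temp table:
    -- one row per calendar day between the earliest and latest booking day.
    SELECT first_day FROM bounds
    UNION ALL
    SELECT date(dat, '+1 day') FROM dates, bounds WHERE dat < last_day
)
SELECT hotel, MAX(occupied_beds) AS max_occupation
FROM (
    -- Intermediate result: one row per hotel and occupied day.
    SELECT b.hotel, d.dat, COUNT(*) AS occupied_beds
    FROM bookings b
    JOIN dates d ON d.dat BETWEEN b.checkin AND b.checkout
    GROUP BY b.hotel, d.dat
) AS intermediate
GROUP BY hotel
ORDER BY hotel
"""

max_occupation = conn.execute(query).fetchall()
print(max_occupation)
```

With the sample data this should report 3 for hotel A and 2 for hotel B, matching the final result in the question. Note that `BETWEEN` counts the check-out day as occupied.
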
answered 2012-11-16T20:30:40.203

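The performance problem is that the join condition is not based on equality, which makes a hash join impossible. Assuming we have a hotel_days table containing hotel-day pairs, I would try something like this: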

select ch_in.hotel, ch_in.day,
       (check_in_cnt - check_out_cnt) as occupancy_change
from   ( select d.hotel, d.day, count(s.hotel) as check_in_cnt
         from   hotel_days d,
                sample_data s
         where  s.hotel(+) = d.hotel
           and  s.check_in(+) = d.day
         group  by d.hotel, d.day
       ) ch_in,
       ( select d.hotel, d.day, count(s.hotel) as check_out_cnt
         from   hotel_days d,
                sample_data s
         where  s.hotel(+) = d.hotel
           and  s.check_out(+) = d.day
         group  by d.hotel, d.day
       ) ch_out
where  ch_out.hotel = ch_in.hotel
  and  ch_out.day = ch_in.day

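The trade-off is a double full scan, but I think it would still run faster, and it can be parallelized. (I assume sample_data is large mainly because of the number of bookings, not the number of hotels.) The output is the change in occupancy for a given hotel on a given day, but this can easily be summed up into total values, either with an analytic function or (probably more efficiently) with a PL/SQL procedure using bulk collect.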
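
The summing step could look roughly like the sketch below, which turns per-day occupancy changes into totals with a running sum. It is runnable Python driving SQLite (3.25 or later, for window functions) rather than Oracle, and it swaps the two outer joins against hotel_days for a plain `UNION ALL` of check-in and check-out events. The `events`, `daily_change` and `running` names and the ISO date strings are assumptions for illustration. To match the question's expected output, where the check-out day still counts as occupied, the bed is released on the day after check-out.

```python
import sqlite3

# In-memory database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_data (hotel TEXT, check_in TEXT, check_out TEXT);
    INSERT INTO sample_data VALUES
        ('A', '2010-05-29', '2010-05-30'),
        ('A', '2010-05-28', '2010-05-30'),
        ('A', '2010-05-27', '2010-05-29'),
        ('B', '2010-08-18', '2010-08-19'),
        ('B', '2010-08-16', '2010-08-20'),
        ('B', '2010-08-15', '2010-08-17');
""")

query = """
WITH events AS (
    -- One more bed occupied on the check-in day.
    SELECT hotel, check_in AS day, 1 AS delta FROM sample_data
    UNION ALL
    -- One bed freed on the day after check-out.
    SELECT hotel, date(check_out, '+1 day') AS day, -1 AS delta FROM sample_data
),
daily_change AS (
    -- Net occupancy change per hotel and day (equality-based grouping only).
    SELECT hotel, day, SUM(delta) AS occupancy_change
    FROM events
    GROUP BY hotel, day
),
running AS (
    -- Running sum of the changes gives the occupancy on each event day.
    SELECT hotel, day,
           SUM(occupancy_change) OVER (PARTITION BY hotel ORDER BY day)
               AS occupied_beds
    FROM daily_change
)
SELECT hotel, MAX(occupied_beds) AS max_occupation
FROM running
GROUP BY hotel
ORDER BY hotel
"""

max_occupation = conn.execute(query).fetchall()
print(max_occupation)
```

Only days on which something changes appear in `running`. That is enough for the maximum, because occupancy stays constant between events, but a days table would still be needed to list every calendar day as in the intermediate result.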

answered 2012-11-17T14:06:39.063