I have 5 tables with an identical structure. Only the PAGEVISITS field is unique to each table.

I.e., table1:

ITEM   |  PAGEVISITS  |  Commodity
1813   |  50          |  Griddle
1851   |  10          |  Griddle
11875  |  100         |  Refrigerator
2255   |  25          |  Refrigerator

I.e., table2:

ITEM   |  PAGEVISITS  |  Commodity
1813   |  0           |  Griddle
1851   |  10          |  Griddle
11875  |  25          |  Refrigerator
2255   |  10          |  Refrigerator

I want it to sum the page visits per Commodity and output:

table1  |  table2  |  Commodity
60      |  10      |  Griddle
125     |  35      |  Refrigerator

Some of the data actually comes out correct, but some of it is way off, given the following query:

SELECT
SUM(MT.PAGEVISITS) as table1,
SUM(CT1.PAGEVISITS) as table2,
SUM(CT2.PAGEVISITS) as table3,
SUM(CT3.PAGEVISITS) as table4,
SUM(CT4.PAGEVISITS) as table5,
(COUNT(DISTINCT MT.ITEM)) + (COUNT(DISTINCT CT1.ITEM)) + (COUNT(DISTINCT CT2.ITEM)) + (COUNT(DISTINCT CT3.ITEM)) + (COUNT(DISTINCT CT4.ITEM)) as Total,
MT.Commodity
    FROM table1 as MT
       LEFT JOIN table2 CT1
       on MT.ITEM = CT1.ITEM
       LEFT JOIN table3 CT2
       on MT.ITEM = CT2.ITEM
       LEFT JOIN table4 CT3
       on MT.ITEM = CT3.ITEM
       LEFT JOIN table5 CT4
       on MT.ITEM = CT4.ITEM
GROUP BY Commodity

I believe this may be caused by using LEFT JOIN incorrectly. I also tried INNER JOIN, with the same inconsistent results.


1 Answer


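I would do a UNION of all five tables to get them as a single rowset (an inline view), and then run a query against that, starting with something like this...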

SELECT SUM(IF(t.source='MT',t.pagevisits,0)) AS table1
     , SUM(IF(t.source='CT1',t.pagevisits,0)) AS table2
     , t.commodity
  FROM ( SELECT 'MT' as source, table1.* FROM table1 
          UNION ALL  
         SELECT 'CT1', table2.* FROM table2
          UNION ALL
         SELECT 'CT2', table3.* FROM table3
          UNION ALL
         SELECT 'CT3', table4.* FROM table4
          UNION ALL
         SELECT 'CT4', table5.* FROM table5
      ) t
GROUP BY t.commodity

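(But I would specify a column list for each of those tables, rather than using ".*", so that my query doesn't depend on nobody adding, dropping, renaming or reordering columns in those tables.)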
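To check the idea against the sample data in the question, here is a minimal sketch with explicit column lists. It makes a few assumptions that are not in the original: it uses an in-memory SQLite database from Python as a stand-in for MySQL, it writes the conditional as CASE ... END instead of MySQL's IF(), and it only covers table1 and table2 because those are the only tables the question shows data for.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    CREATE TABLE table2 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    INSERT INTO table1 VALUES
        (1813, 50, 'Griddle'), (1851, 10, 'Griddle'),
        (11875, 100, 'Refrigerator'), (2255, 25, 'Refrigerator');
    INSERT INTO table2 VALUES
        (1813, 0, 'Griddle'), (1851, 10, 'Griddle'),
        (11875, 25, 'Refrigerator'), (2255, 10, 'Refrigerator');
""")

# Explicit column lists replace ".*"; CASE replaces MySQL's IF().
query = """
    SELECT SUM(CASE WHEN t.source = 'MT'  THEN t.pagevisits ELSE 0 END) AS table1,
           SUM(CASE WHEN t.source = 'CT1' THEN t.pagevisits ELSE 0 END) AS table2,
           t.commodity
      FROM ( SELECT 'MT' AS source, ITEM AS item,
                    PAGEVISITS AS pagevisits, Commodity AS commodity
               FROM table1
              UNION ALL
             SELECT 'CT1', ITEM, PAGEVISITS, Commodity
               FROM table2
           ) t
     GROUP BY t.commodity
     ORDER BY t.commodity
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the question's data this gives 60 and 10 for Griddle and 125 and 35 for Refrigerator, which matches the output the question asks for.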

I include an "extra" literal value (aliased as "source") to identify which table the row came from. I can use a conditional test in an expression in the SELECT list, to figure out whether the row came from a particular table.

This approach is particularly flexible, and can be used to get more complicated result sets. For example, if I also wanted the total number of page visits from table3, table4 and table5 added together, along with the individual counts, I could add an expression like this to the SELECT list:

SUM(IF(t.source IN ('CT2','CT3','CT4'),t.pagevisits,0)) AS total_345
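A small sketch of that combined total, under stated assumptions: the table named t below is a hypothetical stand-in for the result of the UNION ALL inline view, the page visit numbers are invented for illustration, SQLite is used in place of MySQL, and CASE ... END is used in place of IF().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the inline view: one row per source and commodity.
conn.execute("CREATE TABLE t (source TEXT, pagevisits INTEGER, commodity TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("MT", 60, "Griddle"), ("CT1", 10, "Griddle"),
    ("CT2", 5, "Griddle"), ("CT3", 7, "Griddle"), ("CT4", 1, "Griddle"),
    ("MT", 125, "Refrigerator"), ("CT2", 20, "Refrigerator"),
    ("CT4", 2, "Refrigerator"),
])

# One individual column (table3) next to the combined total for tables 3, 4, 5.
totals = conn.execute("""
    SELECT SUM(CASE WHEN source = 'CT2' THEN pagevisits ELSE 0 END) AS table3,
           SUM(CASE WHEN source IN ('CT2','CT3','CT4')
                    THEN pagevisits ELSE 0 END) AS total_345,
           commodity
      FROM t
     GROUP BY commodity
     ORDER BY commodity
""").fetchall()
for row in totals:
    print(row)
```

A source with no rows for a commodity (CT3 for Refrigerator here) simply contributes 0 to the total.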

To get the equivalent of your COUNT(DISTINCT item) + COUNT(DISTINCT item) + ... expression...

I would use an expression that makes a single value from both the "source" and "item" columns, being careful to have some sort of guarantee that any particular "source"+"item" will not create a duplicate of some other "source"+"item". (If we just concatenate strings, for example, we don't have any way to distinguish between 'A'+'11' and 'A1'+'1'.) The most common approach I see here is a carefully chosen delimiter which is guaranteed not to appear in either value. We can distinguish between 'A::11' and 'A1::1', so something like this will work:

 COUNT(DISTINCT CONCAT(t.source,'::',t.item))
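The 'A'+'11' versus 'A1'+'1' collision is easy to reproduce. This is only a sketch: it assumes SQLite in place of MySQL, uses SQLite's || operator where MySQL would use CONCAT(), and the two rows are made up to trigger the collision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (source TEXT, item TEXT)")
# Two different (source, item) pairs that concatenate to the same string.
conn.executemany("INSERT INTO t VALUES (?, ?)", [("A", "11"), ("A1", "1")])

plain, delimited = conn.execute("""
    SELECT COUNT(DISTINCT source || item),          -- 'A11' twice
           COUNT(DISTINCT source || '::' || item)   -- 'A::11' and 'A1::1'
      FROM t
""").fetchone()
print(plain, delimited)
```

Without a delimiter the two pairs are counted as one value; with the '::' delimiter they are counted as two.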

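In your current query, if item is NULL, then the row doesn't get included in the COUNT. In MySQL, CONCAT() returns NULL when any argument is NULL, so the expression above already behaves that way. To make that behavior explicit, you could write something like this: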

 COUNT(DISTINCT IF(t.item IS NOT NULL,CONCAT(t.source,'::',t.item),NULL)) AS Total

Of course, getting a count of distinct item values over the whole set of five tables is much simpler (but then, it does return a different result):

 COUNT(DISTINCT t.item)
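To see how the two counts differ, here is a sketch under the same kind of assumptions as before: SQLite instead of MySQL, CASE and || instead of IF() and CONCAT(), and a hypothetical table t standing in for the inline view. The rows are invented; one of them has a NULL item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the UNION ALL inline view.
conn.execute("CREATE TABLE t (source TEXT, item INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("MT", 1813), ("MT", 1851), ("MT", None),   # NULL item must not be counted
    ("CT1", 1813), ("CT1", 1851),
])

per_source, overall = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN item IS NOT NULL
                               THEN source || '::' || item END),
           COUNT(DISTINCT item)
      FROM t
""").fetchone()
print(per_source, overall)
```

The per-source count treats item 1813 in MT and item 1813 in CT1 as separate values, so it matches the sum of the per-table COUNT(DISTINCT ...) expressions, while the overall count sees each item only once.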

But to answer your question about the use of the LEFT JOIN, the left side table is the "driver" so a matching row has to be in that table for a corresponding row to be retrieved from a table on the right. That is, unmatched rows from the tables on the right side will not be returned.
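A sketch of that effect, with assumptions stated up front: SQLite stands in for MySQL, and item 9999 is a made-up row that exists only in table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    CREATE TABLE table2 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    INSERT INTO table1 VALUES (1813, 50, 'Griddle');
    -- Item 9999 is hypothetical: it has no matching row in table1.
    INSERT INTO table2 VALUES (1813, 10, 'Griddle'), (9999, 40, 'Griddle');
""")

# LEFT JOIN driven by table1: the unmatched table2 row (9999) never appears.
joined = conn.execute("""
    SELECT SUM(CT1.PAGEVISITS)
      FROM table1 MT
      LEFT JOIN table2 CT1 ON MT.ITEM = CT1.ITEM
""").fetchone()[0]

# UNION ALL: every row from both tables is processed.
unioned = conn.execute("""
    SELECT SUM(CASE WHEN t.source = 'CT1' THEN t.pagevisits ELSE 0 END)
      FROM ( SELECT 'MT' AS source, PAGEVISITS AS pagevisits FROM table1
              UNION ALL
             SELECT 'CT1', PAGEVISITS FROM table2
           ) t
""").fetchone()[0]
print(joined, unioned)
```

The join reports 10 page visits for table2 because the 40 visits of item 9999 are dropped, while the UNION ALL version reports the full 50.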

If what you have is basically five "partitions", and you want to process all of the rows whether or not a matching row appears in any of the other "partitions", I would go with the UNION ALL approach to simply concatenate all of the rows from all of those tables together, and process the rows as if they were from a single table.

NOTE: For very large tables, this may not be a feasible approach, since MySQL is going to have to materialize that inline view. There are other approaches which don't require concatenating all of the rows together.

Specifying a list of only the columns you need, in the SELECT from each table, may help performance, if there are columns in those tables you don't need to reference in your query.
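One possible alternative for large tables, offered as an assumption rather than as the approach the note above has in mind, is to aggregate each table by commodity first and only then concatenate the small per-table summaries. The sketch below uses SQLite in place of MySQL, CASE in place of IF(), and the sample data for table1 and table2 from the question. It covers the page visit sums only, not the distinct item count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    CREATE TABLE table2 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    INSERT INTO table1 VALUES
        (1813, 50, 'Griddle'), (1851, 10, 'Griddle'),
        (11875, 100, 'Refrigerator'), (2255, 25, 'Refrigerator');
    INSERT INTO table2 VALUES
        (1813, 0, 'Griddle'), (1851, 10, 'Griddle'),
        (11875, 25, 'Refrigerator'), (2255, 10, 'Refrigerator');
""")

# Each branch is reduced to one row per commodity before the UNION ALL,
# so the combined rowset stays small however many rows the tables hold.
summary = conn.execute("""
    SELECT SUM(CASE WHEN s.source = 'MT'  THEN s.visits ELSE 0 END) AS table1,
           SUM(CASE WHEN s.source = 'CT1' THEN s.visits ELSE 0 END) AS table2,
           s.commodity
      FROM ( SELECT 'MT' AS source, Commodity AS commodity,
                    SUM(PAGEVISITS) AS visits
               FROM table1 GROUP BY Commodity
              UNION ALL
             SELECT 'CT1', Commodity, SUM(PAGEVISITS)
               FROM table2 GROUP BY Commodity
           ) s
     GROUP BY s.commodity
     ORDER BY s.commodity
""").fetchall()
for row in summary:
    print(row)
```

The result is the same as with the row-level UNION ALL for these sums; whether it is actually faster depends on the data and the indexes available.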


answered 2012-07-17T15:54:50.180