I want to add a year-over-year growth rate to a table of annual industry sales data, created as follows (basic fields only):

CREATE TABLE IF NOT EXISTS MarketSizes (
  marketSizeID INT PRIMARY KEY AUTO_INCREMENT,
  industry INT NOT NULL,
  year INT NOT NULL,
  countryID INT NOT NULL REFERENCES Countries (countryID),
  annualSales DEC(20,2) NULL,
  growthRate DEC(5,2) NULL);

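Given roughly 25 years of annual data for 100+ countries and 5,000+ industries, what is the most efficient way to populate/update the growthRate column? And is an index on (industry, year, countryID) the most efficient choice? Thanks for your time!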

2 Answers

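Consider putting growthRate in a view instead of storing it. The view lists the base columns explicitly, because MarketSizes already has a growthRate column and m1.* would clash with the computed one:

CREATE VIEW growthRate AS
SELECT
m1.marketSizeID,
m1.industry,
m1.year,
m1.countryID,
m1.annualSales,
(m1.annualSales - m2.annualSales) / m2.annualSales AS growthRate
FROM
MarketSizes m1
LEFT JOIN MarketSizes m2 ON m1.industry = m2.industry
                         AND m1.countryID = m2.countryID
                         AND m2.year = m1.year - 1;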
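
If the rate does have to be stored in the table's growthRate column rather than computed on the fly, the same self-join can, as far as I know, drive a multi-table UPDATE. This is an untested sketch: the NULLIF guard against a zero previous-year figure is my own addition, and since growthRate is DEC(5,2) it can only hold values up to 999.99, so a very large jump may be rejected under strict SQL mode.

```sql
-- Untested sketch: fill MarketSizes.growthRate from the previous year's row.
-- Rows without a previous-year match end up with growthRate = NULL,
-- because the LEFT JOIN leaves m2.annualSales NULL for them.
UPDATE MarketSizes m1
LEFT JOIN MarketSizes m2 ON m1.industry = m2.industry
                        AND m1.countryID = m2.countryID
                        AND m2.year = m1.year - 1
SET m1.growthRate = (m1.annualSales - m2.annualSales)
                    / NULLIF(m2.annualSales, 0);  -- added assumption: avoid dividing by zero
```

Unlike the view, this has to be re-run (or narrowed with a WHERE clause) whenever annualSales changes.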

Create an index on (industry, countryID) and year, and it should perform well enough.
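
One way to read that advice is as a single composite index covering the three join columns. The index name below is made up for illustration, and the column order is an assumption worth checking with EXPLAIN against real data.

```sql
-- Hypothetical index name; the columns match the join conditions of the view.
CREATE INDEX idx_marketsizes_industry_country_year
    ON MarketSizes (industry, countryID, year);

-- Check whether the self-join actually uses the index.
EXPLAIN
SELECT m1.marketSizeID, m2.annualSales
FROM MarketSizes m1
LEFT JOIN MarketSizes m2 ON m1.industry = m2.industry
                        AND m1.countryID = m2.countryID
                        AND m2.year = m1.year - 1;
```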

Answered 2013-06-27T08:44:36.653

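Disclaimer: this is untested and grew out of curiosity and some playing around. Judge for yourself whether you want to use it instead of going the "safer" route. Comments are welcome, and if anyone wants to play with it some more, this is the sqlfiddle I used. The rest is off the top of my head, and it's late, so please don't downvote for any mistakes.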

Well, out of curiosity I found a (hacky) way to speed up the update, or so I think. I haven't tested it beyond this small example:

    create table foo(id int, newid int);
    insert into foo (id) values (1), (2), (3);

    update foo, (select @prev:=0) vars
    set foo.newid = @prev,
    foo.id = if(@prev := id, id, id);

    select * from foo

    | ID | NEWID |
    --------------
    |  1 |     0 |
    |  2 |     1 |
    |  3 |     2 |

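However, I have had good experience with this approach in SELECT statements where you need information from the previous row. By using user variables you don't have to self-join the table (in the SELECT). And since you can't update a table while reading from it at the same time, you would otherwise need a dummy table. Those are just some of the reasons I developed this answer. So here it is: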

Your update statement would be:

SET @prev = NULL; /*annualSales of the previous row; NULL until a first row has been processed*/
SET @prevCountry = NULL;
SET @prevIndustry = NULL;

/*It's important to initialize the variables beforehand, not on the fly as in the example above. Otherwise MySQL complains about a syntax error, because it doesn't support an ORDER BY clause in a multi-table update statement. ORDER BY is essential in this statement!*/

UPDATE MarketSizes
SET growthRate = IF(@prevCountry = countryID AND @prevIndustry = industry,
                    (annualSales - @prev) / NULLIF(@prev, 0),
                    NULL),
/*here @prev still holds annualSales of the previous row. This IF is your "where" clause: if country or industry changed (or there is no previous row at all), the row has no previous year and growthRate becomes NULL*/

marketSizeID = IF(@prev := annualSales, marketSizeID, marketSizeID), /*here the value of the current row gets assigned to @prev*/

/*Why the update on marketSizeID? And the IF(this,then,else)? That's the trick. Every other way to assign a new value to our variable @prev results in a syntax error. I just chose the primary key because it's there. Actually it doesn't matter which column is used here, and it might be another performance boost to choose a column which has no index on it (the primary key of course has one).*/

marketSizeID = IF(@prevCountry := countryID, marketSizeID, marketSizeID),
marketSizeID = IF(@prevIndustry := industry, marketSizeID, marketSizeID)

/*rows of the same country and industry must be adjacent and in year order, so that the previous row is the previous year*/
ORDER BY countryID, industry, `year`, marketSizeID;
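
For comparison, on MySQL 8.0 or later the same previous-row lookup can probably be written with a window function instead of user variables. This is a swapped-in technique, not the trick above, and an untested sketch: it assumes window functions are available, and the NULLIF guard is my own addition.

```sql
-- Sketch for MySQL 8.0+ only (window functions are not available in 5.x).
-- LAG() returns the previous row within the same industry/country, which is
-- the previous year only if no years are missing from the data.
SELECT
    marketSizeID,
    industry,
    countryID,
    `year`,
    annualSales,
    (annualSales - LAG(annualSales) OVER w)
        / NULLIF(LAG(annualSales) OVER w, 0) AS growthRate
FROM MarketSizes
WINDOW w AS (PARTITION BY industry, countryID ORDER BY `year`);
```

The first year of each industry/country pair has no previous row, so its growthRate comes out as NULL.
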
Answered 2013-06-27T00:21:38.957