I have a table, Users (it has millions of rows):

  Id         Name         Country          Product   
+----+---------------+---------------+--------------+
  1          John          Canada             
  2          Kate          Argentina                  
  3          Mark          China
  4          Max           Canada
  5          Sam           Argentina  
  6          Stacy         China           
  ...
  1000       Ken           Canada 

I want to populate the Product column with A, B, or C based on percentages.

I have another table named CountryStats, as shown below:

  Id        Country         A             B            C
+-----+---------------+--------------+-------------+----------+
  1          Canada          60            20           20
  2          Argentina       35            45           20
  3          China           40            10           50

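This table holds the percentage of users for each product. For example, in Canada 60% of people have product A, 20% have product B, and 20% have product C.

I want to fill the Users table based on the percentages in the second table. So, for example, if Canada has 1 million users, I want to fill the Product column of the Users table with 600,000 A, 200,000 B, and 200,000 C.

Thanks for any help on how to do this. I don't mind doing it in multiple steps; I just need hints on how to achieve it in SQL.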

1 Answer

The logic behind this is not too difficult. Assign a sequential counter to each person within each country, then use that counter to pick the product. For instance, in your Canada example, a counter less than or equal to 600,000 gets 'A', 600,001 to 800,000 gets 'B', and the rest get 'C'.

The following SQL accomplishes this:

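-- Number the users of each country in random order and count them.
with toupdate as (
      select u.*,
             row_number() over (partition by country order by newid()) as seqnum,
             count(*) over (partition by country) as tot
      from users u
     )
-- Updating through the CTE updates the underlying users table (SQL Server).
update u
    set product = (case when seqnum <= tot * A / 100 then 'A'        -- first A% of the rows
                        when seqnum <= tot * (A + B) / 100 then 'B'  -- next B% of the rows
                        else 'C'                                     -- the remainder
                   end)
    from toupdate u join
         CountryStats cs  -- table name as given in the question
         on u.country = cs.country;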

The with clause defines an updatable subquery (a common table expression) that carries the sequence number and the country total on every row. This is a nice feature of SQL Server, but it is not supported in all databases.

The from clause joins back to the CountryStats table to get the percentages for each country, and the case expression does the necessary logic.
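Before running the update, it may help to preview the cutoffs that the case expression will compare against. The query below is a minimal sketch, assuming the tables are named users and CountryStats and that A and B are integer percentage columns; the aliases last_a and last_b are illustrative names, not part of the original answer.

```sql
-- Preview, per country, the highest sequence number that would receive 'A'
-- and the highest that would receive 'B'; everything above last_b gets 'C'.
select cs.country,
       n.tot,
       n.tot * cs.A / 100          as last_a,  -- e.g. 1,000,000 * 60 / 100 = 600,000
       n.tot * (cs.A + cs.B) / 100 as last_b   -- e.g. 1,000,000 * 80 / 100 = 800,000
from CountryStats cs
join (select country, count(*) as tot
      from users
      group by country) n
  on n.country = cs.country;
```

With integer columns the division truncates, so when a country's total is not a multiple of 100 the few leftover rows fall through to 'C'.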

Note that the sequential number is assigned randomly, using newid(), so the products end up distributed randomly across the rows of the original table.
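Once the update has run, the resulting split can be checked against CountryStats. This is a sketch under the same naming assumptions (a users table with country and product columns); the pct alias is illustrative.

```sql
-- Count users per country and product, and express each count
-- as a percentage of that country's total.
select u.country,
       u.product,
       count(*) as users,
       100.0 * count(*) / sum(count(*)) over (partition by u.country) as pct
from users u
group by u.country, u.product
order by u.country, u.product;
```

The pct values should match the A, B, and C columns for each country, up to the rounding caused by integer cutoffs.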

Answered 2013-11-03T18:38:22.090