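I am new to R, and I may have trouble phrasing my question. Please bear with me.

I have two data frames. For the sake of explanation, let's pretend:

df1

The columns represent grain types: corn, oats, wheat, etc. The rows represent the months of the year: January, February, etc. Each element is the price per ton of that grain type when purchased in that month.

df2

The columns represent countries: Spain, Chile, Mexico, etc. The rows represent the additional costs of dealing with that country, such as packing costs, shipping costs, import taxes, inspection fees, etc.

Now I want to build a third data frame:

df3

It represents the total monthly cost of a grain mix (e.g. 10% corn, 50% oats, ...) for every country, including the associated shipping, tax, and other costs. Assume there is an equation (using data from df1 and df2) that computes, for a given grain mix, the total cost per month per country, including each country's additional costs.

For brevity, let's say that part of the total-cost equation for March and Spain is

  cost <- .10 * df1["mar", "oats"] + df2["tax", "Spain"] + .....

Selecting elements of the second data frame and doing arithmetic on columns of the first to get a result is straightforward for me. For a specific country:

  cost <- .10 * df1[, "oats"] + df2["tax", "Spain"] + .....

This gives me the cost for Spain for every month.

The problem is that I have to repeat the same arithmetic for every country.

Another version:

  cost <- .10 * df1[, "oats"] + df2["tax", ] + .....

gives me the cost for every country, but only for January.

I want a set of equations that gives me the total cost per month for all countries. In other words, df3 has the same number of rows as df1 (months) and the same number of columns as df2 (countries).

Edit: pasting in the example posted in the closed question: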

# build df1 - cost of grains (with goofy data so I can track the arithmetic)
  v1 <- c(1:12)
  v2 <- c(13:24)
  v3 <- c(25:36)
  v4 <- c(37:48)
  grain <- data.frame("wheat"=v1,"oats"=v2,"corn"=v3,"rye"=v4)

  grain

# build df2 - additional costs (again, with goofy data to see what is being used where and when)
  w1 <- c(1.3:4.3)
  w2 <- c(5.3:8.3)
  w3 <- c(9.3:12.3)
  w4 <- c(13.3:16.3)
  cost <- data.frame("Spain"=w1,"Peru"=w2,"Mexico"=w3,"Kenya"=w4)
  row.names(cost) <- c("packing","shipping","tax","inspection")

  cost

# assume 10% wheat, 30% oats and 60% rye with some clown-equation for total cost
# now for my feeble attempt at getting a dataframe that has 12 rows (months) and 4 columns (countries)

  total_cost <- data.frame( 0.1*grain[,"wheat"] +
                            0.3*grain[,"oats"] +
                            0.6*grain[,"rye"] +
                            cost["packing","Mexico"] +
                            cost["shipping","Mexico"] +
                            cost["tax","Mexico"]  +
                            cost["inspection","Mexico"] )
  total_cost

1 Answer


You have a couple of choices: one would be to use the outer function supplying inputs of the 'month' vector and the 'country' vector from the colnames of df2 and using a function that would pull the 'cost' components from df1 and df2. (Could not get that approach to work.) You would get a 'month' x 'country' matrix. Another would be to transpose the df2 dataframe and merge using all=TRUE with df1 getting a "long" format dataframe from which you could do column operations with your formulas, and then reshape to a format that is "wide" in 'countries'. Details will depend on the specific data setup and you have not offered an example yet.
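One way the outer idea might be made to work is to compute the month-dependent part and the country-dependent part separately first, since the clown-equation in the question is just their sum. A minimal sketch, assuming the question's example data, the 10/30/60 wheat/oats/rye mix, and month.abb as row labels (the names grain_part, extra_part and total_cost are only illustrative):

```r
# Example data from the question
grain <- data.frame(wheat = 1:12, oats = 13:24, corn = 25:36, rye = 37:48)
cost  <- data.frame(Spain = 1.3:4.3, Peru = 5.3:8.3,
                    Mexico = 9.3:12.3, Kenya = 13.3:16.3,
                    row.names = c("packing", "shipping", "tax", "inspection"))

# Month-dependent part: weighted grain price, one value per month
grain_part <- drop(as.matrix(grain[, c("wheat", "oats", "rye")]) %*% c(0.1, 0.3, 0.6))

# Country-dependent part: sum of all additional costs, one value per country
extra_part <- colSums(cost)

# Every month/country pair: a 12 x 4 matrix of grain_part[i] + extra_part[j]
total_cost <- outer(grain_part, extra_part, "+")

# Assumption: the 12 rows are January..December in order
dimnames(total_cost) <- list(month.abb, colnames(cost))

# 0.1*1 + 0.3*13 + 0.6*37 + (9.3 + 10.3 + 11.3 + 12.3) = 69.4
total_cost["Jan", "Mexico"]
```

This only works because the formula splits into a "month" term plus a "country" term; if the real equation mixes the two (say, a tax proportional to the grain price), the function passed to outer would need to be vectorized over both inputs. Wrapping the result in as.data.frame() gives a data frame rather than a matrix.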

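This will give you all 12 x 4 = 48 combinations of months and countries, one per row. Note that it relies on grain having a 'months' column, which the example in the question does not create; the first line below adds one using month.abb, which is an assumption on my part:

 grain$months <- month.abb
 dfrm <- expand.grid(grain$months,  colnames(cost) )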

This will give you a function that takes a month value and a country value and calculates the expression above:

 costcros <- function(x) { sum(grain[ grain[, 'months'] == x[1], c(1,2,4)]*c(0.1,0.3,0.6) ) + 
                           sum( cost[, x[2]]) }

This adds the calculation to each row of dfrm:

 dfrm$crosscost <- apply(expand.grid(grain$months,  colnames(cost) ), 1,  costcros)
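The result of this approach is in long format (one row per month/country pair), whereas the question asks for 12 rows by 4 columns. A possible way to get there, sketched under the assumption that grain$months is month.abb (the question's example has no such column) and with the grid columns named months and country purely for readability:

```r
# Example data from the question, plus an assumed 'months' column
grain <- data.frame(wheat = 1:12, oats = 13:24, corn = 25:36, rye = 37:48)
grain$months <- month.abb  # assumption: not part of the original example
cost  <- data.frame(Spain = 1.3:4.3, Peru = 5.3:8.3,
                    Mexico = 9.3:12.3, Kenya = 13.3:16.3,
                    row.names = c("packing", "shipping", "tax", "inspection"))

# x[1] is a month label, x[2] is a country name
costcros <- function(x) {
  sum(grain[grain[, "months"] == x[1], c(1, 2, 4)] * c(0.1, 0.3, 0.6)) +
    sum(cost[, x[2]])
}

# Long format: one row per month/country pair
dfrm <- expand.grid(months = grain$months, country = colnames(cost),
                    stringsAsFactors = FALSE)
dfrm$crosscost <- apply(dfrm, 1, costcros)

# expand.grid varies its first argument fastest, so filling a matrix
# column by column puts months in rows and countries in columns
total_cost <- as.data.frame(
  matrix(dfrm$crosscost, nrow = nrow(grain),
         dimnames = list(grain$months, colnames(cost)))
)

total_cost
```

The month labels are kept as character strings on purpose: apply() converts each row of the grid to a character vector, so comparing against a numeric months column could fail to match.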
answered 2012-09-10T15:59:55.247