algorithm - 如何使排序与排名稳定？

Question

排序列表中的项目会根据列表中是否有其他项目更改相对顺序。排序算法类似于金融扫描仪对公司进行排序的方式。

下表列出了公司的品牌，品牌有多好，有多牛，风险和盈利，等等。

|symbol|risk|brand|quality|cheapness|profit|
|------|----|-----|-------|---------|------|
|BHP   |0   |1    |1      |0.11     |0.1   |
|MAIL  |1   |1    |-1     |0.06     |0.18  |

我们希望对表格进行排序，以便最好的投资公司排在首位。

为此，我们需要使用其参数的某种加权组合来比较公司。但是有一个问题 - 参数的范围非常不同，有些在[0.01..0.3]其他范围内[1..10]。

为了标准化范围，我们将使用排名排序。分别对每一列进行排序并从枚举其值1 to N。所以表格将变成：

|symbol|risk|brand|quality|cheapness|profit|
|------|----|-----|-------|---------|------|
|BHP   |1   |1    |2      |2        |1     |
|MAIL  |2   |1    |1      |1        |2     |

现在值在相同的范围内，我们可以使用加权组合进行比较，下表是权重列表，可将其与公司属性相乘并计算每个公司的分数- 单个数字：

weights =
       |risk|brand|quality|cheapness|profit|
       |-1  |1    |1      |1        |1     |

在将属性乘以其权重并将其相加后，我们可以计算出每家公司的得分。然后按分数列对表格进行排序，使分数最高的公司位于顶部（见最后一列）：

|symbol|risk|brand|quality|cheapness|profit|score|
|------|----|-----|-------|---------|------|-----|
|BHP   |1   |1    |2      |2        |1     |5    |
|MAIL  |2   |1    |1      |1        |2     |3    |

到目前为止，它运行良好，我们可以看到BHP 是一家比 MAIL 更好的公司。

问题

让我们添加更多公司：

|symbol|risk|brand|quality|cheapness|profit|score|
|------|----|-----|-------|---------|------|-----|
|MAIL  |1   |1    |-1     |0.06     |0.18  |9    |
|MAR   |0   |1    |1      |0.03     |0.17  |9    |
|BHP   |0   |1    |1      |0.11     |0.1   |8    |
|IHG   |0   |1    |1      |0        |0.15  |6    |
|TRI   |0   |1    |1      |0.02     |0.11  |6    |

这是怎么回事？BHP 和 MAIL的顺序已经相反了！那感觉不对我们的投资决定将取决于我们添加到列表中的公司数量，感觉这不是一种可靠的赚钱方式。

更新，按照我计算第二张表的排名的建议。似乎该算法确实是不稳定的，并且很容易受到其他项目的影响，原始值的微小差异可以变成其等级的非常大的差异。正如我们现在可以看到的利润5/1 = 5与2/1 = 2表中只有 2 个项目时的差异。

|symbol|risk|brand|quality|cheapness|profit|score|
|------|----|-----|-------|---------|------|-----|
|MAIL  |2   |1    |1      |4        |5     |9    | <= profit = 5
|MAR   |1   |1    |2      |3        |4     |9    | 
|BHP   |1   |1    |2      |5        |1     |8    | <= profit = 1
|IHG   |1   |1    |2      |1        |3     |6    | 
|TRI   |1   |1    |2      |2        |2     |6    |

问题：

似乎排名排序算法不稳定并且容易受到表中其他项目的影响，可能会给出非常不同的排序结果。

有没有办法让它稳定？因此，A 和 B 的相对顺序将始终相同，不取决于列表中的其他项目。

对该列表进行排序的更好选择是什么？列归一化并使用归一化值的加权和来计算分数？（我放弃了它，因为我相信我找到了圣杯 - 排名排序 - 现在可以忽略标准化）。

代码

const companies2 = [
  { symbol: 'MAIL', risk: 1, brand: 1, quality: -1, cheapness: 0.06,   profit: 0.18 },
  { symbol: 'BHP',  risk: 0, brand: 1, quality: 1,  cheapness: 0.11,   profit: 0.1  }
]

const companies5 = [
  { symbol: 'MAIL', risk: 1, brand: 1, quality: -1, cheapness: 0.06,   profit: 0.18 },
  { symbol: 'MAR',  risk: 0, brand: 1, quality: 1,  cheapness: 0.03,   profit: 0.17 },
  { symbol: 'BHP',  risk: 0, brand: 1, quality: 1,  cheapness: 0.11,   profit: 0.1  },
  { symbol: 'IHG',  risk: 0, brand: 1, quality: 1,  cheapness: 0,      profit: 0.15 },
  { symbol: 'TRI',  risk: 0, brand: 1, quality: 1,  cheapness: 0.02,   profit: 0.11 },
]

const weights = { risk: -1, brand: 1, quality: 1, cheapness: 1, profit: 1 }

console.log("BHP better than MAIL: \n")
console.log(sort_with_rank(companies2, weights).map(({ symbol }) => symbol))
console.log("\n\n")

console.log("MAIL better than BHP:\n")
console.log(sort_with_rank(companies5, weights).map(({ symbol }) => symbol))


// Helper functions ----------------------------------------------------------------------


function sort_with_rank(table, weights) {
  // Adding ranks
  let containers = table.map((row) => ({ row, ranks: {} }))
  let sort_key
  for (sort_key in weights) {
    containers = map_with_rank(
      containers,
      ({ row }) => row[sort_key],
      (item, rank) => {
        item.ranks[sort_key] = rank
        return item
      }
    )
  }

  // Calculating score
  const containers_with_score = containers.map(({ row, ranks }) => {
    let sort_key, score = 0
    for (sort_key in weights) score += weights[sort_key] * ranks[sort_key]
    return { row, ranks, score }
  })

  // Sorting by score
  const sorted = sort_by(containers_with_score, ({ score }) => -score)

  // Mapping to original
  return sorted.map(({ row, score }) => ({ ...row, score }))
}

function map_with_rank(list, order_by, map) {
  // Sorting accourding to rank
  const list_with_index = list.map((v, i) => ({ v, original_i: i, order_by: order_by(v) }))
  const sorted = sort_by(list_with_index, ({ order_by }) => order_by)

  // Adding rank, if values returned by `order_by` are the same, the rank also the same
  const sorted_with_rank = []
  let rank = 1
  for (let i = 0; i < sorted.length; i++) {
    const current = sorted[i]
    if (i > 0 && current.order_by != sorted[i - 1].order_by) rank++
    sorted_with_rank.push({ ...current, rank })
  }

  // Restoring original order and mapping
  const original_with_rank = sort_by(sorted_with_rank, ({ original_i }) => original_i)
  return original_with_rank.map(({ v, rank }) => map(v, rank))
}

function sort_by(list, by) {
  list = [...list]
  list.sort(function(a, b) { return by(a) - by(b) })
  return list
}

score 1 · Accepted Answer

我们可以使用softmax 函数，它在某种程度上代表了我们可以用作排名的概率。

请记住：

softmax(yi, y) = exp(yi)/sum(exp(y))

对于每个 dim aka 便宜、利润（i 以上表示特征索引，yi 是与特征 i 的 dim y 相关联的标量）等，我们有可能成为第一。

最后，我们怀疑地宣布，总体上第一名的几率是每个昏暗的第一名的结合。

const companies2 = [{"symbol":"MAIL","risk":-1,"brand":1,"quality":-1,"cheapness":0.06,"profit":0.18},{"symbol":"BHP","risk":0,"brand":1,"quality":1,"cheapness":0.11,"profit":0.1}]
const companies5 = [{"symbol":"MAIL","risk":-1,"brand":1,"quality":-1,"cheapness":0.06,"profit":0.18},{"symbol":"MAR","risk":0,"brand":1,"quality":1,"cheapness":0.03,"profit":0.17},{"symbol":"BHP","risk":0,"brand":1,"quality":1,"cheapness":0.11,"profit":0.1},{"symbol":"IHG","risk":0,"brand":1,"quality":1,"cheapness":0,"profit":0.15},{"symbol":"TRI","risk":0,"brand":1,"quality":1,"cheapness":0.02,"profit":0.11}]
function sort (features) {
  const fields = ['risk', 'brand', 'quality', 'cheapness', 'profit']
  const denoms = fields.map(field => {
    return features.reduce((acc, feat) => acc + Math.exp(feat[field]), 0)
  })
  return features.map(feat => {
    const score = fields.reduce((acc, f, i) => {
      const softmaxed = Math.exp(feat[f]) / denoms[i]
      return acc * softmaxed
    }, 1)
    return { score, feat }
  }).sort((a, b) => b.score - a.score)
}
console.log(sort(companies2).map(({ score, feat }) => ({ score, ...feat})))
console.log(sort(companies5).map(({ score, feat }) => ({ score, ...feat})))

a请注意，尽管考虑具有两个特征（和b）的两个维度（0 和 1），但它也不稳定

这样

a_0/T_0 < b_0/T_0
a_1/T_1 > b_1/T_1
a < b

然后突然之间，新特征转变T为T = [T_0+9000, T_1+0.1]

dim 0变得无关紧要，dim 1变得更加重要。

从b_1 < a_1，然后（在和a > b之间切换顺序）ab

因此，我们保留了每个维度的顺序，但整个行可能会“颠倒”。

algorithm - 如何使排序与排名稳定？

问题

问题：

代码

1 回答 1

Related

Reference