6

I am playing with the following code from Programming Collective Intelligence; this is a function from the book that calculates the Euclidean distance between two movie critics.

This function sums the squared differences of the ratings in the dictionary, but Euclidean distance in n dimensions also includes the square root of that sum.
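To make the gap concrete, here is the textbook distance next to what the function returns. The notation is an assumption for illustration and does not come from the book: S is the set of items both critics rated, and p_i, q_i are their ratings of item i.

```latex
% Assumed notation: S = items rated by both critics; p_i, q_i = their ratings of item i
\[
d(p, q) = \sqrt{\sum_{i \in S} (p_i - q_i)^2},
\qquad
\mathrm{sim\_distance}(p, q) = \frac{1}{1 + \sum_{i \in S} (p_i - q_i)^2} = \frac{1}{1 + d(p, q)^2}.
\]
```

So the score is built from the squared distance d^2 rather than from d itself.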

AFAIK, since we use the same function to rank everyone, it does not matter whether we take the square root or not, but I was wondering: is there a particular reason for leaving it out?


from math import sqrt  # not used by sim_distance below

# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Get the list of shared items
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    # If they have no ratings in common, return 0
    if len(si) == 0:
        return 0
    # Add up the squares of all the differences
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in prefs[person1] if item in prefs[person2]])
    # 1.0 forces float division, so integer ratings do not truncate to 0 on Python 2
    return 1.0 / (1 + sum_of_squares)
4

4 Answers

12

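The reason the square root is not used is that it is computationally expensive. It is monotonic with the square function (i.e., it preserves order), so if you are only interested in the ordering of the distances, the square root is unnecessary (and, as mentioned, costly to compute).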
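A quick way to check the order-preservation claim is to rank the same critics with and without the square root. This is only a sketch: the ratings, the names, and the helpers `sim_squared`, `sim_rooted`, and `ranking` are made up for illustration and are not from the book, and the no-shared-items case is left out for brevity.

```python
from math import sqrt

# Hypothetical ratings, invented for this example (not the book's dataset).
prefs = {
    "Ann":  {"A": 4.0, "B": 3.0, "C": 5.0},
    "Bob":  {"A": 4.5, "B": 3.0, "C": 4.0},
    "Cara": {"A": 1.0, "B": 5.0, "C": 2.0},
    "Dev":  {"A": 3.0, "B": 3.5, "C": 4.5},
}

def sum_of_squares(prefs, p1, p2):
    # Sum of squared rating differences over the items both people rated.
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    return sum((prefs[p1][item] - prefs[p2][item]) ** 2 for item in shared)

def sim_squared(prefs, p1, p2):
    # Similarity built on the squared distance, as in the question's function.
    return 1.0 / (1.0 + sum_of_squares(prefs, p1, p2))

def sim_rooted(prefs, p1, p2):
    # Similarity built on the true Euclidean distance (with the square root).
    return 1.0 / (1.0 + sqrt(sum_of_squares(prefs, p1, p2)))

def ranking(sim, prefs, person):
    # Everyone except `person`, ordered from most to least similar.
    others = [p for p in prefs if p != person]
    return sorted(others, key=lambda other: sim(prefs, person, other), reverse=True)

rank_sq = ranking(sim_squared, prefs, "Ann")
rank_rt = ranking(sim_rooted, prefs, "Ann")
print(rank_sq)
print(rank_rt)
```

The two scores differ numerically, but both calls produce the same ordering of critics, which is all a "most similar first" ranking needs.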

answered 2009-11-10T17:33:53.730
3

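That is correct. While the square root is necessary for a quantitatively correct result, if all you care about is how the distances order relative to one another, then taking the square root is superfluous.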

answered 2009-11-10T17:32:46.087
2

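To compute a Cartesian distance, you must first compute the squared distance and then take its square root. But computing a square root is computationally expensive. If all you are really interested in is comparing distances, comparing the squared distances works just as well, and is much faster.

For any two real numbers A and B, where A >= 0 and B >= 0, it is always true that A squared and B squared have the same relationship as A and B:

  • If A < B, then A squared < B squared.
  • If A == B, then A squared == B squared.
  • If A > B, then A squared > B squared.

Since distances are always >= 0, this relationship means that comparing squared distances gives the same answer as comparing the distances themselves.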
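The three cases above follow from a single factorization. A short derivation, using the same A and B (both assumed non-negative):

```latex
\[
B^2 - A^2 = (B - A)(B + A), \qquad A, B \ge 0 .
\]
% If A + B > 0, the factor (B + A) is positive, so B^2 - A^2 has the same sign as B - A:
\[
A < B \iff A^2 < B^2, \qquad A = B \iff A^2 = B^2, \qquad A > B \iff A^2 > B^2 .
\]
% If A + B = 0, then A = B = 0 and both differences are zero.
```

The non-negativity assumption matters: for example, -3 < 2, yet (-3)^2 > 2^2.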

answered 2009-11-11T15:21:04.867
1

For comparisons alone the square root is not necessary; you would simply be using the squared Euclidean distance, which is still a valid dissimilarity measure, although strictly speaking it is not a metric, because it does not satisfy the triangle inequality (see http://en.wikipedia.org/wiki/Metric_%28mathematics%29).

answered 2009-11-10T17:32:39.817