python - Programming Collective Intelligence 中的 Pearson 算法仍然无法正常工作

Question

我运行代码来计算皮尔逊相关系数，并且函数（粘贴在下面）顽固地返回 0。

根据之前关于这个问题的建议（参见下面的#1、#2），我确实确保该函数能够执行浮点计算，但这并没有帮助。我会很感激这方面的一些指导。

    from __future__ import division
    from math import sqrt

    def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
       si={}
       for item in prefs[p1]:
          if item in prefs[p2]: si[item]=1


          # Find the number of elements
          n=float(len(si))


          # if they are no ratings in common, return 0
          if n==0: return 0


          # Add up all the preferences
          sum1=float(sum([prefs[p1][it] for it in si]))
          sum2=float(sum([prefs[p2][it] for it in si]))

          # Sum up the squares
          sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
          sum2Sq=sum([pow(prefs[p2][it],2) for it in si])

          # Sum up the products
          pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])


          # Calculate Pearson score
          num=pSum-(1.0*sum1*sum2/n)
          den=sqrt((sum1Sq-1.0*pow(sum1,2)/n)*(sum2Sq-1.0*pow(sum2,2)/n))
          if den==0: return 0

          r=num/den

          return r

我的数据集：

 # A dictionary of movie critics and their ratings of a small
 # set of movies

 critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
       'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
       'The Night Listener': 3.0},
     'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
      'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
      'You, Me and Dupree': 3.5},
     'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
      'Superman Returns': 3.5, 'The Night Listener': 4.0},
     'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
      'The Night Listener': 4.5, 'Superman Returns': 4.0,
      'You, Me and Dupree': 2.5},
     'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
      'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
      'You, Me and Dupree': 2.0},
     'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
      'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
     'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

其他类似问题：

链接 #1： “编程集体智能”中的这个 python 函数有什么问题？
链接#2：《编程集体智能》中的皮尔逊算法有什么问题？

score 2 · Accepted Answer

感谢大家在评论中的帮助，我已经确定了问题所在。只是在开玩笑。有很多问题。最后，我注意到 for 循环没有加倍（在第 6 行），它需要加倍。结束前的一个阶段，我疯狂地包围了一切float，对不起。无论如何，你想要花车。在此之前，有一个事实是他没有keys()为评论家引用，这是他需要的。此外，皮尔逊系数计算不正确，以至于需要数学家来修复它（我有数学学士学位）。现在，他为 Gene Seymour 和 Lisa Rose 测试的示例可以正常工作。无论如何，将其另存为pearson.py，或其他内容：

from __future__ import division
from math import sqrt

def sim_pearson(prefs,p1,p2):
# Get the list of mutually rated items
   si={}
   for item in prefs[p1].keys():
      for item in prefs[p2].keys():
         if item in prefs[p2].keys():
            si[item]=1


      # Find the number of elements
      n=float(len(si))


      # if they are no ratings in common, return 0
      if n==0:
         print 'n=0'
         return 0


      # Add up all the preferences
      sum1=float(sum([prefs[p1][it] for it in si.keys()]))
      sum2=float(sum([prefs[p2][it] for it in si.keys()]))
      print 'sum1=', sum1, 'sum2=', sum2
      # Sum up the squares
      sum1Sq=float(sum([pow(prefs[p1][it],2) for it in si.keys()]))
      sum2Sq=float(sum([pow(prefs[p2][it],2) for it in si.keys()]))
      print 'sum1s=', sum1Sq, 'sum2s=', sum2Sq
      # Sum up the products
      pSum=float(sum([prefs[p1][it]*prefs[p2][it] for it in si.keys()]))


      # Calculate Pearson score
      num=(pSum/n)-(1.0*sum1*sum2/pow(n,2))
      den=sqrt(((sum1Sq/n)-float(pow(sum1,2))/float(pow(n,2)))*((sum2Sq/n)-float(pow(sum2,2))/float(pow(n,2))))
      if den==0:
         print 'den=0'
         return 0

      r=num/den

      return r

critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
   'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
   'The Night Listener': 3.0},
 'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
  'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
  'You, Me and Dupree': 3.5},
 'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
  'Superman Returns': 3.5, 'The Night Listener': 4.0},
 'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
  'The Night Listener': 4.5, 'Superman Returns': 4.0,
                                                                                                                                                             1,1

然后，键入：

import pearson
pearson.sim_pearson(pearson.critics, pearson.critics.keys()[1], pearson.critics.keys()[2])

或者简单地说：

import pearson
pearson.sim_pearson(pearson.critics, 'Lisa Rose', 'Gene Seymour')

如果您在使用它时遇到任何问题，请告诉我。我留下了print我用来解决它的语句，这样你就可以看到我是如何解决它的，但显然不需要它们。

如果您在本书中遇到任何问题并且无法解决，请在 SO 的帮助下给我发电子邮件：raphael[at]postacle.com，我应该可以回复您。我前段时间也下载了，有点懒；）

score 0 · Accepted Answer

@rofls 是对的——

For 循环：我的 for 循环是个问题。Floats：一些术语需要从int类型转换为float类型。
键：这是我第一次肯定错过的东西，@rofls 被抓住了。
皮尔逊系数：我收紧了皮尔逊系数的分母部分，使其与维基百科页面上的表达式一致；它是数学属性部分下的最后一个表达式。

该代码现在有效。我尝试了不同的输入组合。

    from __future__ import division
    from math import sqrt

    def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
       si={}
       for item in prefs[p1].keys():
           if item in prefs[p2].keys():
               print 'item=', item
               si[item]=1

       # Find the number of elements
       n=float(len(si))
       print 'n=', n

       # if they are no ratings in common, return 0
       if n==0:
           print 'n=0'
           return 0

       # Add up all the preferences
       sum1=float(sum([prefs[p1][it] for it in si.keys()]))
       sum2=float(sum([prefs[p2][it] for it in si.keys()]))
       print 'sum1=', sum1, 'sum2=', sum2

       # Sum up the squares
       sum1Sq=float(sum([pow(prefs[p1][it],2) for it in si.keys()]))
       sum2Sq=float(sum([pow(prefs[p2][it],2) for it in si.keys()]))
       print 'sum1s=', sum1Sq, 'sum2s=', sum2Sq

       # Sum up the products
       pSum=float(sum([prefs[p1][it]*prefs[p2][it] for it in si.keys()]))
       print 'pSum=', pSum

       # Calculate Pearson score
       num=(n*pSum)-(1.0*sum1*sum2)
       print 'num=', num
       den1=sqrt((n*sum1Sq)-float(pow(sum1,2)))
       print 'den1=', den1
       den2=sqrt((n*sum2Sq)-float(pow(sum2,2)))
       print 'den2=', den2
       den=1.0*den1*den2

      if den==0:
           print 'den=0'
           return 0

       r=num/den
       return r

    critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
          'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
          'The Night Listener': 3.0},
         'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
          'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 3.5},
         'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
          'Superman Returns': 3.5, 'The Night Listener': 4.0},
         'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
          'The Night Listener': 4.5, 'Superman Returns': 4.0,
          'You, Me and Dupree': 2.5},
         'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 2.0},
         'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
         'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

    # Done

python - Programming Collective Intelligence 中的 Pearson 算法仍然无法正常工作

2 回答 2

Related

Reference