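My program doesn't seem to terminate... Since I'm fairly new to Python, I suspect I'm making a common mistake that I haven't spotted yet. Even in Java, I recently solved a simple problem like this just by closing a file...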

Note: rt_table has about 250,000 rows. Before this Python program I wrote an equivalent Java program, and it did not take long to run.

def create_AMatrix():
    """Create the adjacency table of the retweet network from rt_table to create an adjacency matrix"""
    con = mdb.connect(host="localhost", user="root", passwd="", db="twitter")    
    cur = con.cursor(mdb.cursors.DictCursor)
    #get vertex set of users in retweet network
    cur.execute("select user_id from users")
    rows = cur.fetchall()
    vSet = list()
    for uID in rows:
        vSet.append(uID)

    #populate adjacency table
    cur.execute("select * from rt_table")
    rows = cur.fetchall()
    for row in rows:
        sourceUserID = row["source_user_id"]
        sourceUserName = row["source_user_name"]
        rtUserID = row["rt_user_id"]
        rtUserName = row["rt_user_name"]
        try:
            curRow = vSet.index(sourceUserID)
            curCol = vSet.index(rtUserID)
        except ValueError:
            continue
        cur.execute("select COUNT(*) from adjacency where r = %s and c = %s", (curRow, curCol))
        if cur.fetchone()['COUNT(*)'] == 0:
            try:
                cur.execute("insert into adjacency (r, c, val, source_user_id, source_user_name, rt_user_id, rt_user_name) values (%d, %d, %d, %d, %s, %d, %s"), (curRow, curCol, 1, sourceUserID, sourceUserName, rtUserID, rtUserName)
                con.commit()
            except:
                con.rollback()
        else:
            try:
                cur.execute("update adjacency set val = val+1 where r = %d and c = %d"), (curRow, curCol)
                con.commit()
            except:
                con.rollback()
    cur.close()
    con.close()
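  1. Where is my mistake?
  2. What can I do to find out what my code is doing? Specifically, how can I tell which line of code the program is currently executing?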

All help is greatly appreciated, and feel free to suggest ways to make my code more Pythonic!

1 Answer

One potential problem I can see is this snippet:

try:
    curRow = vSet.index(sourceUserID)
    curCol = vSet.index(rtUserID)
except ValueError:
    continue

The list.index() function searches the list in O(N) time. You're also calling it O(N) times, so your overall efficiency is O(N^2). With N = 250,000, that's on the order of 6 × 10^10 element comparisons in the worst case, which is a huge inefficiency. I don't see any obvious errors in your code, so I'd guess that the reason it's not returning is that it would take hours to complete, and you're not waiting that long.

One thing you could try is replacing vSet with a dict. From looking at your code, it looks like the only thing you use vSet for is looking up the index of various user IDs, so try replacing this:

vSet = list()
for uID in rows:
    vSet.append(uID)

with this:

vSet = dict()
for index, row in enumerate(rows):
    vSet[row['user_id']] = index

Looking things up in a dict is an O(1) operation, so this should get you to O(N) total runtime.
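To make that concrete, here's a small sketch with made-up data. The rows list below is hypothetical and only stands in for what a DictCursor would hand back from "select user_id from users"; the name index_of is mine, not something from your code:

```python
# Hypothetical stand-in for cur.fetchall() with a DictCursor:
# each row is a dict keyed by column name.
rows = [{"user_id": 1000 + i} for i in range(5)]

# Build the lookup table once: user_id -> position in the vertex set.
index_of = {row["user_id"]: position for position, row in enumerate(rows)}

# Each lookup is now a hash-table probe instead of a scan of the whole list.
cur_row = index_of[1003]      # position of user 1003
cur_col = index_of.get(9999)  # None for a user that is not in the vertex set
```

Using .get() is just an alternative to catching the exception; either approach should work for skipping users that aren't in the table.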

Also, notice how instead of putting uID into the lookup dict (which would have put in a row), I just put the actual user_id value -- because later on, you're looking up user IDs, not rows. I haven't run your code to test it, but I suspect if it had run to completion, you would have found you had zero output rows, because ints don't compare equal to DB cursor rows and so your code to set curRow and curCol would never have succeeded.
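If you want to convince yourself of that without touching the database, something like the following should show it. The single row here is invented for illustration and assumes rows come back as dicts, as they do with a DictCursor:

```python
rows = [{"user_id": 42}]  # hypothetical DictCursor result
v_set = list(rows)        # what the original loop builds: a list of row dicts

try:
    v_set.index(42)       # compares 42 against {"user_id": 42}, never equal
    found = True
except ValueError:
    found = False         # this branch runs, so the row would be skipped
```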

Oh, and of course you'll need to change your curRow and curCol snippet to:

try:
    curRow = vSet[sourceUserID]
    curCol = vSet[rtUserID]
except KeyError:
    continue

Try making those changes, and see if that makes your code work better.

Also, the advice to sprinkle print statements around in your code is a good one. I usually try that first before reaching for a debugger, and most of the time that's enough to clue me in on what the code is doing, and I don't need to pull out the big guns of a debugger. If you do want a Python debugger, though, Google for pdb and read up on how to use it. You can use it from the command line, or integrate it into whatever IDE you're using, depending on how you prefer to work.
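For a long loop like yours, a periodic progress line is often more useful than a print on every iteration. Here's a rough sketch of what I mean; with_progress is a helper name I made up, and the interval of 10,000 is arbitrary:

```python
import sys
import time

def with_progress(items, every=10000, out=sys.stderr):
    """Yield items unchanged, writing a progress line every `every` items."""
    start = time.time()
    for count, item in enumerate(items, start=1):
        if count % every == 0:
            out.write("processed %d rows in %.1fs\n" % (count, time.time() - start))
            out.flush()
        yield item

# Small demonstration on dummy data; in the real program you would wrap
# the rows fetched from rt_table instead, e.g. `for row in with_progress(rows):`.
processed = list(with_progress(range(25), every=10))
```

If the progress lines keep coming, the program is slow rather than stuck, and the timings give you a rough idea of how long the whole run would take. Running the script under python -m pdb is another option if you'd rather step through it.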

answered 2013-06-23T01:26:12.200