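I have a table named passive that contains a list of timestamped events per user. I want to fill the attribute duration, which is the time between the current row's event and the next event done by the same user.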

I tried the following query:

UPDATE passive as passive1
SET passive1.duration = (
    SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) )
    FROM passive as passive2
    WHERE passive1.user_id = passive2.user_id 
    AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) > 0
);

This returns the error message: Error 1093 - You can't specify target table for update in FROM

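To circumvent this limitation, I tried to follow the structure given in https://stackoverflow.com/a/45498/395857, which uses a nested subquery in the FROM clause to create an implicit temporary table, so that it doesn't count as the same table we're updating: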

UPDATE passive 
SET passive.duration = (

    SELECT *
    FROM (SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive.event_time)) 
        FROM passive, passive as passive2
        WHERE passive.user_id = passive2.user_id 
        AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) > 0
        )
    AS X
);

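However, the passive table in the nested subquery is not the same as the passive table in the main query, so every row ends up with the same passive.duration value. How can I refer to the main query's passive table from within the nested subquery? (Or is there perhaps an alternative way to structure such a query?)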

2 Answers

Try something like this:

UPDATE passive as passive1
SET passive1.duration = (
    SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) )
    FROM (SELECT * FROM passive) AS passive2
    WHERE passive1.user_id = passive2.user_id 
    AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) > 0
    )
;
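
For illustration only, here is a small sketch of the result this query is meant to produce. It runs against SQLite through Python's sqlite3 module rather than MySQL, so it says nothing about error 1093 or the derived-table workaround itself; SQLite accepts the correlated subquery directly. The sample rows are made up, and strftime('%s', ...) stands in for UNIX_TIMESTAMP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE passive (user_id INTEGER, event_time TEXT, duration INTEGER)"
)
# Hypothetical sample data: three events for user 1, two for user 2.
conn.executemany(
    "INSERT INTO passive (user_id, event_time) VALUES (?, ?)",
    [
        (1, "2013-06-01 10:00:00"),
        (1, "2013-06-01 10:00:30"),
        (1, "2013-06-01 10:05:00"),
        (2, "2013-06-01 10:00:10"),
        (2, "2013-06-01 10:01:10"),
    ],
)

# Same shape as the query in the question: for each row, take the smallest
# positive gap to a later event of the same user.
conn.execute("""
    UPDATE passive
    SET duration = (
        SELECT MIN(strftime('%s', p2.event_time) - strftime('%s', passive.event_time))
        FROM passive AS p2
        WHERE p2.user_id = passive.user_id
          AND p2.event_time > passive.event_time
    )
""")

rows = conn.execute(
    "SELECT user_id, event_time, duration FROM passive ORDER BY user_id, event_time"
).fetchall()
for row in rows:
    print(row)
```

The last event of each user has no later event, so the subquery yields NULL and duration stays NULL for that row.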
answered 2013-06-04T04:24:13.303

We can use a Python script to work around the problem:

#!/usr/bin/python
# -*- coding: utf-8 -*-

'''
An index on (user_id, observed_event_timestamp) is needed to keep the
per-row lookups below fast.
'''

# Download it at http://sourceforge.net/projects/mysql-python/?source=dlp
# Tutorials: http://mysql-python.sourceforge.net/MySQLdb.html
#            http://zetcode.com/db/mysqlpython/
import MySQLdb

import datetime

def main():
    start = datetime.datetime.now()

    db=MySQLdb.connect(user="root",passwd="password",db="db_name")
    db2=MySQLdb.connect(user="root",passwd="password",db="db_name")

    cursor = db.cursor()
    cursor2 = db2.cursor()

    cursor.execute("SELECT observed_event_id, user_id, observed_event_timestamp FROM observed_events ORDER BY observed_event_timestamp ASC")

    count = 0
    for row in cursor:
        count += 1
        timestamp = row[2]
        user_id = row[1]
        primary_key = row[0]
        # Look up this user's next event. Values are passed as parameters so the
        # driver handles quoting instead of Python string formatting.
        sql = ('SELECT observed_event_timestamp FROM observed_events '
               'WHERE observed_event_timestamp > %s AND user_id = %s '
               'ORDER BY observed_event_timestamp ASC LIMIT 1')
        cursor2.execute(sql, (timestamp, user_id))
        duration = 0  # stays 0 when the user has no later event
        for row2 in cursor2:
            duration = (row2[0] - timestamp).total_seconds()
            # Gaps longer than one hour are recorded as 0.
            if duration > 60 * 60:
                duration = 0

        cursor2.execute("UPDATE observed_events SET observed_event_duration = %s WHERE observed_event_id = %s", (duration, primary_key))

        if count % 1000 == 0:
            db2.commit()
            print("Percent done: " + str(float(count) / cursor.rowcount * 100) + "%" + " in " + str((datetime.datetime.now() - start).total_seconds()) + " seconds.")

    db2.commit()  # commit the rows updated since the last multiple of 1000
    db.close()
    db2.close()
    diff = (datetime.datetime.now() - start).total_seconds()
    print('finished in %s seconds' % diff)

if __name__ == "__main__":
    main()
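
The script above issues one SELECT per row. As a possible alternative, the same durations could be computed in memory in a single pass over rows sorted by timestamp, then written back. The following is a minimal sketch under stated assumptions, not part of the original script: the function name next_event_durations, its max_gap_seconds parameter, and the sample rows are all hypothetical, and it assumes every row fits in memory.

```python
from datetime import datetime


def next_event_durations(rows, max_gap_seconds=3600):
    """Map each event id to the seconds until the same user's next later event.

    rows is an iterable of (event_id, user_id, timestamp) tuples. As in the
    script above, the result is 0 when there is no later event or when the
    gap exceeds max_gap_seconds.
    """
    pending = {}    # user_id -> (timestamp, [event ids seen at that timestamp])
    durations = {}
    for event_id, user_id, ts in sorted(rows, key=lambda r: r[2]):
        durations[event_id] = 0
        prev = pending.get(user_id)
        if prev is not None and prev[0] == ts:
            # Same user, same timestamp: wait for a strictly later event.
            prev[1].append(event_id)
            continue
        if prev is not None:
            gap = (ts - prev[0]).total_seconds()
            if gap <= max_gap_seconds:
                for prev_id in prev[1]:
                    durations[prev_id] = gap
        pending[user_id] = (ts, [event_id])
    return durations


# Hypothetical sample data.
sample = [
    (1, "a", datetime(2013, 6, 1, 10, 0, 0)),
    (2, "a", datetime(2013, 6, 1, 10, 0, 30)),
    (3, "b", datetime(2013, 6, 1, 10, 0, 10)),
    (4, "a", datetime(2013, 6, 1, 12, 0, 0)),
]
result = next_event_durations(sample)
print(result)
```

The resulting dictionary could then be written back with one parameterized UPDATE per event id, or with cursor.executemany, which avoids the per-row SELECT entirely.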
answered 2013-06-16T17:15:07.733