
I have a cron job that downloads data from MySQL every night at 3 a.m. I can test the connection and the download, and it works. Sometimes, though, the download only partially completes (a partial download), and if I try to re-run the py script it barks: a duplicate entry error for key 2.

I would like to be able to run a script that deletes only the previous night's entries, so that I can re-run the script that populates the database. There are three other tables tied to this one. If I write a SQL script that deletes yesterday's records, what will Django do? Will it automatically delete the corresponding rows in the other tables, or should I handle that in the script as well?

Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.6/site-packages/django/core/management/__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.6/site-packages/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.6/site-packages/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/lib/python2.6/site-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/usr/local/django/grease/greaseboard/management/commands/import_patients.py", line 27, in handle
    mrn = row.MRN,
  File "/usr/local/lib/python2.6/site-packages/django/db/models/manager.py", line 134, in get_or_create
    return self.get_query_set().get_or_create(**kwargs)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/query.py", line 449, in get_or_create
    obj.save(force_insert=True, using=self.db)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/base.py", line 551, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/query.py", line 1576, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 910, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/util.py", line 40, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/mysql/base.py", line 114, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python2.6/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
django.db.utils.IntegrityError: (1062, "Duplicate entry '000xxxxxxxx' for key 2")

1 Answer


I don't know how much work this would be, but for a similar problem I used transactions together with savepoints - https://docs.djangoproject.com/en/dev/topics/db/transactions/#savepoints

So, given something like:

from django.db import transaction

transaction.set_autocommit(False)
try:
    # iterate the file object line by line; readline() returns a single
    # string, so looping over its result would walk characters instead
    for line in streaming_filelikeobject:
        product = do_work(line)
        MyTable(**product).save()
except IOError:
    # partial download: throw away everything from this run
    transaction.rollback()
else:
    transaction.commit()
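
The savepoints the linked docs describe can also be used per row, so a single duplicate does not poison the whole batch. A minimal sketch, assuming the same placeholders as above and Django's documented savepoint API:

from django.db import transaction, IntegrityError

transaction.set_autocommit(False)
try:
    for line in streaming_filelikeobject:
        product = do_work(line)
        sid = transaction.savepoint()  # rollback point for this row only
        try:
            MyTable(**product).save()
        except IntegrityError:
            # duplicate left over from an earlier partial run: undo just this insert
            transaction.savepoint_rollback(sid)
        else:
            transaction.savepoint_commit(sid)
except IOError:
    transaction.rollback()
else:
    transaction.commit()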

Another idea would be to have a batch_id column and assign it at the start of each batch.
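
A minimal sketch of that idea, assuming a hypothetical batch_id column on MyTable (not in the original schema):

import uuid

def run_batch(streaming_filelikeobject):
    batch_id = uuid.uuid4().hex  # one id per nightly run
    try:
        for line in streaming_filelikeobject:
            product = do_work(line)
            MyTable(batch_id=batch_id, **product).save()
    except IOError:
        # delete everything this failed run inserted (the ORM delete also
        # cascades to related objects), then the run is safe to repeat
        MyTable.objects.filter(batch_id=batch_id).delete()
        raise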

For very large datasets, you could use something like Memcache/Redis to keep an inventory of what has already been inserted.

from django.db import transaction

transaction.set_autocommit(False)
try:
    for line in streaming_filelikeobject:
        product = do_work(line)
        # sadd() returns 1 only if this id has not been seen before,
        # so rows inserted by an earlier partial run are skipped
        if redis_conn.sadd("my_input_set", product['some_unique_id']):
            MyTable(**product).save()
except IOError:
    transaction.rollback()
else:
    transaction.commit()

.sadd() is a Redis command that returns true if the element was not already present in the redis set (i.e. it was just added).
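
For illustration, a quick sketch with the redis-py client (connection details and the sample id are assumed):

import redis

redis_conn = redis.StrictRedis(host='localhost', port=6379, db=0)

redis_conn.sadd("my_input_set", "id-123")  # returns 1: newly added
redis_conn.sadd("my_input_set", "id-123")  # returns 0: already in the set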

Please note I'm typing this off the top of my head, so the Django transaction method signatures may not be authoritative.

Answered 2013-04-08T20:37:36.617