python - Read a CSV with Python and put the values into a MySQL database

Question

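I'm trying to take values from a CSV file and put them into a database, and I've managed to do that without much trouble.

But now I need to write back to the CSV, so that the next time I run the script it only inserts values into the database from the marker in the CSV file onward.

Note that the CSV file on the system is automatically refreshed every 24 hours, so bear in mind that there may be no marker in the CSV. If no marker is found, basically all the values should go into the database.

I plan to run this script every 30 minutes, so there could be 48 markers in the CSV file. Or should the marker be removed each time and moved further down the file?

I have been deleting the file and then recreating it in the script, so that each run gets a fresh file, but this somehow breaks the system, so it isn't a good option.

I hope you can help.

Thank you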

Python code:

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='******',
                       db='kestrel_keep')

cursor = mydb.cursor()

csv_data = csv.reader(open('data_csv.log', newline=''))

# Insert every CSV row into the table (13 columns per row)
for row in csv_data:
    cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
                   row)

# Commit the inserts and close the cursor
mydb.commit()
cursor.close()

print("Done")

My CSV file format:

2013-02-21,21:42:00,-1.0,45.8,27.6,17.3,14.1,22.3,21.1,1,1,2,2
2013-02-21,21:48:00,-1.0,45.8,27.5,17.3,13.9,22.3,20.9,1,1,2,2

score 2 · Accepted Answer

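It looks like the first fields in your MySQL table form a unique timestamp. You can set up the MySQL table so that those fields must be unique, and so that INSERTs which would violate that uniqueness are ignored. At the mysql> prompt, enter the command: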

ALTER IGNORE TABLE heating ADD UNIQUE heatingidx (thedate, thetime)

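(Change thedate and thetime to the names of the columns that hold the date and time. Note that the IGNORE modifier of ALTER TABLE was removed in MySQL 5.7.4; on newer servers, delete any existing duplicate rows first and run the statement without IGNORE.)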

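Once you make this change to the database, you only need to change one line in your program to make MySQL ignore duplicate insertions:

cursor.execute('INSERT IGNORE INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)

Yes, it is a little wasteful to run INSERT IGNORE ... on lines that have already been processed, but given the frequency of your data (every 6 minutes?), it will not matter much in terms of performance.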

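The advantage of doing it this way is that it is now impossible to accidentally insert duplicates into your table. It also keeps your program logic simple and easy to read.

It also avoids having two programs write to the same CSV file at the same time. Even if your program usually succeeds without errors, every so often (maybe once in a blue moon) your program and the other program may try to write to the file at the same time, which could result in an error or mangled data.

You can also make your program a little faster by using cursor.executemany instead of cursor.execute: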

rows = list(csv_data)
cursor.executemany('''INSERT IGNORE INTO `heating` VALUES
    ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)''', rows)

is equivalent to

for row in csv_data:
    cursor.execute('INSERT IGNORE INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
                   row)

except that it packs all the data into a single command.
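To try the unique-index idea without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. That substitution is an assumption made purely for illustration: SQLite spells the statement INSERT OR IGNORE and uses ? placeholders, whereas MySQL uses INSERT IGNORE and %s. The table is cut down to three columns, and the column names thedate, thetime, and outside_temp are hypothetical.

```python
import csv
import io
import sqlite3

# Two sample rows, reduced to three columns for brevity.
SAMPLE = "2013-02-21,21:42:00,-1.0\n2013-02-21,21:48:00,-1.0\n"


def load(conn, csv_text):
    """Insert every CSV row; rows whose (thedate, thetime) already exist are skipped."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # SQLite stand-in for MySQL's INSERT IGNORE.
    conn.executemany("INSERT OR IGNORE INTO heating VALUES (?, ?, ?)", rows)
    conn.commit()


def count(conn):
    """Return the number of rows currently in the table."""
    return conn.execute("SELECT COUNT(*) FROM heating").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heating (thedate TEXT, thetime TEXT, outside_temp REAL)")
# The unique index is what turns a repeated insert into a no-op.
conn.execute("CREATE UNIQUE INDEX heatingidx ON heating (thedate, thetime)")

load(conn, SAMPLE)  # first run inserts both rows
load(conn, SAMPLE)  # second run re-reads the same file and adds nothing
load(conn, SAMPLE + "2013-02-21,21:54:00,-0.9\n")  # only the new row is added
print(count(conn))
```

After the three runs the table holds three rows, even though the same lines were read repeatedly, so the script needs no marker or bookkeeping file.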

score 1 · Accepted Answer

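I think a better option than "marking" the CSV file is to keep a separate file in which you store the number of the last line you processed.

So if that file (the one where you store the last processed line number) does not exist, you process the whole CSV file. If it does exist, you only process the records after that line.

Final code for the working system: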

#!/usr/bin/python
import csv
import os

import MySQLdb

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='*******',
                       db='kestrel_keep')

cursor = mydb.cursor()


def getSize(fileobject):
    fileobject.seek(0, 2)  # move the cursor to the end of the file
    return fileobject.tell()


with open('data_csv.log', 'rb') as log_file:
    curr_file_size = getSize(log_file)

# Get the last file size (0 if this is the first run)
saved_file_size = 0
if os.path.exists("file_size"):
    with open("file_size") as f:
        saved_file_size = int(f.read())

# Get the last processed line
start_row = 0
if os.path.exists("lastline"):
    with open("lastline") as f:
        start_row = int(f.read())

# A smaller file means the CSV was refreshed, so start from the top
if curr_file_size < saved_file_size:
    start_row = 0

cur_row = 0
with open('data_csv.log', newline='') as csv_file:
    csv_data = csv.reader(csv_file)
    for row in csv_data:
        if cur_row >= start_row:
            cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)

            # Other processing if necessary

        cur_row += 1

# Commit once, after all rows have been inserted
mydb.commit()
cursor.close()

# Store the index of the next unprocessed line; cur_row already equals
# the number of lines read, so adding 1 would skip a line next time
with open("lastline", 'w') as f:
    f.write(str(cur_row))

# Store the current file size to detect a file flush
with open("file_size", 'w') as f:
    f.write(str(curr_file_size))

# not necessary but good for debug
print(cur_row)

print("Done")

Edit: Final code submitted by ZeroG, now working on the system! Many thanks to Xion345 for the help as well.

score 1 · Accepted Answer

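Each CSV line appears to contain a timestamp. If these are always increasing, you can query the database for the latest timestamp already recorded, and skip every line up to and including that time when reading the CSV.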
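A minimal sketch of that filtering step, assuming the first two CSV columns are the date and time in the format shown in the question. The function name rows_after is hypothetical, and so is the way last_seen is obtained: in practice it would come from a query for the maximum date and time stored in the table, which is replaced here by a hard-coded value.

```python
import csv
import io
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def rows_after(csv_file, last_seen):
    """Yield CSV rows whose timestamp is strictly later than last_seen.

    last_seen is a datetime, or None when the table is still empty.
    """
    for row in csv.reader(csv_file):
        stamp = datetime.strptime(row[0] + " " + row[1], FMT)
        if last_seen is None or stamp > last_seen:
            yield row


sample = io.StringIO(
    "2013-02-21,21:42:00,-1.0\n"
    "2013-02-21,21:48:00,-1.0\n"
    "2013-02-21,21:54:00,-0.9\n"
)
# Hypothetical value standing in for the newest timestamp in the database.
last_seen = datetime(2013, 2, 21, 21, 42, 0)
new_rows = list(rows_after(sample, last_seen))
print(len(new_rows))
```

Because the comparison is made on the timestamp rather than on a position in the file, the same check should keep working after the daily CSV refresh, as long as the timestamps really are always increasing.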
