I have a script that parses information from a CSV file and executes SQL statements to create a table and insert the data. I now have to parse a ~25 GB CSV file, and based on the sizes of the files I've parsed before, I estimate it could take up to 20 days. Any suggestions on how to optimize my script so it runs faster? I've omitted the createTable function because it is only called once. insertRow() is the function I think I really need to speed up. Thanks in advance.
import csv
import sqlite3
import sys

#Builds sql insert statements and executes sqlite3 calls to insert the rows
def insertRow(cols):
    first = True  # First value for INSERT arguments doesn't need a comma in front of it.
    conn = sqlite3.connect('parsed_csv.sqlite')
    c = conn.cursor()
    print(cols)
    insert = "INSERT INTO test9 VALUES("
    for col in cols:
        col = col.replace("'", "")
        if first:
            insert += "'" + col + "'"
            first = False
        else:
            insert += "," + "'" + col + "'" + " "
    insert += ")"
    print(insert)
    c.execute(insert)
    conn.commit()
def main():
    #Get rid of first argument (filename)
    cmdargs = sys.argv[1:]
    #Convert values to integers
    cmdargs = list(map(int, cmdargs))
    #Get headers
    with open(r'requests_fields.csv','rb') as source:
        rdr = csv.reader(source)
        for row in rdr:
            createTable(row[:], cmdargs[:])
    with open(r'test.csv','rb') as source:
        rdr = csv.reader(source)
        for row in rdr:
            #Clear contents of list
            outlist = []
            #Append all rows onto list and then write to row in output csv file
            for index in cmdargs:
                outlist.append(row[index])
            insertRow(outlist[:])
Could the slowness I'm seeing be caused by creating a connection to the database every time inside insertRow()?
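For illustration only (not part of the original script), here is a minimal sketch of what a single-connection, batched variant of the insert path might look like, assuming Python 3, the same test9 table, and the same column-index selection; the name insert_rows and the batch_size value are hypothetical:

import csv
import sqlite3

def insert_rows(db_path, csv_path, col_indexes, batch_size=10000):
    # Hypothetical batched loader: one connection for the whole file,
    # parameterized placeholders, and one commit per batch instead of per row.
    conn = sqlite3.connect(db_path)
    placeholders = ",".join("?" * len(col_indexes))
    sql = "INSERT INTO test9 VALUES (" + placeholders + ")"
    batch = []
    with open(csv_path, newline='') as source:  # Python 3; use mode 'rb' on Python 2
        for row in csv.reader(source):
            batch.append([row[i] for i in col_indexes])
            if len(batch) >= batch_size:
                conn.executemany(sql, batch)
                conn.commit()
                batch = []
    if batch:  # flush the final partial batch
        conn.executemany(sql, batch)
        conn.commit()
    conn.close()

Called as, say, insert_rows('parsed_csv.sqlite', 'test.csv', cmdargs), this reuses one connection, lets the driver handle quoting via ? placeholders instead of string concatenation, and commits once per batch rather than once per row, which are the usual first steps when per-row inserts are the bottleneck.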