I use pandas.to_sql() to insert a DataFrame into PostgreSQL. When both the code and the database run on localhost, it finishes in about 5 minutes. But when the code and the database are on different machines, it takes about 40 minutes.
The two machines have identical hardware and software, and they sit on the same LAN, connected through a single switch. Please help me!
Versions:
pandas: 0.21.0
psycopg2: 2.7.3.2
PostgreSQL: 9.5
Here is the code:
import time
import sqlalchemy
from sqlalchemy import create_engine

# First run: code and database on the same machine
HOST = '192.168.1.18'
USER = 'test'
PASSWORD = '123'
DATABASE = 'test'

engine = create_engine("postgresql://%s:%s@%s:5432/%s" % (USER, PASSWORD, HOST, DATABASE))

st = time.time()
sourceData.to_sql('test', con=engine, if_exists='append', index=True,
                  chunksize=10000,
                  dtype={'DATETIME': sqlalchemy.types.TIMESTAMP,
                         'CODES': sqlalchemy.types.VARCHAR(255)})
et = time.time()
print((et - st) / 60)  # elapsed minutes
# Second run: database on another machine in the same LAN
HOST = '192.168.1.19'
USER = 'postgres'
PASSWORD = '123'
DATABASE = 'postgres'

engine = create_engine("postgresql://%s:%s@%s:5432/%s" % (USER, PASSWORD, HOST, DATABASE))

st = time.time()
sourceData.to_sql('test', con=engine, if_exists='append', index=True,
                  chunksize=10000,
                  dtype={'DATETIME': sqlalchemy.types.TIMESTAMP,
                         'CODES': sqlalchemy.types.VARCHAR(255)})
et = time.time()
print((et - st) / 60)  # elapsed minutes
sourceData is a pandas.DataFrame with 3,000,000 rows, which looks like this:
Q_C81
DATETIME CODES
2013-01-04 000001.SZ 0.1828
000002.SZ 0.1150
000004.SZ 0.0000
000005.SZ 0.0000
000006.SZ -1.5936
000007.SZ -1.9031
000008.SZ 0.0000
000009.SZ -74.5152
000010.SZ 0.0000
000011.SZ 0.0000
000012.SZ -7.0324
000014.SZ 0.0000
000016.SZ 2.5925
000017.SZ 0.0000
000018.SZ 0.0000
000019.SZ 0.0000
000020.SZ 0.0000
000021.SZ -82.1918
000022.SZ -2.3582
000023.SZ 0.0000
000024.SZ -0.2810
000025.SZ 0.0000
000026.SZ -5.3294
000027.SZ -1.7320
000028.SZ 1.2884
000029.SZ 0.0000
000030.SZ 0.0000
000031.SZ 0.6957
000032.SZ 0.0000
000033.SZ 0.0000
...
2013-03-01 002621.SZ -10.9103
002622.SZ -50.2930
002623.SZ -35.4200
002624.SZ -34.8826
002625.SZ -36.2222
002626.SZ -0.2656
002627.SZ -13.5603
002628.SZ 9.0788
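For comparison, one thing I am considering is bypassing the per-chunk INSERT statements that to_sql() issues and streaming the frame through PostgreSQL's COPY protocol instead, which sends the data in one bulk transfer over the network. A minimal sketch of the idea (the small stand-in frame and the frame_to_csv_buffer helper are illustrations, not part of my actual setup; the COPY call itself is shown commented out because it needs a live connection):

```python
import io

import pandas as pd


def frame_to_csv_buffer(df: pd.DataFrame) -> io.StringIO:
    """Serialize a DataFrame (index included) to an in-memory CSV buffer."""
    buf = io.StringIO()
    df.to_csv(buf, header=False)
    buf.seek(0)
    return buf


# Small stand-in frame; in practice this would be sourceData.
df = pd.DataFrame({'CODES': ['000001.SZ', '000002.SZ'],
                   'Q_C81': [0.1828, 0.1150]})
buf = frame_to_csv_buffer(df)

# With a psycopg2 connection, a single COPY streams the whole buffer
# to the server instead of issuing one INSERT statement per chunk:
# conn = psycopg2.connect(host=HOST, user=USER,
#                         password=PASSWORD, dbname=DATABASE)
# with conn, conn.cursor() as cur:
#     cur.copy_expert("COPY test FROM STDIN WITH (FORMAT csv)", buf)
```

Would this kind of COPY-based load be expected to close most of the localhost-vs-LAN gap, or is something else wrong with my setup?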