python-3.x - PandaSQL is very slow

Question

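For data analysis purposes, I'm currently switching from R to Python (Anaconda/Spyder, Python 3). In R I used sqldf a lot. Since I'm good at SQL queries, I didn't want to relearn the data.table syntax. With sqldf in R, I never had any performance issues.

Now, in Python, I tried pandasql. A simple df = "SELECT * From table LIMIT 1" takes forever on a table of 193k rows and 19 columns.

I also tried pysqldf, but I get an error saying the table doesn't exist, even though it does.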

# -*- coding: utf-8 -*-

import pandas as pd
import pandasql 
import pysqldf

#Data loading    
orders = pd.read_csv('data/orders.csv',sep = ';')

###### PANDASQL ######
test = pandasql.sqldf("SELECT  orders_id from orders LIMIT 1;",globals())
# Will last several minutes and use a lot of RAM

test = pandasql.sqldf("SELECT  orders_id from orders LIMIT 1;",locals())
# Will last several minutes and use a lot of RAM


###### PYSQLDF ######
sqldf = pysqldf.SQLDF(globals())
test = sqldf.execute("SELECT  * from orders LIMIT 1;")
#error
#Error for pysqldf

Traceback (most recent call last):

  File "<ipython-input-12-30b645117dc4>", line 1, in <module>
    test = sqldf.execute("SELECT  * from orders LIMIT 1;")

  File "C:\Users\p.stepniewski\AppData\Local\Continuum\anaconda3\lib\site-packages\pysqldf\sqldf.py", line 76, in execute
    self._del_table(tables)

  File "C:\Users\p.stepniewski\AppData\Local\Continuum\anaconda3\lib\site-packages\pysqldf\sqldf.py", line 117, in _del_table
    self.conn.execute("drop table " + tablename)

OperationalError: no such table: orders

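Am I missing something? I'd prefer a pandasql/pysqldf answer over "learn the pandas query syntax".

sqldf in R handles complex queries on tables of up to 10 million rows on an i7 laptop with 12 GB of RAM.

Thanks!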

score 2 · Accepted Answer

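OK, I just found the solution.

Dropped the Anaconda installation completely.
Cleaned up the related folders.
Installed Python 3.6 from scratch, with pip.
Then pip installed pandas and pandasql.
Launched my script. The pandasql script now executes in less than a second.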
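
To judge whether a fresh install behaves sensibly, it can help to have a baseline for what the query itself costs. pandasql runs its queries through SQLite (an in-memory database by default), so the sketch below uses only the standard-library sqlite3 module to time the same LIMIT 1 query on a synthetic table of the size mentioned in the question. The table contents and the col_1 ... col_18 column names are made up for illustration; only the orders table name, the orders_id column, and the 193k x 19 shape come from the question. It does not reproduce the Anaconda problem; it only shows roughly what load and query times look like without pandas in the loop.

```python
import sqlite3
import time

N_ROWS, N_COLS = 193_000, 19  # table shape taken from the question


def build_orders_table(conn, n_rows=N_ROWS, n_cols=N_COLS):
    """Create a synthetic 'orders' table (all columns except orders_id are hypothetical)."""
    cols = ["orders_id"] + [f"col_{i}" for i in range(1, n_cols)]
    conn.execute(f"CREATE TABLE orders ({', '.join(c + ' INTEGER' for c in cols)})")
    placeholders = ", ".join("?" * n_cols)
    # Every column of row i simply holds the value i; real data would differ.
    rows = ((i,) * n_cols for i in range(n_rows))
    conn.executemany(f"INSERT INTO orders VALUES ({placeholders})", rows)
    conn.commit()


def time_limit_query(conn):
    """Run the query from the question and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute("SELECT orders_id FROM orders LIMIT 1;").fetchall()
    return rows, time.perf_counter() - start


conn = sqlite3.connect(":memory:")

start = time.perf_counter()
build_orders_table(conn)
load_seconds = time.perf_counter() - start

result, query_seconds = time_limit_query(conn)
print(f"load: {load_seconds:.2f}s, query: {query_seconds:.4f}s, result: {result}")
```

If pandasql takes minutes on the same shape of data while this baseline finishes quickly, the time is probably being lost in the environment or in copying the DataFrame into SQLite rather than in the SQL itself.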
