
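I know how to join tables in pandas in various ways (concat, merge, etc.), but I would also like to know how to do it with pandasql. Specifically, I want to join two pandas DataFrames on their index. Is this possible? When I run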

new_df = pysqldf("SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.key=b.key;")

I get the correct result. (Both tables have a "key" column.) However, when I try

new_df = pysqldf("SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.index=b.index;")

I get

---------------------------------------------------------------------------
PandaSQLException                         Traceback (most recent call last)
<ipython-input-154-ecab230d4dc9> in <module>()
----> 1 new_df = pysqldf("SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.index=b.index;")

<ipython-input-100-adc122e97ed8> in <lambda>(q)
      1 from pandasql import sqldf
----> 2 pysqldf = lambda q: sqldf(q, globals())

/Users/jwesley/anaconda/lib/python2.7/site-packages/pandasql/sqldf.pyc in sqldf(query, env, db_uri)
    154     >>> sqldf("select avg(x) from df;", locals())
    155     """
--> 156     return PandaSQL(db_uri)(query, env)

/Users/jwesley/anaconda/lib/python2.7/site-packages/pandasql/sqldf.pyc in __call__(self, query, env)
     61                 result = read_sql(query, conn)
     62             except DatabaseError as ex:
---> 63                 raise PandaSQLException(ex)
     64             except ResourceClosedError:
     65                 # query returns nothing

PandaSQLException: (sqlite3.OperationalError) near "index": syntax error [SQL: 'SELECT a.*, b.list3 from df1 as a INNER JOIN df2 as b ON a.index=b.index;']

1 Answer


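Just name the index, for example with df1.index.rename('foo', inplace=True) (and likewise for df2 if you join on both indexes). You can then refer to the index in the SQL query as a column named 'foo'.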

That works because pandasql checks whether the index name is set:

From https://github.com/yhat/pandasql/blob/a6b7ac405ef741400221600d6769faaf1bdbc6ab/pandasql/sqldf.py#L121

def write_table(df, tablename, conn):
    """ Write a dataframe to the database. """
    with catch_warnings():
        filterwarnings('ignore',
                       message='The provided table name \'%s\' is not found exactly as such in the database' % tablename)
        to_sql(df, name=tablename, con=conn,
               index=not any(name is None for name in df.index.names))  # load index into db if all levels are named
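
To see the effect of that index condition without pandasql, here is a minimal sketch that writes two frames to an in-memory SQLite database using the same check and then joins on the named index. The sample frames, their values, and the name "foo" are made up for illustration, and rename_axis is used as a non-inplace way of naming the index. This only approximates what pandasql does internally.

```python
import sqlite3

import pandas as pd

# Hypothetical sample frames; names and values are for illustration only.
df1 = pd.DataFrame({"list1": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"list3": [10, 20, 30]}, index=["b", "c", "d"])

# Name both indexes so each is written as a regular column called "foo".
df1 = df1.rename_axis("foo")
df2 = df2.rename_axis("foo")

conn = sqlite3.connect(":memory:")
for table_name, frame in {"df1": df1, "df2": df2}.items():
    # Same condition as in write_table: write the index only if every level is named.
    write_index = not any(level is None for level in frame.index.names)
    frame.to_sql(table_name, conn, index=write_index)

new_df = pd.read_sql_query(
    "SELECT a.*, b.list3 FROM df1 AS a INNER JOIN df2 AS b ON a.foo = b.foo;",
    conn,
)
conn.close()
print(new_df)
```

If the indexes were left unnamed, write_index would be False and the tables would have no "foo" column to join on.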

Note: I tried renaming the index to 'index', but the query failed, while other index names worked. This is most likely because INDEX is a keyword in SQLite.
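
The keyword behaviour can be checked directly against SQLite using only the standard library. The sketch below uses a hypothetical table t with a column literally named "index". It shows that the unquoted reference raises an error, while the double-quoted identifier is accepted. I have not verified the quoted form through pandasql itself, so treat that part as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Double quotes let SQLite accept a column literally named "index".
conn.execute('CREATE TABLE t ("index" INTEGER, list1 INTEGER)')
conn.execute("INSERT INTO t VALUES (0, 42)")

try:
    # Unquoted, "index" is parsed as the INDEX keyword rather than a column name.
    conn.execute("SELECT a.index FROM t AS a")
    unquoted_ok = True
except sqlite3.OperationalError as exc:
    unquoted_ok = False
    print("unquoted query failed:", exc)

# Quoted, the same column can be selected normally.
quoted_rows = conn.execute('SELECT a."index" FROM t AS a').fetchall()
print("quoted query returned:", quoted_rows)
conn.close()
```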

Alternatively, you can add a new column that holds the same values as the index: df1['index'] = df1.index
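
A minimal sketch of the column-copy approach, again using plain sqlite3 rather than pandasql. The frames are hypothetical, and the column is called "idx" here instead of 'index' purely to sidestep the keyword problem described above.

```python
import sqlite3

import pandas as pd

# Hypothetical sample frames with unnamed indexes.
df1 = pd.DataFrame({"list1": [1, 2, 3]}, index=[10, 11, 12])
df2 = pd.DataFrame({"list3": ["x", "y", "z"]}, index=[11, 12, 13])

# Copy each index into an ordinary column; "idx" is an arbitrary non-keyword name.
df1["idx"] = df1.index
df2["idx"] = df2.index

conn = sqlite3.connect(":memory:")
# The index itself is not written; the join uses the copied column instead.
df1.to_sql("df1", conn, index=False)
df2.to_sql("df2", conn, index=False)

joined = pd.read_sql_query(
    "SELECT a.*, b.list3 FROM df1 AS a INNER JOIN df2 AS b ON a.idx = b.idx;",
    conn,
)
conn.close()
print(joined)
```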

Answered 2016-09-16T21:04:44.810