I am trying to loop over a list with pandasql's sqldf, but sqldf does not seem to pick up the loop variable. Here is a stylized outline of my problem:

import pandas as pd
from pandasql import sqldf
from datetime import datetime

FreqGamePlay = pd.DataFrame({'CONTACT_WID' : [1, 2, 3, 1, 4], 
                         'TITLE_NOMIN_DT' : pd.to_datetime(['20130102', '20140103', '20120518', 
                                        '20140317', '20111123']),
                        'FreqGamePlay' : [12, 9, 22, 4, 5]})
FreqGamePlay = FreqGamePlay[['CONTACT_WID', 'TITLE_NOMIN_DT', 'FreqGamePlay']]

periodsList = ['2012-12-26', '2012-02-28']
for i in periodsList:
    temp = sqldf("select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > i group by CONTACT_WID;", globals())
    print(temp)

The program above gives the following error:

PandaSQLException: (sqlite3.OperationalError) no such column: i [SQL: 'select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > i group by CONTACT_WID;']

But if I hard-code the date manually, it works fine:

for i in periodsList:
    temp = sqldf("select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > '2012-12-26' group by CONTACT_WID;", globals())
    print(temp)

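Hard-coding is not practical, though, because the date list in the actual program is much larger. Any suggestions are appreciated, thanks.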

1 Answer

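This happens because you wrote the 'i' variable directly inside the SQL string, so Python treats it as literal text and never substitutes the variable's value (you can see in the error message that i has not been replaced by its value). SQLite then receives the letter i, looks for a column with that name, and fails. I suggest reading up on how strings and variables work in Python. In the meantime, try this: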

for i in periodsList:
    query = "select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > '{}' group by CONTACT_WID;".format(i)
    temp = sqldf(query, globals())

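The curly braces act as a placeholder for the variable, and the format() method replaces the placeholder with the variable's value. The single quotes around the placeholder make SQLite read the date as a string literal rather than a column name.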
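
To see the substitution working end to end without installing anything, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for pandasql (which runs queries against SQLite by default). The table is rebuilt by hand from the question's data, and the names TEMPLATE, build_query and results are made up for this illustration, not part of pandasql.

```python
import sqlite3

periodsList = ['2012-12-26', '2012-02-28']

# Same rows as the question's DataFrame, with dates stored as ISO text
# so that string comparison orders them chronologically.
rows = [
    (1, '2013-01-02', 12),
    (2, '2014-01-03', 9),
    (3, '2012-05-18', 22),
    (1, '2014-03-17', 4),
    (4, '2011-11-23', 5),
]

# Assumption: an in-memory SQLite table stands in for the DataFrame.
conn = sqlite3.connect(':memory:')
conn.execute(
    "create table FreqGamePlay "
    "(CONTACT_WID integer, TITLE_NOMIN_DT text, FreqGamePlay integer)"
)
conn.executemany("insert into FreqGamePlay values (?, ?, ?)", rows)

# '{}' marks the spot where each date from the list is substituted.
TEMPLATE = (
    "select CONTACT_WID, sum(FreqGamePlay) as FGP "
    "from FreqGamePlay where TITLE_NOMIN_DT > '{}' "
    "group by CONTACT_WID;"
)


def build_query(period):
    """Return the SQL text with `period` filled into the placeholder."""
    return TEMPLATE.format(period)


# One result per period: a dict mapping CONTACT_WID to its summed FGP.
results = {}
for i in periodsList:
    results[i] = dict(conn.execute(build_query(i)).fetchall())
    print(i, results[i])

conn.close()
```

Building the query with format() is fine for a list of dates you control. If the values ever come from user input, binding them as query parameters is the safer choice, since formatted strings can be abused for SQL injection.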

answered 2018-02-08T14:51:23.857