python - list comprehension loop

Question

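I have a csv file with date, time, price, size and signal columns: 62035 rows, with 42 times in a day associated with each unique date in the file.

For each date, whenever there is an "S" in the signal column, I want to append the price at which the "S" occurred. Below is my attempt.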

from pandas import *
from numpy import *
from io import *
from os import *
from sys import *

DF1 = read_csv('___.csv')
idf=DF1.set_index(['date','time','price'],inplace=True)
sStore=[]
for i in idf.index[i][0]:
  sStore.append([idf.index[j][2] for j in idf[j][1] if idf['signal']=='S'])
sStore.head()

Traceback (most recent call last)
<ipython-input-7-8769220929e4> in <module>()
  1 sStore=[]
  2 
----> 3 for time in idf.index[i][0]:
  4 
  5     sStore.append([idf.index[j][2] for j in idf[j][1] if idf['signal']=='S'])

 NameError: name 'i' is not defined

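I don't understand why the index i isn't allowed here. Thanks.

What I also find strange is this:

idf.index.levels[0] shows the dates "unparsed": they are as in the file, but out of order, even though parse_date=True was passed as an argument to set_index.

I ask because I was thinking of solving the problem this way: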

for i in idf.index.levels[0]:

   sStore.append([idf.index[j][2] for j in idf.index.levels[1] if idf['signal']=='S'])

 sStore.head()

Edit, 12/30/2012, based on DSM's comment below:

As I commented below, I'd like to use your idea to build a P&L. If S != B on any given date, we use the closing time of 1620 to take the difference.

v=[df["signal"]=="S"]
t=[df["time"]=="1620"]
u=[df["signal"]!="S"]

df["price"][[v and (u and t)]]

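That is: "give me the price at 1620 (even if there is no sell signal, S, there), so that I can difference out the 'extra B's'", for the special case where B > S. This ignores the symmetric problem (where S > B), but for now I want to understand this logic problem.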

Running this expression gives the traceback:

ValueError: boolean index array should have 1 dimension

Note that I did not set the index here, so that I could call df["time"]. Trying the union operator | instead gives:

TypeError: unsupported operand type(s) for |: 'list' and 'list'

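Looking at Max Fellows' approach,

@Max Fellows

The key is to close out positions at the end of the day, so we need to capture the price at the close to "unload" all the accumulated B's and S's, without them clearing each other out. If I say: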

filterFunc1 = lambda row: row["signal"] == "S" and ([row["signal"] != "S"][row["price"]=="1620"])
filterFunc2 =lambda row: ([row["price"]=="1620"][row["signal"] != "S"])

filterFunc=filterFunc1 and filterFunc2

filteredData = itertools.ifilter(filterFunc, reader)

Traceback:

IndexError: list index out of range

score 2 · Accepted Answer

Try something like this:

for i in range(len(idf.index)):
  value = idf.index[i][0]

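The same goes for iterating with the j index variable. As already pointed out, you can't reference the iteration index inside the expression you are iterating over. Besides, you need a very specific kind of iteration (walking over a column of a matrix), and Python's default iterators won't do that out of the box, so you need custom index handling.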
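A minimal sketch of the explicit-index loop, assuming the index behaves like a list of (date, time, price) tuples. The list index and its values below are hypothetical stand-ins for idf.index. It also shows that unpacking the tuples directly gives the same dates without any index variable.

```python
# Hypothetical stand-in for idf.index: a list of (date, time, price) tuples.
index = [
    ("12/28/2012", "1:30", 10.0),
    ("12/28/2012", "2:15", 11.0),
    ("11/29/2012", "3:00", 12.0),
]

# Explicit-index version: range() binds i before the body ever uses it.
dates_by_position = []
for i in range(len(index)):
    value = index[i][0]          # first element of the i-th tuple (the date)
    dates_by_position.append(value)

# Idiomatic version: unpack each tuple directly, no index variable needed.
dates_by_unpacking = [date for date, _time, _price in index]

print(dates_by_position)
print(dates_by_unpacking)
```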

score 2 · Accepted Answer

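Here is what I think you are trying to accomplish, based on your edit: for each date in the CSV file, group the date together with a list of the prices of every row that has an "S" signal.

You didn't include any sample data in your question, so I made up a test file that I hope matches the format you described: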

12/28/2012,1:30,10.00,"foo","S"
12/28/2012,2:15,11.00,"bar","N"
12/28/2012,3:00,12.00,"baz","S"
12/28/2012,4:45,13.00,"fibble","N"
12/28/2012,5:30,14.00,"whatsit","S"
12/28/2012,6:15,15.00,"bobs","N"
12/28/2012,7:00,16.00,"widgets","S"
12/28/2012,7:45,17.00,"weevils","N"
12/28/2012,8:30,18.00,"badger","S"
12/28/2012,9:15,19.00,"moose","S"
11/29/2012,1:30,10.00,"foo","N"
11/29/2012,2:15,11.00,"bar","N"
11/29/2012,3:00,12.00,"baz","S"
11/29/2012,4:45,13.00,"fibble","N"
11/29/2012,5:30,14.00,"whatsit","N"
11/29/2012,6:15,15.00,"bobs","N"
11/29/2012,7:00,16.00,"widgets","S"
11/29/2012,7:45,17.00,"weevils","N"
11/29/2012,8:30,18.00,"badger","N"
11/29/2012,9:15,19.00,"moose","N"
12/29/2012,1:30,10.00,"foo","N"
12/29/2012,2:15,11.00,"bar","N"
12/29/2012,3:00,12.00,"baz","S"
12/29/2012,4:45,13.00,"fibble","N"
12/29/2012,5:30,14.00,"whatsit","N"
12/29/2012,6:15,15.00,"bobs","N"
12/29/2012,7:00,16.00,"widgets","S"
12/29/2012,7:45,17.00,"weevils","N"
12/29/2012,8:30,18.00,"badger","N"
12/29/2012,9:15,19.00,"moose","N"
8/9/2008,1:30,10.00,"foo","N"
8/9/2008,2:15,11.00,"bar","N"
8/9/2008,3:00,12.00,"baz","S"
8/9/2008,4:45,13.00,"fibble","N"
8/9/2008,5:30,14.00,"whatsit","N"
8/9/2008,6:15,15.00,"bobs","N"
8/9/2008,7:00,16.00,"widgets","S"
8/9/2008,7:45,17.00,"weevils","N"
8/9/2008,8:30,18.00,"badger","N"
8/9/2008,9:15,19.00,"moose","N"

Here is one way to group it the way you want, using Python 2.7 and the built-in libraries:

import csv
import itertools
import time
from collections import OrderedDict

with open("sample.csv", "r") as file:
    reader = csv.DictReader(file,
                            fieldnames=["date", "time", "price", "mag", "signal"])

    # Reduce the size of the data set by filtering out the non-"S" rows.
    filterFunc = lambda row: row["signal"] == "S"
    filteredData = itertools.ifilter(filterFunc, reader)

    # Sort by date so we can use the groupby function.
    dateKeyFunc = lambda row: time.strptime(row["date"], r"%m/%d/%Y")
    sortedData = sorted(filteredData, key=dateKeyFunc)

    # Group by date: create a new dictionary of date to a list of prices.
    datePrices = OrderedDict((date, [row["price"] for row in rows])
                             for date, rows
                             in itertools.groupby(sortedData, dateKeyFunc))

for date, prices in datePrices.iteritems():
    print "{0}: {1}".format(time.strftime(r"%m/%d/%Y", date),
                            ", ".join(str(price) for price in prices))

08/09/2008: 12.00, 16.00
11/29/2012: 12.00, 16.00
12/28/2012: 10.00, 12.00, 14.00, 16.00, 18.00, 19.00
12/29/2012: 12.00, 16.00

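Type conversion is up to you, since you may be using other libraries to read the CSV, but this should get you started. Also take note of @DSM's comment about import *.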
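The code above is Python 2 only (itertools.ifilter, dict.iteritems and the print statement do not exist in Python 3). A sketch of the same approach ported to Python 3, reading a hypothetical in-memory SAMPLE string (a few rows in the test-file format) instead of sample.csv; the function name group_s_prices is also just for illustration.

```python
import csv
import io
import itertools
import time
from collections import OrderedDict

# Hypothetical in-memory sample in the same format as the test file (a few rows only).
SAMPLE = """12/28/2012,1:30,10.00,"foo","S"
12/28/2012,2:15,11.00,"bar","N"
12/28/2012,3:00,12.00,"baz","S"
8/9/2008,3:00,12.00,"baz","S"
8/9/2008,7:00,16.00,"widgets","S"
8/9/2008,7:45,17.00,"weevils","N"
"""


def group_s_prices(text):
    reader = csv.DictReader(io.StringIO(text),
                            fieldnames=["date", "time", "price", "mag", "signal"])
    # Built-in filter() replaces Python 2's itertools.ifilter.
    filtered = filter(lambda row: row["signal"] == "S", reader)
    date_key = lambda row: time.strptime(row["date"], "%m/%d/%Y")
    # groupby only merges adjacent rows, so sort by the same key first.
    sorted_rows = sorted(filtered, key=date_key)
    return OrderedDict(
        (date, [row["price"] for row in rows])
        for date, rows in itertools.groupby(sorted_rows, date_key)
    )


date_prices = group_s_prices(SAMPLE)
# items() replaces iteritems(); print is a function in Python 3.
for date, prices in date_prices.items():
    print("{0}: {1}".format(time.strftime("%m/%d/%Y", date), ", ".join(prices)))
```

The prices stay as strings here, matching the original; convert them with float() if you need arithmetic.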

score 2 · Accepted Answer

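Using @Max Fellows' handy sample data, we can do this in pandas. [As an aside, you should always try to provide a short, self-contained, correct example (see here for more details), so that the people trying to help you don't have to spend time coming up with one.]

First, import pandas as pd. Then: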

In [23]: df = pd.read_csv("sample.csv", names="date time price mag signal".split())

In [24]: df.set_index(["date", "time"], inplace=True)

This gives me

In [25]: df
Out[25]: 
                 price      mag signal
date       time                       
12/28/2012 1:30     10      foo      S
           2:15     11      bar      N
           3:00     12      baz      S
           4:45     13   fibble      N
           5:30     14  whatsit      S
           6:15     15     bobs      N
           7:00     16  widgets      S
           7:45     17  weevils      N
           8:30     18   badger      S
           9:15     19    moose      S
11/29/2012 1:30     10      foo      N
           2:15     11      bar      N
           3:00     12      baz      S
           4:45     13   fibble      N
           5:30     14  whatsit      N
           6:15     15     bobs      N
           7:00     16  widgets      S
           7:45     17  weevils      N
           8:30     18   badger      N
           9:15     19    moose      N
[etc.]

We can easily see which rows have signal S:

In [26]: df["signal"] == "S"
Out[26]: 
date        time
12/28/2012  1:30     True
            2:15    False
            3:00     True
            4:45    False
            5:30     True
            6:15    False
[etc..]

We can also use it to select rows:

In [27]: df["price"][df["signal"] == "S"]
Out[27]: 
date        time
12/28/2012  1:30    10
            3:00    12
            5:30    14
            7:00    16
            8:30    18
            9:15    19
11/29/2012  3:00    12
            7:00    16
12/29/2012  3:00    12
            7:00    16
8/9/2008    3:00    12
            7:00    16
Name: price

This is a Series of the prices at every date and time that has an S. If you just want a list:

In [28]: list(df["price"][df["signal"] == "S"])
Out[28]: [10.0, 12.0, 14.0, 16.0, 18.0, 19.0, 12.0, 16.0, 12.0, 16.0, 12.0, 16.0]
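The flat list above loses the per-date grouping the question asked for. A sketch of one way to keep it, assuming pandas is installed: filter on the mask, then group the price column by date. The SAMPLE string is a hypothetical subset of the sample data, read with the same names as the read_csv call above.

```python
import io

import pandas as pd

# Hypothetical subset of the sample file; same columns as the read_csv call above.
SAMPLE = """12/28/2012,1:30,10.00,"foo","S"
12/28/2012,2:15,11.00,"bar","N"
12/28/2012,3:00,12.00,"baz","S"
8/9/2008,3:00,12.00,"baz","S"
8/9/2008,7:45,17.00,"weevils","N"
"""

df = pd.read_csv(io.StringIO(SAMPLE), names="date time price mag signal".split())

# Keep only the "S" rows, then collect each date's prices into a list.
s_prices = df[df["signal"] == "S"].groupby("date")["price"].apply(list)
print(s_prices.to_dict())
```

The dates are still plain strings here, so groupby sorts them lexically; parsing them first (for example with pd.to_datetime) would give chronological order.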

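Update:

v=[df["signal"]=="S"] makes v a list containing a Series, which isn't what you want. df["price"][[v and (u and t)]] doesn't make much sense to me either: v and u are mutually exclusive, so if you AND them together you get nothing. For these elementwise logical operations, use & and | instead of and and or. Using the reference data again: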

In [85]: import pandas as pd

In [86]: df = pd.read_csv("sample.csv", names="date time price mag signal".split())

In [87]: v=df["signal"]=="S"

In [88]: t=df["time"]=="4:45"

In [89]: u=df["signal"]!="S"

In [90]: df[t]
Out[90]: 
          date  time  price     mag signal
3   12/28/2012  4:45     13  fibble      N
13  11/29/2012  4:45     13  fibble      N
23  12/29/2012  4:45     13  fibble      N
33    8/9/2008  4:45     13  fibble      N

In [91]: df["price"][t]
Out[91]: 
3     13
13    13
23    13
33    13
Name: price

In [92]: df["price"][v | (u & t)]
Out[92]: 
0     10
2     12
3     13
4     14
6     16
8     18
9     19
12    12
13    13
16    16
22    12
23    13
26    16
32    12
33    13
36    16
Name: price
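To see why the bracketed v=[...] combined with and, or with |, misbehaves, here is a plain-Python sketch with hypothetical lists standing in for the pandas masks (no pandas needed). Wrapping a mask in brackets produces an ordinary list, and Python's and/or look at the truthiness of whole objects rather than combining element by element.

```python
# Hypothetical plain-Python stand-ins for the boolean masks in the question.
s_mask = [True, False, True]
v = [s_mask]                       # like v=[df["signal"]=="S"]: a list wrapping the mask
u = [[not x for x in s_mask]]      # like u=[df["signal"]!="S"]

# `and` does not combine elementwise; it returns one of its operands.
combined = v and u
print(combined is u)               # a non-empty list is truthy, so `and` returns u

# `|` is not defined for two lists, which reproduces the TypeError in the question.
error_message = None
try:
    v | u
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Spelled-out elementwise AND of the two masks: all False, because
# "signal is S" and "signal is not S" can never both hold.
elementwise_and = [a and b for a, b in zip(v[0], u[0])]
print(elementwise_and)
```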

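[Note: this question has now become too long and convoluted. I suggest spending some time working through the examples in the pandas documentation at the console to get a feel for it.]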
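The question's filterFunc1/filterFunc2 attempt works on one CSV row at a time, where plain and/or on single booleans are the right tools. (By contrast, filterFunc1 and filterFunc2 simply evaluates to filterFunc2, because a function object is always truthy.) A minimal sketch of the same v | (u & t) selection applied row by row with the csv module, assuming the sample format above and using "4:45" as the stand-in closing time, as in the pandas example; the keep function and SAMPLE string are hypothetical.

```python
import csv
import io

# Hypothetical subset of the sample data.
SAMPLE = """12/28/2012,1:30,10.00,"foo","S"
12/28/2012,2:15,11.00,"bar","N"
12/28/2012,4:45,13.00,"fibble","N"
11/29/2012,3:00,12.00,"baz","S"
11/29/2012,4:45,13.00,"fibble","N"
"""


def keep(row):
    # Scalar version of v | (u & t): on a single row, plain `or`/`and` are correct.
    is_s = row["signal"] == "S"
    at_close = row["time"] == "4:45"
    return is_s or (not is_s and at_close)


reader = csv.DictReader(io.StringIO(SAMPLE),
                        fieldnames=["date", "time", "price", "mag", "signal"])
selected = [(row["date"], row["time"], row["price"]) for row in filter(keep, reader)]
print(selected)
```

Logically, is_s or (not is_s and at_close) reduces to is_s or at_close; the longer form is kept only to mirror the v | (u & t) expression.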

score 1 · Accepted Answer

This is because i hasn't been defined yet, just as the error message says.

On this line:

for i in idf.index[i][0]:

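You are telling the Python interpreter to iterate over all the values produced by the list returned from the expression idf.index[i][0], but you haven't defined what i is yet (even though you are also trying to assign each item in the list to the variable i).

A Python for ... in ... loop works by taking the rightmost component and requesting items from its iterator with next. It then assigns each value produced by that call to the variable name given on the left-hand side.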
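A minimal sketch of that protocol, written out by hand with iter() and next() on a hypothetical list. The right-hand expression is evaluated once, before any value is bound to the loop variable, which is why idf.index[i][0] cannot refer to i in the question's loop.

```python
# Hypothetical data: any iterable works here.
values = ["12/28/2012", "11/29/2012", "8/9/2008"]

# What `for item in values: seen.append(item)` does, written out by hand.
seen = []
iterator = iter(values)          # the right-hand expression is evaluated once, first
while True:
    try:
        item = next(iterator)    # ask the iterator for the next value
    except StopIteration:
        break                    # iterator exhausted: the loop ends
    seen.append(item)            # the value is bound to the left-hand name, then the body runs

print(seen)
```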
