python - Calculating the difference between two rows in Python / Pandas

Question

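In Python, how can I refer to the previous row and compute something against it? Specifically, I am working with pandas DataFrames. I have a data frame full of stock price information that looks like this: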

           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

This is how I created the data frame:

import pandas

url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
data = pandas.read_csv(url)

## now I sort the data frame ascending by date
## (sort_values replaces DataFrame.sort, which was removed from pandas)
data = data.sort_values(by='Date')

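Starting from row two, or in this case I guess it's 250 (PS: is that the index?), I want to calculate the difference between 2011-01-03 and 2011-01-04, and so on for every entry in this data frame. I believe the appropriate way is to write a function that takes the current row, looks up the previous row, and calculates the difference between them, then use the pandas apply function to update the data frame with that value.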

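Is that the right approach? If so, should I be using the index to determine the difference? (Note: I'm still in Python beginner mode, so "index" may not be the right term, or even the right way to implement this.)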

score 115 · Accepted Answer

I think you want to do something like this:

In [26]: data
Out[26]: 
           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

In [27]: data.set_index('Date').diff()
Out[27]: 
            Close  Adj Close
Date                        
2011-01-03    NaN        NaN
2011-01-04   0.16       0.16
2011-01-05  -0.59      -0.58
2011-01-06   1.61       1.57
2011-01-07  -0.73      -0.71
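
To tie this back to the question: `diff()` works on row position, so it is the same as subtracting the previous row obtained with `shift(1)`, and the labels 251, 250, ... are simply index labels left over from the original descending order. A minimal sketch, assuming the five sample rows above are typed in by hand instead of downloaded (the variable names `prices`, `by_diff`, `by_shift` and `close_change` are illustrative only):

```python
import pandas as pd

# Sample rows copied from the question; the index labels mimic the
# descending order of the original download.
data = pd.DataFrame(
    {
        "Date": ["2011-01-03", "2011-01-04", "2011-01-05",
                 "2011-01-06", "2011-01-07"],
        "Close": [147.48, 147.64, 147.05, 148.66, 147.93],
        "Adj Close": [143.25, 143.41, 142.83, 144.40, 143.69],
    },
    index=[251, 250, 249, 248, 247],
)

prices = data.set_index("Date")

# diff() subtracts the previous row from the current row, column by column.
by_diff = prices.diff()

# The same calculation written out explicitly with shift().
by_shift = prices - prices.shift(1)

# Day-over-day change in the closing price, rounded for display.
close_change = by_diff["Close"].round(2)
print(close_change)
```

No row-by-row `apply` function is needed here, since both forms operate on whole columns at once.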

score 18 · Accepted Answer

To compute the difference for a single column, here is what you can do.

df=
      A      B
0     10     56
1     45     48
2     26     48
3     32     65

We want to compute the row-to-row difference in A only, and then keep the rows where that difference is less than 15.

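df['A_dif'] = df['A'].diff()
df=
          A      B      A_dif
    0     10     56      NaN
    1     45     48      35.0
    2     26     48     -19.0
    3     32     65      6.0
df = df[df['A_dif']<15]

df=
          A      B      A_dif
    2     26     48     -19.0
    3     32     65      6.0

Row 0 is dropped because the comparison NaN < 15 evaluates to False, and row 2 is kept because the difference is signed (26 - 45 = -19).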
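
Since `diff()` returns signed values and NaN for the first row, the result of the filter depends on whether the sign matters. A small sketch, assuming the four-row frame above, that compares the signed filter with an absolute-value variant (the `abs()` version is an assumption about what might be intended, not part of the original answer; `signed` and `small_moves` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 45, 26, 32], "B": [56, 48, 48, 65]})

# Difference between each row of A and the row before it.
# The first row has no predecessor, so its value is NaN.
df["A_dif"] = df["A"].diff()

# Signed comparison: negative differences pass, NaN never does.
signed = df[df["A_dif"] < 15]

# Magnitude comparison: only changes smaller than 15 in either direction.
small_moves = df[df["A_dif"].abs() < 15]

print(signed)
print(small_moves)
```

If the first row should be kept as well, one option is to fill the NaN explicitly, for example with `df["A_dif"].fillna(0)`, before filtering.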

score 1 · Accepted Answer

I don't know pandas, and I'm fairly sure it has something specific for this; however, I'll give you a pure Python solution, which may help even if you need to use pandas:

import csv
import io
import urllib.request

# This retrieves the CSV file and loads it into a list, converting
# all numeric values to floats
url='http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
response = urllib.request.urlopen(url)
reader = csv.reader(io.TextIOWrapper(response, encoding='utf-8'), delimiter=',')
# We sort the output list so the records are ordered by date
cleaned = sorted([r[0]] + [float(x) for x in r[1:]] for r in list(reader)[1:])

for i, row in enumerate(cleaned):  # enumerate() yields two-tuples: (<index>, <item>)
    # Skip row 0 explicitly: cleaned[-1] would not raise IndexError,
    # it would silently wrap around to the last row
    if i == 0:
        continue
    # This calculates the difference of each numeric field with the same field
    # in the row before this one
    print(row[0], [row[j] - cleaned[i-1][j] for j in range(1, 7)])
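
The same idea can be tried without a network connection. The sketch below assumes a handful of rows typed in from the question (date, Close, Adj Close) and pairs each row with its predecessor using `zip`, which avoids index arithmetic altogether; `row_differences` is a hypothetical helper name, not something from the answer above:

```python
# Sample rows from the question, already sorted by date:
# (date, close, adjusted close)
rows = [
    ("2011-01-03", 147.48, 143.25),
    ("2011-01-04", 147.64, 143.41),
    ("2011-01-05", 147.05, 142.83),
    ("2011-01-06", 148.66, 144.40),
    ("2011-01-07", 147.93, 143.69),
]


def row_differences(rows):
    """Return (date, [differences]) for every row except the first."""
    result = []
    # zip(rows, rows[1:]) yields (previous, current) pairs, so the
    # first row is never treated as a "current" row.
    for previous, current in zip(rows, rows[1:]):
        deltas = [round(c - p, 2) for p, c in zip(previous[1:], current[1:])]
        result.append((current[0], deltas))
    return result


differences = row_differences(rows)
for date, deltas in differences:
    print(date, deltas)
```

Pairing the list with a shifted copy of itself is the plain-Python counterpart of what `diff()` does in pandas.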
