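I have three pandas DataFrames containing data recorded during a test: one for temperature, one for vacuum, and one for voltage.

The data were captured independently, so the timestamps in the frames do not line up. Only occasionally does a timestamp from one frame also appear in another.

I would like to combine them into a single DataFrame and then interpolate the missing values, so that I end up with one complete DataFrame.

I am new to pandas and have been poking around, but I don't feel I am getting anywhere, or that I am even on the right path.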

1 Answer

import pandas as pd
import numpy as np

# Hourly temperature readings.
rng1 = pd.date_range(
    '1/1/2012',
    periods=10,
    freq='h'  # lowercase alias; the uppercase 'H' is deprecated in recent pandas
)

s1 = pd.Series(
    np.arange(10),
    index=rng1
)

df1 = pd.DataFrame(
    {'temp': s1}
)

# Voltage readings at irregular times. The strings are parsed into timestamps
# so that this index aligns with df1's DatetimeIndex when the frames are joined.
s2 = pd.Series(
    np.arange(5, 10),
    index=pd.to_datetime(['1/1/2012 01:20:00',
                          '1/1/2012 01:40:00',
                          '1/1/2012 02:00:00',
                          '1/1/2012 05:30:00',
                          '1/1/2012 06:00:00'])
)

df2 = pd.DataFrame(
    {'voltage': s2}
)

print(df1)
print(df2)

--output:--
                     temp
2012-01-01 00:00:00     0
2012-01-01 01:00:00     1
2012-01-01 02:00:00     2
2012-01-01 03:00:00     3
2012-01-01 04:00:00     4
2012-01-01 05:00:00     5
2012-01-01 06:00:00     6
2012-01-01 07:00:00     7
2012-01-01 08:00:00     8
2012-01-01 09:00:00     9

                     voltage
2012-01-01 01:20:00        5
2012-01-01 01:40:00        6
2012-01-01 02:00:00        7
2012-01-01 05:30:00        8
2012-01-01 06:00:00        9


# An outer join keeps every timestamp that appears in either frame.
combined = df1.join(df2, how='outer')
print(combined)

--output:--
                     temp  voltage
2012-01-01 00:00:00     0      NaN
2012-01-01 01:00:00     1      NaN
2012-01-01 01:20:00   NaN        5
2012-01-01 01:40:00   NaN        6
2012-01-01 02:00:00     2        7
2012-01-01 03:00:00     3      NaN
2012-01-01 04:00:00     4      NaN
2012-01-01 05:00:00     5      NaN
2012-01-01 05:30:00   NaN        8
2012-01-01 06:00:00     6        9
2012-01-01 07:00:00     7      NaN
2012-01-01 08:00:00     8      NaN
2012-01-01 09:00:00     9      NaN

# Interpolate each column, weighting by the elapsed time between index values.
combined = combined.interpolate(method='time')

print(combined)

--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000       NaN
2012-01-01 01:00:00  1.000000       NaN
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
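
The voltage of 7.285714 at 03:00 can be checked by hand. The snippet below is a minimal sketch, not part of the original answer; the three-point series `s` is made up for illustration and reuses the neighbouring voltage readings (7 at 02:00, 8 at 05:30). It compares `method='time'`, which weights by elapsed time, with the default `method='linear'`, which ignores the index and treats rows as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical three-point series: known values at 02:00 and 05:30, gap at 03:00.
idx = pd.to_datetime(['2012-01-01 02:00', '2012-01-01 03:00', '2012-01-01 05:30'])
s = pd.Series([7.0, np.nan, 8.0], index=idx)

by_time = s.interpolate(method='time')        # uses the timestamps
by_position = s.interpolate(method='linear')  # ignores the timestamps

# Manual check: 1 hour elapsed out of a 3.5 hour gap.
fraction = (idx[1] - idx[0]) / (idx[2] - idx[0])
manual = 7.0 + fraction * (8.0 - 7.0)

print(by_time.iloc[1], by_position.iloc[1], manual)
```

With irregularly spaced samples such as these, the two methods give different answers, which is why the time-based method is used here.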

# Time interpolation leaves the leading NaNs in place; back-fill them
# with the first valid value in each column.
print(combined.bfill())

--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000  5.000000
2012-01-01 01:00:00  1.000000  5.000000
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
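
The question mentions three frames, while the example above uses two. The same approach should extend to any number of frames. The following is a rough sketch under assumptions: the frame names `temp`, `vacuum` and `voltage` and all of their values are invented for illustration, and `pd.concat(..., axis=1)` is used in place of chained `join` calls to take the union of the indexes in one step.

```python
import pandas as pd

# Hypothetical sample data; each frame has its own, unaligned timestamps.
temp = pd.DataFrame(
    {'temp': [20.0, 21.0, 22.0]},
    index=pd.to_datetime(['2012-01-01 00:00', '2012-01-01 01:00', '2012-01-01 02:00']),
)
vacuum = pd.DataFrame(
    {'vacuum': [0.10, 0.30]},
    index=pd.to_datetime(['2012-01-01 00:30', '2012-01-01 01:30']),
)
voltage = pd.DataFrame(
    {'voltage': [5.0, 7.0]},
    index=pd.to_datetime(['2012-01-01 00:00', '2012-01-01 02:00']),
)

# Column-wise concat performs an outer join on the indexes.
combined = pd.concat([temp, vacuum, voltage], axis=1).sort_index()

# Interpolate by elapsed time, then fill any gaps left at either edge.
filled = combined.interpolate(method='time').bfill().ffill()

print(filled)
```

Filling the edges with `bfill` and `ffill` simply repeats the nearest reading; whether that is acceptable depends on the measurement, so it may be preferable to leave those rows as NaN or drop them.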
answered 2013-09-26T10:53:01.897