
Hi everyone,

I've been browsing Stack Overflow for years, and it has helped me so much that I never needed to register before :)

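But today I ran into a problem using Python with Pandas and Quantities (it could also happen with unum or pint). I did my best to write a clear post, but since this is my first one, I apologize if anything is confusing, and I will try to correct any mistakes you point out :)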


I want to import data from a source and build a Pandas DataFrame like this:

import pandas as pd
import quantities as pq

depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m

s1 = pd.DataFrame(
        {'depth' : [x for x in depth]},
        index = depth)

This gives:

s1 =
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

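Now I want to extend the data to the depth2 values. (Obviously there is no point in interpolating depth against depth itself, but this is a test before things get more complicated.)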

s2 = s1.reindex(depth2)

This gives:

s2 =
      depth
0.0   0.0 m
1.0   NaN
1.1   1.1 m
1.5   NaN
2.0   2.0 m

So far, no problem.
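To make the reindex step concrete without depending on the quantities package, here is a minimal sketch that assumes plain floats (metres implied) in place of the Quantity values. It shows what reindex does: rows whose labels already exist keep their values, and new labels get NaN rows.

```python
import pandas as pd

# Plain floats stand in for the Quantity depths (units implied, not stored).
s1 = pd.DataFrame({"depth": [0.0, 1.1, 2.0]}, index=[0.0, 1.1, 2.0])

# Existing labels keep their rows; the new labels 1.0 and 1.5 become NaN rows.
s2 = s1.reindex([0.0, 1.0, 1.1, 1.5, 2.0])
print(s2)
```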


But when I try to interpolate the missing values:

s2['depth'].interpolate(method='values')

I get the following error:

C:\Python27\lib\site-packages\numpy\lib\function_base.pyc in interp(x, xp, fp, left, right)
   1067         return compiled_interp([x], xp, fp, left, right).item()
   1068     else:
-> 1069         return compiled_interp(x, xp, fp, left, right)
  1070 
  1071 
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

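I understand that numpy's interp does not work on objects: the depth column holds Quantity scalars, so its dtype is object (dtype('O')) rather than float64.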
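For reference, the underlying numpy call only needs float arrays. This sketch (plain numpy, units already stripped, values taken from the example above) casts an object array to float first, which is roughly what astype(float) does for the column, and then does the piecewise-linear interpolation that interpolate(method='values') relies on.

```python
import numpy as np

# An object array like the Quantity column; cast it to float64 before interpolating.
fp_obj = np.array([0.0, 1.1, 2.0], dtype=object)
fp = fp_obj.astype(float)

# Known index values (depths where data exists).
xp = np.array([0.0, 1.1, 2.0])

# Target index values, including the missing depths 1.0 and 1.5.
x = np.array([0.0, 1.0, 1.1, 1.5, 2.0])

# Piecewise-linear interpolation of fp at x.
filled = np.interp(x, xp, fp)
print(filled)
```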


However, if I strip the units before interpolating the missing values, it works:

s3 = s2['depth'].astype(float).interpolate(method='values')

This gives:

s3 =
0.0   0
1.0   1
1.1   1.1
1.5   1.5
2.0   2
Name: depth, dtype: object

How can I get the units back into the depth column?

I can't find any trick to put the units back...

Any help would be greatly appreciated. Thanks


2 Answers


Here is one way to do what you want.

Split each quantity into its magnitude and its unit, creating a pair of columns for each quantity:

In [80]: df = pd.concat([pd.DataFrame({c: [x.item() for x in col],
                                       "%s_unit" % c: [x.dimensionality.string for x in col]},
                                      index=col.index)
                         for c, col in s1.items()], axis=1)

In [81]: df
Out[81]: 
     depth depth_unit
0.0    0.0          m
1.1    1.1          m
2.0    2.0          m

In [82]: df = df.reindex([0,1.0,1.1,1.5,2.0])

In [83]: df
Out[83]: 
     depth depth_unit
0.0    0.0          m
1.0    NaN        NaN
1.1    1.1          m
1.5    NaN        NaN
2.0    2.0          m

In [84]: df['depth'] = df['depth'].interpolate(method='values')

Then propagate the units by forward-filling them:

In [85]: df['depth_unit'] = df['depth_unit'].ffill()

In [86]: df
Out[86]: 
     depth depth_unit
0.0    0.0          m
1.0    1.0          m
1.1    1.1          m
1.5    1.5          m
2.0    2.0          m
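The same two-column idea can be sketched end to end with plain pandas. This assumes the units are already stored as strings in a separate column (the state after In [80]), so the quantities package is not needed for the sketch itself.

```python
import pandas as pd

# Magnitudes and unit labels in separate columns, as produced by the split step.
df = pd.DataFrame(
    {"depth": [0.0, 1.1, 2.0], "depth_unit": ["m", "m", "m"]},
    index=[0.0, 1.1, 2.0],
)

# New depths get NaN in both columns.
df = df.reindex([0.0, 1.0, 1.1, 1.5, 2.0])

# Numeric column: linear interpolation against the index values.
df["depth"] = df["depth"].interpolate(method="values")

# Unit column: carry the last known unit forward.
df["depth_unit"] = df["depth_unit"].ffill()
print(df)
```

Keeping the unit as a plain string column means every numeric column stays float64, so interpolate never sees object dtype; the trade-off is that you must rebuild Quantity objects yourself if you need them later.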
answered 2013-10-08T16:39:55.353

OK, I found a solution. It may not be the best one, but it works well for my problem:

import pandas as pd
import quantities as pq

def extendAndInterpolate(input, newIndex):
    """ Function to extend a pandas DataFrame and interpolate
    """
    # Outer-join the new index onto the existing one, sorted by depth
    output = pd.concat([input, pd.DataFrame(index=newIndex)], axis=1).sort_index()

    for col in output.columns:
        # (1) Try to retrieve the unit of the current column
        try:
            # if it succeeds, then store the unit
            unit = 1 * output[col].iloc[0].units
        except AttributeError:
            # if it fails, the column does not hold quantities (e.g. strings),
            # so use 1
            unit = 1

        # (2) Check the type of value.
        if isinstance(output[col].iloc[0], str):
            # if it's a string, fill the missing cells with the previous string
            value = output[col].ffill()
        else:
            # if it's a value, to be able to interpolate, you need to:
            #   - (a) drop the unit with astype(float)
            #   - (b) interpolate the value
            #   - (c) add the unit back
            value = [x * unit for x in output[col].astype(float).interpolate(method='values')]

        # (3) Store the extended column with the interpolated values
        output[col] = pd.Series(value, index=output.index)
    # Return the output dataframe
    return output

Then:

depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m

s1 = pd.DataFrame(
        {'depth' : [x for x in depth]},
        index = depth)

s2 = extendAndInterpolate(s1, depth2)

Result:

s1
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

s2     
     depth
0.0  0.0 m
1.0  1.0 m
1.1  1.1 m
1.5  1.5 m
2.0  2.0 m
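The column-type branching in extendAndInterpolate (interpolate numbers, forward-fill strings) can also be sketched without units. In this sketch the function name extend_and_interpolate_plain and the "layer" column are hypothetical, used only for illustration. It also swaps the concat for a reindex over the sorted union of both indexes, which is a different (but equivalent here) way to add the new rows.

```python
import pandas as pd


def extend_and_interpolate_plain(df, new_index):
    """Hypothetical unit-free variant: numeric columns are interpolated on
    the index values, all other columns are forward-filled."""
    # Index.union returns the sorted union of old and new labels.
    out = df.reindex(df.index.union(new_index))
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].interpolate(method="values")
        else:
            out[col] = out[col].ffill()
    return out


# Hypothetical example data: a numeric depth column and a text column.
s1 = pd.DataFrame(
    {"depth": [0.0, 1.1, 2.0], "layer": ["sand", "clay", "rock"]},
    index=[0.0, 1.1, 2.0],
)
s2 = extend_and_interpolate_plain(s1, pd.Index([0.0, 1.0, 1.1, 1.5, 2.0]))
print(s2)
```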

Thanks for your help.

answered 2013-10-15T12:07:34.557