python-2.7 - Sum is not computed for all columns in a pandas DataFrame

Question

I am pulling data from Impala using impyla and converting it with as_pandas. I am using Pandas 0.18.0 and Python 2.7.9.

I am trying to compute the sum of every column in the DataFrame and keep only the columns whose sum is greater than a threshold.

self.data = self.data.loc[:,self.data.sum(axis=0) > 15]

But when I run it, I get the following error:

pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

Then I tried the following:

print 'length : ',len(self.data.sum(axis = 0)),' all columns : ',len(self.data.columns)

The two lengths turned out to be different:

length :  78  all columns :  83

I also got the warning below:

C:\Python27\lib\decimal.py:1150: RuntimeWarning: tp_compare didn't return -1 or -2 for exception

To reach my goal, I tried another way:

for column in self.data.columns:
    sum = self.data[column].sum()
    if( sum < 15 ):
        self.data = self.data.drop(column,1)

Now I get a different error:

TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
C:\Python27\lib\decimal.py:1150: RuntimeWarning: tp_compare didn't return -1 or -2 for exception

Then I tried to get the data type of each column:

print 'dtypes : ', self.data.dtypes

It turned out that every column is one of int64, object, or float64. I then tried to change the data type of the object columns as follows:

self.data.convert_objects(convert_numeric=True)

I still get the same error. Please help me solve this problem.

Note: none of the columns contains strings (i.e. characters), missing values, or empty values. I checked this using self.data.to_csv.

I am new to pandas and Python, so please bear with me if this is a silly question. I just want to learn.

score 0 · Accepted Answer

Please look at the simple code below; it may help you understand the cause of the error.

import pandas as pd
import numpy as np


df = pd.DataFrame(np.random.random([3,3]))
df.iloc[0,0] = np.nan

print df
print df.sum(axis=0) > 1.5
print df.loc[:, df.sum(axis=0) > 1.5]

df.iloc[0,0] = 'string'

print df
print df.sum(axis=0) > 1.5
print df.loc[:, df.sum(axis=0) > 1.5]

          0         1         2
0       NaN  0.336250  0.801349
1  0.930947  0.803907  0.139484
2  0.826946  0.229269  0.367627

0     True
1    False
2    False
dtype: bool

          0
0       NaN
1  0.930947
2  0.826946

          0         1         2
0    string  0.336250  0.801349
1  0.930947  0.803907  0.139484
2  0.826946  0.229269  0.367627

1    False
2    False
dtype: bool

Traceback (most recent call last):
...
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
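
In the output above, the boolean key has only two entries for a frame with three columns, which is why it cannot be aligned. One way to find the columns that dropped out is to compare the index of a numeric sum against all columns of the frame. This is a minimal sketch on made-up data (`df`, its column names, and its values are illustrative, not the asker's data); `numeric_only=True` is passed explicitly because newer pandas versions no longer skip non-numeric columns silently:

```python
import pandas as pd

# Illustrative frame: column "c" holds a string, so its dtype is object.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0.5, 1.5, 2.5],
    "c": ["string", 2.0, 3.0],
})

# Sum only the numeric columns; "c" is left out of the result.
sums = df.sum(axis=0, numeric_only=True)

# Columns present in the frame but absent from the sum.
skipped = df.columns.difference(sums.index)

print(len(sums), len(df.columns))
print(list(skipped))
```

Any column listed in `skipped` is a candidate for the preprocessing described next.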

In short, you need to do some additional preprocessing on your data.

df.select_dtypes(include=['object'])
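
Since the error in the question mentions `Decimal`, the object columns may hold `decimal.Decimal` values rather than text. One way to check, sketched here on made-up data (the frame and its column names are assumptions for illustration), is to list the Python types found in each object column:

```python
from decimal import Decimal

import pandas as pd

# Illustrative frame: "price" holds Decimal objects,
# "mixed" holds both Decimal and float values.
df = pd.DataFrame({
    "count": [1, 2, 3],
    "price": [Decimal("1.5"), Decimal("2.5"), Decimal("3.0")],
    "mixed": [Decimal("1.0"), 2.0, 3.0],
})

# Collect the names of the Python types stored in each object column.
object_cols = df.select_dtypes(include=["object"]).columns
types_per_column = {
    col: sorted({type(value).__name__ for value in df[col]})
    for col in object_cols
}
print(types_per_column)
```

A column that mixes `Decimal` and `float` is the kind of input for which `Decimal + float` raises the `TypeError` quoted in the question.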

If the values are strings that can be converted to numbers, you can convert them with df.astype(); otherwise you should clean them out.
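
The selection and conversion described above can be combined with the column filter from the question. The following is a hedged sketch, not a drop-in fix: the data, the column names, and the `threshold` variable are illustrative, and it assumes every value in the object columns can be converted with `float()`, which holds for `Decimal` values but not for arbitrary text.

```python
from decimal import Decimal

import pandas as pd

# Illustrative frame: "b" and "c" are object columns holding Decimal values.
df = pd.DataFrame({
    "a": [10, 2, 5],
    "b": [Decimal("1.5"), Decimal("2.5"), Decimal("3.0")],
    "c": [Decimal("4"), 5.0, 20.5],
})

threshold = 15

# Convert every object column to float. astype returns a new object,
# so the result has to be assigned back.
object_cols = df.select_dtypes(include=["object"]).columns
for col in object_cols:
    df[col] = df[col].astype(float)

# All columns are numeric now, so the sum has one entry per column
# and the boolean key aligns with df.columns.
column_sums = df.sum(axis=0)
filtered = df.loc[:, column_sums > threshold]
print(list(filtered.columns))
```

Converting `Decimal` to `float` can lose precision, so this fits threshold checks like the one in the question better than exact decimal arithmetic.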
