python - Python pandas: keep a selected column as a DataFrame instead of a Series

Question

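When selecting a single column from a pandas DataFrame (say df.iloc[:, 0], df['A'], or df.A, etc.), the resulting vector is automatically converted to a Series instead of a single-column DataFrame. However, I am writing some functions that take a DataFrame as an input argument. I would therefore rather work with a single-column DataFrame than a Series, so that the functions can assume that, say, df.columns is accessible. Right now I have to convert the Series explicitly with something like pd.DataFrame(df.iloc[:, 0]). This doesn't seem like the cleanest method. Is there a more elegant way to index a DataFrame directly so that the result is a single-column DataFrame instead of a Series?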

score 124 · Accepted Answer

As @Jeff mentions, there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and to raise errors early if you're trying something ambiguous):

In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
   A  B
0  1  2
1  3  4

In [12]: df[['A']]

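In [13]: df[[0]]  # worked in the pandas version used for this answer; current versions raise KeyError because 0 is not a column label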

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]:  # they all return the same thing:
   A
0  1
1  3

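The latter two choices remove the ambiguity in the case of integer column names (which is precisely why loc/iloc were created). For example: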

In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
   A  0
0  1  2
1  3  4

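In [18]: df[[0]]  # ambiguous: label 0 or position 0? (output from an older pandas; current versions select by label and return column 0)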
Out[18]:
   A
0  1
1  3
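
A minimal sketch of the disambiguation point, assuming a recent pandas version; the variable names are illustrative only. With a list selector, .loc always reads 0 as a label and .iloc always reads it as a position, and both return a DataFrame.

```python
import pandas as pd

# Same frame as above: one string label and one integer label.
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

by_label = df.loc[:, [0]]      # label 0 -> the second column (values 2, 4)
by_position = df.iloc[:, [0]]  # position 0 -> the first column 'A' (values 1, 3)

print(type(by_label).__name__, list(by_label.columns))        # DataFrame [0]
print(type(by_position).__name__, list(by_position.columns))  # DataFrame ['A']
```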

score 9 · Accepted Answer

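As Andy Hayden recommends, indexing with .iloc/.loc is the way to get a single-column DataFrame. The other point to note is how the selector is written: pass the column label or position inside a list to get a DataFrame back; a bare scalar label or position returns a pandas.core.series.Series.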

Input:

    A_1 = train_data.loc[:,'Fraudster']
    print('A_1 is of type', type(A_1))
    A_2 = train_data.loc[:, ['Fraudster']]
    print('A_2 is of type', type(A_2))
    A_3 = train_data.iloc[:,12]
    print('A_3 is of type', type(A_3))
    A_4 = train_data.iloc[:,[12]]
    print('A_4 is of type', type(A_4))

Output:

    A_1 is of type <class 'pandas.core.series.Series'>
    A_2 is of type <class 'pandas.core.frame.DataFrame'>
    A_3 is of type <class 'pandas.core.series.Series'>
    A_4 is of type <class 'pandas.core.frame.DataFrame'>
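
The snippet above depends on a train_data frame that the answer does not show. A self-contained sketch of the same comparison, using a made-up stand-in frame (the 'Amount' column and all values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for train_data; the real frame is not shown in the answer.
train_data = pd.DataFrame({'Amount': [10.0, 250.0, 31.5], 'Fraudster': [0, 1, 0]})

as_series = train_data.loc[:, 'Fraudster']   # scalar label -> Series
as_frame = train_data.loc[:, ['Fraudster']]  # list of labels -> DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```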

score 4 · Accepted Answer

These three approaches have already been mentioned:

pd.DataFrame(df.loc[:, 'A'])  # Approach of the original post
df.loc[:, ['A']]              # Approach 2 (note: use iloc for positional indexing)
df[['A']]                     # Approach 3

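pd.Series.to_frame() is another approach.

Because it is a method, it can be used in situations where the second and third approaches above do not apply. In particular, it is useful when you apply some method to a column of your dataframe and want the output as a dataframe rather than a series. For instance, in a Jupyter Notebook a series is not rendered as a formatted table, but a dataframe is.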

# Basic use case: 
df['A'].to_frame()

# Use case 2 (this will give you pretty output in a Jupyter Notebook): 
df['A'].describe().to_frame()

# Use case 3: 
df['A'].str.strip().to_frame()

# Use case 4: 
def some_function(num): 
    ...

df['A'].apply(some_function).to_frame()
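
A runnable sketch of the use cases above on a small made-up frame (the data and the 'A_len' name are assumptions for illustration). It also shows the optional name argument of to_frame(), which sets the label of the resulting column.

```python
import pandas as pd

# Illustrative data with stray whitespace.
df = pd.DataFrame({'A': [' x ', 'y ', ' z']})

stripped = df['A'].str.strip().to_frame()            # column keeps the name 'A'
renamed = df['A'].str.len().to_frame(name='A_len')   # explicit column name
summary = df['A'].describe().to_frame()              # summary statistics as a DataFrame

print(stripped)
print(renamed)
print(summary)
```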

score 3 · Accepted Answer

You can use df.iloc[:, 0:1]; because the column selector is a slice, the result is a DataFrame rather than a Series.
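
A short sketch of the slicing approach on an illustrative frame. The label-based variant with .loc is an addition here, not part of the answer; it relies on .loc slices including both endpoints.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

by_position = df.iloc[:, 0:1]   # positional slice of width one -> DataFrame
by_label = df.loc[:, 'A':'A']   # label slice, both ends inclusive -> DataFrame

print(type(by_position).__name__, by_position.shape)  # DataFrame (2, 1)
print(type(by_label).__name__, by_label.shape)        # DataFrame (2, 1)
```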


score 0 · Accepted Answer

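(Talking about pandas 1.3.4)

I'd like to add a little more context to the answers involving .to_frame(). If you select a single row of a dataframe and call .to_frame() on it, the index will be made up of the original column names and you'll get a numeric column name. You can tack a .T onto the end to transpose that back into the original dataframe's layout (see below).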

import pandas as pd
print(pd.__version__)  #1.3.4


df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col2": [1, 2, 3]
})

# series
df.loc[0, ["col1", "col2"]]

# dataframe (column names are along the index; not what I wanted)
df.loc[0, ["col1", "col2"]].to_frame()
    #       0
    # col1  a
    # col2  1

# looks like an actual single-row dataframe.
# To me, this is the true answer to the question
# because the output matches the format of the
# original dataframe.
df.loc[0, ["col1", "col2"]].to_frame().T
    #   col1 col2
    # 0    a    1

# this works really well with .to_dict(orient="records") which is 
# what I'm ultimately after by selecting a single row
df.loc[0, ["col1", "col2"]].to_frame().T.to_dict(orient="records")
    # [{'col1': 'a', 'col2': 1}]
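
As an alternative sketch (not part of the answer above), passing the row label inside a list keeps the selection two-dimensional, so no .to_frame().T round trip is needed. To the best of my knowledge this also keeps each column's own dtype, whereas going through a single-row Series puts the mixed values into one object-dtype container.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col2": [1, 2, 3]
})

# A list for the row selector returns a single-row DataFrame directly.
row_df = df.loc[[0], ["col1", "col2"]]

print(type(row_df).__name__)  # DataFrame
print(row_df.shape)           # (1, 2)

records = row_df.to_dict(orient="records")
print(records)                # [{'col1': 'a', 'col2': 1}]
```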
