
Some R datasets can be loaded into a Pandas DataFrame or Panel quite easily:

import pandas.rpy.common as com
infert = com.load_data('infert')
print(infert.head())

This seems to work as long as the R dataset has at most 3 dimensions. Higher-dimensional datasets print an error message:

In [67]: com.load_data('Titanic')
Cannot handle dim=4

This error message comes from the _convert_array function in rpy/common.py.

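Of course, it makes sense that Pandas cannot cram a 4-dimensional matrix directly into a DataFrame or Panel, but is there a workaround for loading a dataset such as Titanic into a DataFrame (perhaps with a hierarchical index)?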
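To make the target shape concrete, here is a minimal sketch of what flattening a 4-dimensional table into a hierarchically indexed structure could look like in plain NumPy and Pandas. The array contents and the label lists are made up for illustration (they are not the real Titanic counts), and the sketch does not go through rpy at all.

```python
import numpy as np
import pandas as pd

# Made-up 4-D table of counts (NOT the real Titanic data); shape is (2, 2, 2, 2).
counts = np.arange(16).reshape(2, 2, 2, 2)

# Hypothetical labels for each of the four dimensions.
labels = {
    "Class": ["1st", "2nd"],
    "Sex": ["Male", "Female"],
    "Age": ["Child", "Adult"],
    "Survived": ["No", "Yes"],
}

# from_product iterates with the last level varying fastest, which matches
# the order of NumPy's default (C-order) ravel().
index = pd.MultiIndex.from_product(list(labels.values()), names=list(labels.keys()))
flat = pd.Series(counts.ravel(), index=index, name="value")

# One row per cell, with the dimension labels as ordinary columns.
long_df = flat.reset_index()
print(long_df.head())
```

Note that R stores arrays with the first dimension varying fastest, so data coming from R would need the matching ordering; this sketch only shows the Pandas side of the idea.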


2 Answers


Following @joran's very helpful suggestion, and after installing the reshape package

% sudo R
R> install.packages('reshape')

I managed to load the Titanic dataset into a Pandas DataFrame:

import pandas as pd
import pandas.rpy.common as com
import rpy2.robjects as ro

r = ro.r
r('library(reshape)')
df = com.convert_robj(r('melt(Titanic)'))
print(df.head())

which prints

  Class     Sex    Age Survived  value
1   1st    Male  Child       No      0
2   2nd    Male  Child       No      0
3   3rd    Male  Child       No     35
4  Crew    Male  Child       No      0
5   1st  Female  Child       No      0
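If a hierarchical index is wanted, as the question suggests, the melted frame can be re-indexed on its four label columns. The following is only a sketch: it retypes the five rows printed above by hand so that it runs without R, and the variable names are arbitrary.

```python
import pandas as pd

# The five rows printed above, retyped by hand so the sketch runs without R.
df = pd.DataFrame({
    "Class": ["1st", "2nd", "3rd", "Crew", "1st"],
    "Sex": ["Male", "Male", "Male", "Male", "Female"],
    "Age": ["Child"] * 5,
    "Survived": ["No"] * 5,
    "value": [0, 0, 35, 0, 0],
})

# Move the four label columns into a hierarchical (MultiIndex) row index.
indexed = df.set_index(["Class", "Sex", "Age", "Survived"])["value"]

# Look up one cell by its full set of labels.
print(indexed.loc[("3rd", "Male", "Child", "No")])
```

With the full dataset, the same set_index call should give one entry per combination of Class, Sex, Age and Survived.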
answered 2013-09-26T22:10:15.380

With Pandas 0.13.0 or later, pandas.rpy.common.load_data can load higher-dimensional datasets such as Titanic:

import pandas.rpy.common as com
df = com.load_data('Titanic')
print(df.head())

yields

  Survived    Age     Sex Class value
0       No  Child    Male   1st   0.0
1       No  Child    Male   2nd   0.0
2       No  Child    Male   3rd  35.0
3       No  Child    Male  Crew   0.0
4       No  Child  Female   1st   0.0
answered 2014-01-21T14:57:47.677