python - Creating a Pandas DataFrame for a ggplot line chart

Question

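I am trying to create a Pandas DataFrame so that I can build some visualizations with ggplot, but I am having trouble setting up the DataFrame structure.

My visualization will be a line chart of year vs. total. The chart will track several "cause_of_death" values across the years, one line per cause.

I have imported my CSV file, grouped it by year and then by "cause_of_death", and taken a count. But the result is not in the right format for a line chart, because it is not a proper DataFrame.

My code is below; any suggestions would be helpful, thanks.

The fields I want from the CSV file are "deathYear" and "cause_of_death".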

import pandas as pd
from ggplot import *

df = pd.read_csv('query_result.csv')

# Keep only the year and cause columns, then count rows per (year, cause) pair
newDF = df.loc[:, ['date_of_death_year', 'acme_underlying_cause_code']]
data = pd.DataFrame(newDF.groupby(['date_of_death_year', 'acme_underlying_cause_code']).size())

print(data)

score 1 · Accepted Answer

This is an old question, but it is easy to solve. (Hint: it has nothing to do with ggplot. It is all about how pandas works.)
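To make the hint concrete, here is a minimal sketch of what the grouping in the question produces. The tiny DataFrame is a made-up stand-in for the CSV (only the column names come from the question's code); it shows that `groupby(...).size()` returns a Series whose grouping keys live in the index, so wrapping it in `DataFrame(...)` yields a single unnamed column and no year or cause columns to plot.

```python
import pandas as pd

# Hypothetical stand-in for query_result.csv; the column names follow the question's code.
df = pd.DataFrame({
    "date_of_death_year": [2001, 2001, 2001, 2002, 2002],
    "acme_underlying_cause_code": ["A", "A", "B", "A", "B"],
})

grouped = df.groupby(["date_of_death_year", "acme_underlying_cause_code"]).size()

# The result is a Series; the grouping keys are index levels, not columns.
print(type(grouped).__name__)
print(list(grouped.index.names))

# Wrapping it in a DataFrame keeps the keys in the index and leaves
# one data column whose label is the integer 0.
wrapped = pd.DataFrame(grouped)
print(list(wrapped.columns))
```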

Here is how I would write your code:

import numpy as np   # |Don't import * from these
import pandas as pd  # |
from ggplot import * # But this is customary because it's like R

# All this bit is just to make a DataFrame
# You can ignore it all
causes = ['foo', 'bar', 'baz']
years = [2001, 2002, 2003, 2004]
size = 100
data = {'causes':np.random.choice(causes, size),
        'years':np.random.choice(years, size),
        'something_else':np.random.random(size)
        }
df = pd.DataFrame(data)

# Here's where the good stuff happens. You're importing from
# a CSV so you can just start here
counts = df.groupby(['years', 'causes'])['something_else'].count()
counts = counts.reset_index() # Because ggplot doesn't plot with indexes
g = ggplot(counts, aes(x='years', y='something_else', color='causes')) +\
        geom_line()
print(g)

The result is: [figure: ggplot multi-line chart]
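Applied to the column names from the question, the same idea might look like the sketch below. The helper name `count_by_year_and_cause`, the output column name `total`, and the sample rows are all assumptions made for illustration; the real data would come from `pd.read_csv('query_result.csv')`. It uses `size()` followed by `reset_index(name=...)`, so no extra column is needed just to have something to count.

```python
import pandas as pd

def count_by_year_and_cause(df,
                            year_col="date_of_death_year",
                            cause_col="acme_underlying_cause_code"):
    """Return one row per (year, cause) pair with a 'total' count column."""
    # size() counts rows per group; reset_index turns the group keys back
    # into ordinary columns and names the count column 'total'.
    return df.groupby([year_col, cause_col]).size().reset_index(name="total")

# Made-up sample rows standing in for the CSV contents.
sample = pd.DataFrame({
    "date_of_death_year": [2001, 2001, 2001, 2002, 2002],
    "acme_underlying_cause_code": ["A", "A", "B", "A", "B"],
})

counts = count_by_year_and_cause(sample)
print(counts)
```

The resulting frame has plain columns, so it could be passed to the plotting call from the answer with `aes(x='date_of_death_year', y='total', color='acme_underlying_cause_code')`.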
