machine-learning - Filtering and displaying values in a GraphLab SFrame?

Question

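A week ago I started using GraphLab for my machine learning course. I'm still very new to GraphLab. I read through the API but couldn't find quite the solution I'm looking for. Here is the problem. My data has several columns, such as bedrooms, bathrooms, square feet, and zipcode. These are basically the features, and my goal is to predict house prices using various ML algorithms. Right now I need to find the average price of houses with zipcode 93038.

Being new to this, I broke the problem into smaller pieces and decided to follow my intuition. Here is what I have tried so far. First, I tried to find a way to create a filter so that I can extract only the prices of houses with zipcode 93038.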

import graphlab
sf = graphlab.SFrame('home_data.gl')
sf[(sf['zipcode']=='93038')]

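This shows all the columns for the houses with zipcode 93038, but I only want to display the price and zipcode columns for those rows. I have tried many different approaches but just can't figure it out.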

Also, suppose I want to find the average price of the houses whose zipcode is 93038. How would I do that?

Thanks in advance.

score 6 · Accepted Answer

You can try:

import graphlab as gl
sf = gl.SFrame({'price':[1,4,2],'zipcode':['93038','93038','93037']})

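# Filter: keep only the rows whose zipcode is '93038'
filter_sf = sf[sf['zipcode'] == '93038']

# Display: select just the price and zipcode columns
print(filter_sf[['price', 'zipcode']])

# Average: mean of the price column over the filtered rows
print(filter_sf['price'].mean())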
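To make concrete what the filter, column selection, and `mean()` compute on the toy data above, here is a plain-Python sketch that uses a list of dicts in place of an SFrame. It only illustrates the logic and is not GraphLab's API; the helpers `filter_rows`, `select_columns`, and `mean` are hypothetical names.

```python
# Plain-Python stand-in for the toy SFrame above.
# filter_rows / select_columns / mean are hypothetical helpers, not GraphLab API.
rows = [
    {'price': 1, 'zipcode': '93038'},
    {'price': 4, 'zipcode': '93038'},
    {'price': 2, 'zipcode': '93037'},
]

def filter_rows(rows, column, value):
    """Keep rows whose `column` equals `value` (like sf[sf[column] == value])."""
    return [r for r in rows if r[column] == value]

def select_columns(rows, columns):
    """Keep only the listed columns in each row (like sf[[...]])."""
    return [{c: r[c] for c in columns} for r in rows]

def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / float(len(values))

filtered = filter_rows(rows, 'zipcode', '93038')
print(select_columns(filtered, ['price', 'zipcode']))
print(mean(r['price'] for r in filtered))  # (1 + 4) / 2 = 2.5
```

Only the two '93038' rows survive the filter, so the mean is taken over prices 1 and 4.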

score 1 · Accepted Answer

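Use a groupby operation and the topk() function. The groupby computes the mean price for every zipcode, and topk() then returns the zipcode with the highest mean price: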

import graphlab.aggregate as agg
sf_ = sf.groupby(key_columns = 'zipcode', operations={'Mean by ZipCode' : agg.MEAN('price')})
sf_.topk('Mean by ZipCode', k=1)
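A plain-Python sketch of what the groupby-then-topk pipeline computes, assuming a list of dicts with the same toy prices as the first answer in place of an SFrame. `mean_by_key` and `topk` here are hypothetical helpers that mirror the idea, not GraphLab's implementation.

```python
from collections import defaultdict

# Toy rows standing in for the SFrame `sf` (same values as the first answer's example).
rows = [
    {'price': 1, 'zipcode': '93038'},
    {'price': 4, 'zipcode': '93038'},
    {'price': 2, 'zipcode': '93037'},
]

def mean_by_key(rows, key, value):
    """Group rows by `key` and average `value` within each group
    (the result a groupby with a MEAN aggregate produces)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[value]
        counts[r[key]] += 1
    return {k: totals[k] / counts[k] for k in totals}

def topk(means, k=1):
    """Return the k (key, mean) pairs with the largest means, largest first."""
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:k]

means = mean_by_key(rows, 'zipcode', 'price')
print(means['93038'])     # 2.5
print(topk(means, k=1))   # [('93038', 2.5)]
```

Note that this answers a slightly different question from the one asked: it finds the zipcode with the highest mean price, and the mean for 93038 specifically is just one entry of the grouped result.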

score 0 · Accepted Answer

Here is what I did:

- First option

sf[sf['zipcode']=='98039']['price'].mean()

- Second option

zipcodes = ['98039']  # list of the zipcode(s) you want to keep

m_price = sf.filter_by(zipcodes, 'zipcode')  # keep only rows whose 'zipcode' is in the list

print(m_price['price'].mean())  # mean price for that zipcode
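The difference between the two options is that `filter_by` tests membership in a list of values rather than equality with a single value, so the second option extends naturally to several zipcodes. Below is a plain-Python sketch of that behavior, using a list of dicts with made-up prices in place of the SFrame; the `filter_by` function here is a hypothetical stand-in, not GraphLab's implementation.

```python
# Toy rows standing in for the SFrame `sf`; the prices are made up for illustration.
rows = [
    {'price': 10, 'zipcode': '98039'},
    {'price': 20, 'zipcode': '98039'},
    {'price': 5, 'zipcode': '98004'},
]

def filter_by(rows, values, column):
    """Keep rows whose `column` value appears in `values`
    (a membership test, like sf.filter_by(values, column))."""
    wanted = set(values)
    return [r for r in rows if r[column] in wanted]

zipcodes = ['98039']
m_price = filter_by(rows, zipcodes, 'zipcode')
prices = [r['price'] for r in m_price]
print(sum(prices) / float(len(prices)))  # (10 + 20) / 2 = 15.0

# Because it is a membership test, several zipcodes can be kept at once.
print(len(filter_by(rows, ['98039', '98004'], 'zipcode')))  # 3
```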

score 0 · Accepted Answer

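import graphlab

# Mean price per zipcode, stored in a column named 'avg'
mean_by_zip = sf.groupby(key_columns=['zipcode'],
       operations={'avg': graphlab.aggregate.MEAN('price')})

mean_by_zip.sort('avg', ascending=False)[0:3]  # the 3 zipcodes with the highest average price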
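Sorting by 'avg' in descending order and slicing the first three rows selects the same rows that `topk('avg', k=3)` would, barring ties. The sketch below shows both ideas in plain Python on made-up per-zipcode averages: a full sort followed by a slice, and `heapq.nlargest` as a partial selection that yields the same top three. It makes no claim about how GraphLab implements either method.

```python
import heapq

# Made-up per-zipcode averages standing in for the `mean_by_zip` SFrame.
avg_by_zip = {'10001': 500.0, '10002': 300.0, '10003': 400.0,
              '10004': 100.0, '10005': 200.0}

# Like mean_by_zip.sort('avg', ascending=False)[0:3]: sort descending, keep the first 3.
top3_sorted = sorted(avg_by_zip.items(), key=lambda kv: kv[1], reverse=True)[0:3]

# Partial selection of the 3 largest averages, without sorting the whole list.
top3_heap = heapq.nlargest(3, avg_by_zip.items(), key=lambda kv: kv[1])

print(top3_sorted)               # [('10001', 500.0), ('10003', 400.0), ('10002', 300.0)]
print(top3_sorted == top3_heap)  # True
```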
