python - How to define JSON data as the X and Y arrays for an sklearn decision tree

Question

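Suppose my data consists of fruits, each described by its color and shape plus further features with arbitrary values (texture, size, type of peel, etc.).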

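How do I build a decision tree with sklearn.tree? What should the array X of samples and features be, and what should Y be? My database is MongoDB, so the dataset is in JSON: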

{"_id":2323, "shape":"round", "color":["red","green"], "texture":"A", "pill":"X", "more":[1,2,3]}

{"_id":2324, "shape":"round", "color":["orange"], "texture":"C", "pill":"", "more":[1,2]}

Is there a tutorial on fitting/transforming this data into the Python data types needed to build a decision tree with sklearn.tree?

Thanks!

score 2 · Accepted Answer

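Edit: I just noticed that your JSON documents contain nested structures. Both the DictVectorizer class and FeatureHasher expect flat dictionaries as input. You could flatten your documents yourself, for instance: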

{"_id": 2323, "shape": "round", "color/red": 1, "color/green": 1, "texture": "A",
 "pill": "X", "more/1": 1, "more/2": 1, "more/3": 1}

Then call DictVectorizer or FeatureHasher on a list of such flat Python dicts.
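As a rough sketch of the whole path from the documents to a fitted tree, something like the following might work. The `flatten` helper, the choice to drop `_id` and empty strings, and the `labels` list are all assumptions made for illustration: the documents in the question carry no target field, so Y has to come from wherever the fruit names (or other classes) are actually stored.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier


def flatten(doc, skip=("_id",)):
    """Hypothetical helper: turn one nested document into a flat dict."""
    flat = {}
    for key, value in doc.items():
        if key in skip:
            continue  # assumption: the database id is not a feature
        if isinstance(value, list):
            # one indicator per list element, e.g. "color/red": 1
            for item in value:
                flat[f"{key}/{item}"] = 1
        elif value != "":
            # assumption: an empty string means "missing", so it is skipped
            flat[key] = value
    return flat


# The two documents from the question, as Python dicts.
docs = [
    {"_id": 2323, "shape": "round", "color": ["red", "green"],
     "texture": "A", "pill": "X", "more": [1, 2, 3]},
    {"_id": 2324, "shape": "round", "color": ["orange"],
     "texture": "C", "pill": "", "more": [1, 2]},
]
# Hypothetical class labels, one per document (not present in the question).
labels = ["apple", "orange"]

flat_docs = [flatten(d) for d in docs]

# X: one row per sample, one column per feature. String values are one-hot
# encoded as "key=value" columns; numeric values are kept as they are.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(flat_docs)
y = labels  # Y: one target value per row of X

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

print(vectorizer.feature_names_)  # column names, in the column order of X
print(X.shape)
```

Here X is the 2-D array of shape (n_samples, n_features) produced by the vectorizer, and Y is a sequence with one class label per sample. New documents would go through the same `flatten` step and then `vectorizer.transform` (not `fit_transform`) before calling `clf.predict`.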
