python - Serialization, classification in pyBrain, machine learning, prediction

Question

I have training data like the example below (1000 movies for training), and I need to predict the 'budget' of each movie:

from datetime import date
from decimal import Decimal

film_1 = {
    'title': 'The Hobbit: An Unexpected Journey',
    'article_size': 25000,
    'producer': ['Peter Jackson', 'Fran Walsh', 'Zane Weiner'],
    'release_date': date(2013, 11, 28),
    'running_time': 169,
    'country': ['New Zealand', 'UK', 'USA'],
    'budget': Decimal('200000000')
}

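Keys such as 'title', 'producer' and 'country' can be seen as the features in machine learning, and values such as 'The Hobbit: An Unexpected Journey' and 25000 as the values used in the learning process. However, training inputs are mostly expected to be real numbers, not strings. Do I need to convert fields such as 'title', 'producer' and 'country' (string fields) to int (by some kind of classification or serialization?), or apply some other operation, so that I can use this data as the training set for my network?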
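One common way to turn string fields into numbers is one-hot encoding: every distinct string gets its own 0/1 column. Since 'producer' and 'country' hold lists, the sketch below sets several columns to 1 per film ("multi-hot"). It is a minimal illustration under stated assumptions, not pyBrain's own API: `build_vocab` and `encode_film` are hypothetical helpers, the second film is made-up data, the release date is reduced to its year, and 'title' is left out because it is close to unique per film, so a one-hot column for it would probably not help generalization.

```python
from datetime import date
from decimal import Decimal

# Two toy films standing in for the 1000-film training set.
# The second one is hypothetical, made-up data.
films = [
    {
        'title': 'The Hobbit: An Unexpected Journey',
        'article_size': 25000,
        'producer': ['Peter Jackson', 'Fran Walsh', 'Zane Weiner'],
        'release_date': date(2013, 11, 28),
        'running_time': 169,
        'country': ['New Zealand', 'UK', 'USA'],
        'budget': Decimal('200000000'),
    },
    {
        'title': 'Hypothetical Film',
        'article_size': 8000,
        'producer': ['Jane Doe'],
        'release_date': date(2010, 5, 1),
        'running_time': 95,
        'country': ['USA'],
        'budget': Decimal('15000000'),
    },
]

CATEGORICAL = ['producer', 'country']      # list-valued string fields -> multi-hot
NUMERIC = ['article_size', 'running_time']  # already numbers, used as-is


def build_vocab(films, field):
    """Hypothetical helper: map each distinct string in `field` to a column index."""
    values = sorted({v for film in films for v in film[field]})
    return {v: i for i, v in enumerate(values)}


def encode_film(film, vocabs):
    """Hypothetical helper: turn one film dict into a flat list of floats."""
    features = [float(film[f]) for f in NUMERIC]
    features.append(float(film['release_date'].year))  # date reduced to its year
    for field in CATEGORICAL:
        vocab = vocabs[field]
        columns = [0.0] * len(vocab)
        for v in film[field]:
            if v in vocab:  # strings never seen in training are simply ignored
                columns[vocab[v]] = 1.0
        features.extend(columns)
    return features


vocabs = {f: build_vocab(films, f) for f in CATEGORICAL}
X = [encode_film(f, vocabs) for f in films]  # network inputs
y = [float(f['budget']) for f in films]      # regression target
```

Each film becomes a vector of the same length, which is what a network's input layer expects. In practice you would likely also rescale the numeric columns, since raw values like 25000 or 2013 are far larger than the 0/1 columns.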

score 0 · Accepted Answer

I wonder whether this is what you need:

film_list = ['title', 'article_size', 'producer', 'release_date', 'running_time', 'country', 'budget']
# Pair each field name with its position: [(0, 'title'), (1, 'article_size'), ...]
flist = list(enumerate(film_list))
label = [seq[0] for seq in flist]
name = [seq[1] for seq in flist]
print(label)
print(name)

>>[0, 1, 2, 3, 4, 5, 6]
['title', 'article_size', 'producer', 'release_date', 'running_time', 'country', 'budget']
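If the goal is to look up a field's position by name (or the name at a position), a dictionary built with `enumerate` does it directly, without splitting the pairs into two lists. A small sketch; `index_of` and `name_of` are illustrative names, not part of the answer above:

```python
film_list = ['title', 'article_size', 'producer', 'release_date',
             'running_time', 'country', 'budget']

# name -> column index
index_of = {name: i for i, name in enumerate(film_list)}
# column index -> name (same content as the `name` list, keyed by position)
name_of = dict(enumerate(film_list))

print(index_of['producer'])  # 2
print(name_of[5])            # country
```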

Or you can use your dictionary directly:

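labels = list(film_1.keys())
print(labels)

# But the keys are not kept in insertion order (nor sorted): in Python 2 they come back in
# arbitrary order, so labels[0] may give you 'producer' instead of 'title'.
# Dicts only guarantee insertion order from Python 3.7 on.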
>>['producer', 'title', 'country', 'release_date', 'budget', 'article_size', 'running_time']
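To avoid depending on the dictionary's key order at all, one option is to keep `film_list` as the single source of truth for column order and read the values through it. A minimal sketch, reusing the question's `film_1` with `datetime.date` and `decimal.Decimal` used for the date and budget (an assumption about what the question's helpers produce):

```python
from datetime import date
from decimal import Decimal

film_list = ['title', 'article_size', 'producer', 'release_date',
             'running_time', 'country', 'budget']

film_1 = {
    'title': 'The Hobbit: An Unexpected Journey',
    'article_size': 25000,
    'producer': ['Peter Jackson', 'Fran Walsh', 'Zane Weiner'],
    'release_date': date(2013, 11, 28),
    'running_time': 169,
    'country': ['New Zealand', 'UK', 'USA'],
    'budget': Decimal('200000000'),
}

# Values come out in film_list order, whatever order the dict stores its keys in
ordered_values = [film_1[key] for key in film_list]
print(ordered_values[0])  # The Hobbit: An Unexpected Journey
```

This works the same on Python 2 and 3, because the order is fixed by the list rather than by the dict.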
