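I am using Vowpal Wabbit to implement a contextual bandit for dynamic pricing, where the arms represent price margins. The cost/reward is determined by price minus expected cost. The cost is initially unknown, so it is a forecast and may change. My question: if the cost/reward changes over time, can you update it to reflect the realized cost and retrain the model?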
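To make the sign convention concrete: VW minimizes cost, so a profit is entered as a negative cost and a loss as a positive one. The following is only a minimal sketch; `vw_cost` and the prices in it are hypothetical illustrations, not values from the real data.

```python
def vw_cost(price, cost_of_sale):
    """Cost label for VW: the negative of net revenue (price - cost of sale)."""
    return -(price - cost_of_sale)

# Hypothetical numbers: a price of 400 with an expected cost of 250.
expected = vw_cost(400, 250)   # expected profit of 150, so the label is negative

# If the realized cost later turns out to be 600, the same sale is a loss.
realized = vw_cost(400, 600)   # loss of 200, so the label is positive

print(expected, realized)
```

The same example therefore carries a different label depending on whether the forecast or the realized cost is used, which is the situation described here.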
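Below is an example with a training set that has one feature (user) and a test set. The cost is based on expected net revenue. The model is trained and then used to predict which action to take for each customer in the test set.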
import pandas as pd
import sklearn as sk
import numpy as np
from vowpalwabbit import pyvw
train_data = [{'action': 1, 'cost': -150, 'probability': 0.4, 'user': 'a'},
{'action': 3, 'cost': 0, 'probability': 0.2, 'user': 'b'},
{'action': 4, 'cost': -250, 'probability': 0.5, 'user': 'c'},
{'action': 2, 'cost': 0, 'probability': 0.3, 'user': 'a'},
{'action': 3, 'cost': 0, 'probability': 0.7, 'user': 'a'}]
train_df = pd.DataFrame(train_data)
# Add index to data frame
train_df['index'] = range(1, len(train_df) + 1)
train_df = train_df.set_index("index")
# Test data
test_data = [{'user': 'b'},
{'user': 'a'},
{'user': 'b'},
{'user': 'c'}]
test_df = pd.DataFrame(test_data)
# Add index to data frame
test_df['index'] = range(1, len(test_df) + 1)
test_df = test_df.set_index("index")
# Create python model and learn from each trained example
vw = pyvw.vw("--cb 4")
for i in train_df.index:
    action = train_df.loc[i, "action"]
    cost = train_df.loc[i, "cost"]
    probability = train_df.loc[i, "probability"]
    user = train_df.loc[i, "user"]
    # Construct the example in the required vw format: action:cost:probability | features
    learn_example = str(action) + ":" + str(cost) + ":" + str(probability) + " | " + str(user)
    # Here we do the actual learning.
    vw.learn(learn_example)
# Predict actions
for j in test_df.index:
    user = test_df.loc[j, "user"]
    test_example = "| " + str(user)
    choice = vw.predict(test_example)
    print(j, choice)
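However, a week later we received new information: the cost for index 0 in the training set was higher than expected, and the cost for index 2 was lower than expected (zero-based positions in train_data). Can this new information be used to retrain the model and predict actions?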
## Reward/cost changed after 1 week once cost was realized
train_data = [{'action': 1, 'cost': 200, 'probability': 0.4, 'user': 'a'}, # Lost money
{'action': 3, 'cost': 0, 'probability': 0.2, 'user': 'b'},
{'action': 4, 'cost': -350, 'probability': 0.5, 'user': 'c'}, # Made more than exp.
{'action': 2, 'cost': 0, 'probability': 0.3, 'user': 'a'},
{'action': 3, 'cost': 0, 'probability': 0.7, 'user': 'a'}]
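One option would be to discard the old model and retrain from scratch on the corrected data. Below is a minimal sketch of regenerating the VW-format lines from the revised data, assuming that approach is acceptable; `to_vw_example` and `revised_train_data` are hypothetical names introduced for illustration, and whether retraining this way is sound is exactly what is being asked.

```python
def to_vw_example(row):
    """Format one logged interaction as 'action:cost:probability | features'."""
    return f"{row['action']}:{row['cost']}:{row['probability']} | {row['user']}"

# Same logged actions and probabilities as before; only the costs at
# positions 0 and 2 have been replaced with their realized values.
revised_train_data = [
    {'action': 1, 'cost': 200, 'probability': 0.4, 'user': 'a'},
    {'action': 3, 'cost': 0, 'probability': 0.2, 'user': 'b'},
    {'action': 4, 'cost': -350, 'probability': 0.5, 'user': 'c'},
    {'action': 2, 'cost': 0, 'probability': 0.3, 'user': 'a'},
    {'action': 3, 'cost': 0, 'probability': 0.7, 'user': 'a'},
]

revised_examples = [to_vw_example(row) for row in revised_train_data]
for line in revised_examples:
    print(line)

# Each line could then be passed to vw.learn() on a freshly created model,
# in the same way as the training loop earlier in the post.
```

The logged probabilities are left unchanged because they describe how the action was chosen at the time, not what it ended up costing.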