3

I am working on power management for a system. I want to minimize two objectives: power consumption and average latency. I combine them into a single objective function, a linearly weighted sum of the two:

C = w * P_avg + (1 - w) * L_avg,      where w is in (0, 1)

I am using Q-learning to find a Pareto-optimal trade-off curve by varying the weight w, which sets the relative preference between power consumption and average latency. This does give me a Pareto-optimal curve. My goal now is to specify a constraint (e.g., a target average latency L_avg) and then find the value of w that meets it. My algorithm runs online, so w also has to be tuned online.
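To make the goal concrete, here is a minimal sketch of the kind of online adjustment meant here. It is only an illustration under assumptions that are not part of the actual system: the function names `update_weight`, `scalarized_cost`, and `running_average`, the step size `eta`, and the moving-average estimate of L_avg are all hypothetical. The rule is a simple feedback update: since a smaller w puts more weight on latency in C, w is lowered when the measured average latency exceeds the target and raised otherwise.

```python
def scalarized_cost(w, power, latency):
    """Weighted-sum cost from the question: C = w * P + (1 - w) * L."""
    return w * power + (1.0 - w) * latency


def running_average(avg, sample, alpha=0.1):
    """Exponential moving average, used as an online estimate of L_avg (assumption)."""
    return (1.0 - alpha) * avg + alpha * sample


def update_weight(w, latency_avg, latency_target, eta=0.01, w_min=0.01, w_max=0.99):
    """Hypothetical feedback rule for tuning w against a latency target."""
    # Relative constraint violation: positive when latency is too high.
    violation = (latency_avg - latency_target) / latency_target
    # Too much latency -> shrink w so the (1 - w) latency term weighs more.
    w_new = w - eta * violation
    # Keep w strictly inside (0, 1).
    return min(w_max, max(w_min, w_new))
```

In such a scheme `update_weight` would be called periodically, with the cost fed to Q-learning recomputed from the current w. Whether this converges while the Q-values are still changing is exactly the open part of the question.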

Could anyone offer hints or suggestions on how to do this?

4

1 Answer

2

There is a branch of the community that works on multi-objective reinforcement learning.

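The idea is [1]:

Assign a family of agents to each objective. The solutions obtained by the agents in one family are compared with the solutions obtained by the agents in the other families. A negotiation mechanism is used to find compromise solutions that satisfy all the objectives.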
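A rough sketch of that idea, under heavy assumptions: one tabular Q-learner per objective stands in for a "family" of agents, and the negotiation step is a simple rule invented here for illustration (pick the action whose worst normalized cost across objectives is smallest). The names `ObjectiveLearner` and `negotiate` are hypothetical and are not taken from the referenced work.

```python
from collections import defaultdict


class ObjectiveLearner:
    """Tabular Q-learner that tracks expected cost for a single objective."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        # q[state] is a list of estimated costs, one entry per action.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def update(self, state, action, cost, next_state):
        # Standard Q-learning update, written for cost minimization.
        best_next = min(self.q[next_state])
        target = cost + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def negotiate(learners, state):
    """Illustrative compromise rule: minimize the worst normalized cost."""
    n_actions = learners[0].n_actions
    worst = [0.0] * n_actions
    for learner in learners:
        values = learner.q[state]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
        for a in range(n_actions):
            # Rescale each objective to [0, 1] so the objectives are comparable.
            worst[a] = max(worst[a], (values[a] - lo) / span)
    return min(range(n_actions), key=lambda a: worst[a])
```

For the power/latency case this would mean one learner fed with power cost and one fed with latency cost, with `negotiate` choosing the action at each step. A latency constraint could then be enforced in the negotiation rule rather than through a weight w.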

There is also a paper that may interest you:

Multi-objective optimization of power system dispatch and voltage stability based on reinforcement learning [J].

However, I could not find a public URL for it.

answered 2012-11-19T22:58:47.217