I am working on the power management of a system. The objectives that I am looking to minimize are power consumption and average latency. I have a single objective function having the linearly weighted sum of both the objectives:
C=w.P_avg+(1-w).L_avg, where w belongs to (0,1)
I am using Q-learning to find a pareto-optimal trade-off curve by varying the weight w and setting different preference to power consumption and average latency. I do obtain a pareto-optimal curve. My objective, now, is to provide a constraint (e.g., average latency L_avg) and thus tuning/finding the value of w to meet the given criteria. Mine is an online algorithm, so the tuning of w should take place in an online fashion.
Could I be provided any hint or suggestions in this regard?