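1- In Flow, to handle the multi-agent case, some methods (for example, get_state()) do not return a single agent's state as an np.array. Instead, they return a dictionary of states, with agent_id as the key and agent_state as the value.
So you can do something like this: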
    import numpy as np

    def get_state(self):
        agent_state_dict = {}
        # agents are numbered starting from 1
        for i, (intersection, edges) in enumerate(self.scenario.get_node_mapping(), start=1):
            # self.agent_name_prefix is defined as the string "intersection"
            agent_id = self.agent_name_prefix + str(i)
            speeds = []
            dist_to_intersec = []
            traffic_light_states = []
            # ..... code .....
            # construct the state (observation) for each agent
            observation = np.array(
                np.concatenate([
                    speeds, dist_to_intersec, traffic_light_states
                ]))
            # each intersection is an agent, so we build a dictionary that maps
            # from self.agent_name_prefix + str(i) to the state of that agent
            agent_state_dict.update({agent_id: observation})
        return agent_state_dict
The agent_state_dict is a dictionary that maps each agent_id to its observation (i.e., its state).
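To see what the returned dictionary looks like without running a simulation, here is a minimal standalone sketch. The function build_agent_states and the hard-coded per-intersection values are hypothetical stand-ins for the data that the elided code would collect from the simulator; only the dictionary layout follows the get_state() snippet.

```python
import numpy as np

AGENT_NAME_PREFIX = "intersection"  # same prefix as in the answer


def build_agent_states(per_intersection_data):
    """Map each agent id to a flat observation vector.

    per_intersection_data is a hypothetical list of
    (speeds, dist_to_intersec, traffic_light_states) tuples,
    one tuple per intersection.
    """
    agent_state_dict = {}
    for i, (speeds, dists, lights) in enumerate(per_intersection_data, start=1):
        agent_id = AGENT_NAME_PREFIX + str(i)
        # concatenate the three feature groups into one 1-D float array
        agent_state_dict[agent_id] = np.concatenate(
            [speeds, dists, lights]).astype(float)
    return agent_state_dict


# made-up values for two intersections
fake_data = [
    ([10.0, 12.5], [30.0, 45.0], [1]),
    ([8.0, 0.0], [5.0, 60.0], [0]),
]
states = build_agent_states(fake_data)
for agent_id, obs in states.items():
    print(agent_id, obs.shape)
```

Each key is one intersection's agent id, and each value is that agent's own observation vector, so agents can be looked up independently, e.g. states["intersection2"].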
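2- Now, to answer your second question: to define the intersections as agents (so that you have a multi-agent scenario), all you need to do is define the corresponding RLlib functions (get_state, action_space, observation_space, compute_reward, and _apply_rl_actions) for the intersections. Once you do that, you will have a complete multi-agent environment.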
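As a rough illustration of how those methods fit together, the sketch below uses a plain Python class with no Flow or RLlib dependency. ToyIntersectionEnv, its queue-length state, and its reward definition are all assumptions made for illustration. In a real environment, observation_space and action_space would also have to be declared (typically as Gym spaces), which is omitted here. The only point is that every method exchanges dictionaries keyed by the same agent_id.

```python
class ToyIntersectionEnv:
    """Hypothetical stand-in for a multi-agent traffic-light environment."""

    def __init__(self, num_intersections):
        self.agent_name_prefix = "intersection"
        self.agent_ids = [
            self.agent_name_prefix + str(i)
            for i in range(1, num_intersections + 1)
        ]
        # toy per-agent state: number of queued vehicles and light phase
        self.queue = {agent_id: 5 for agent_id in self.agent_ids}
        self.phase = {agent_id: 0 for agent_id in self.agent_ids}

    def get_state(self):
        # one observation per agent, keyed by agent id
        return {a: [self.queue[a], self.phase[a]] for a in self.agent_ids}

    def _apply_rl_actions(self, rl_actions):
        # rl_actions maps agent id -> action; here 1 means "switch phase"
        for agent_id, action in rl_actions.items():
            if action == 1:
                self.phase[agent_id] = 1 - self.phase[agent_id]
                # assumption: switching releases up to two queued vehicles
                self.queue[agent_id] = max(0, self.queue[agent_id] - 2)

    def compute_reward(self, rl_actions):
        # one reward per agent: fewer queued vehicles is better
        return {a: -float(self.queue[a]) for a in rl_actions}


env = ToyIntersectionEnv(num_intersections=2)
actions = {"intersection1": 1, "intersection2": 0}
env._apply_rl_actions(actions)
rewards = env.compute_reward(actions)
print(env.get_state())
print(rewards)
```

Because the action, state, and reward dictionaries share the same keys, adding another intersection only means adding another agent id; none of the method signatures change.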