
I've just started learning DQN, and I'm trying to solve FrozenLake-v0 from scratch by myself using PyTorch, so I'm posting the whole code since everything is connected.

import gym
import numpy as np
import torch as T
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from tqdm import tqdm

class LinearDeepQNetwork(nn.Module):
  def __init__(self,lr,n_action,input_dim):
    super(LinearDeepQNetwork,self).__init__()
    self.f1=nn.Linear(input_dim,128)
    self.f2=nn.Linear(128,n_action)
    self.optimizer=optim.Adam(self.parameters(),lr=lr)
    self.loss=nn.MSELoss()
    self.device=T.device('cuda' if T.cuda.is_available() else 'cpu')
    self.to(self.device)

  def forward(self,state):
    layer1=F.relu(self.f1(state))
    actions=self.f2(layer1)

    return actions

The second class is the Agent; the problem is in the learn function.

class Agent():
  def __init__(self,input_dim,n_action,lr,gamma=0.99,
               epslion=1.0,eps_dec=1e-5,eps_min=0.01):
    self.input_dim=input_dim
    self.n_action=n_action
    self.lr=lr
    self.gamma=gamma
    self.epslion=epslion
    self.eps_dec=eps_dec
    self.eps_min=eps_min
    self.action_space=[i for i in range(self.n_action)]

    self.Q=LinearDeepQNetwork(self.lr,self.n_action,self.input_dim)
  
  def choose_action(self,observation):
    if np.random.random()>self.epslion:
      # convert the state into a tensor
      state=T.tensor(observation).to(self.Q.device)
      actions=self.Q.forward(state)
      action=T.argmax(actions).item()
    else:
      action=np.random.choice(self.action_space)

    return action
  
  def decrement_epsilon(self):
    self.epslion=self.epslion-self.eps_dec \
                  if self.epslion > self.eps_min else self.eps_min
                  
  def OH(self,x,l):
    # one-hot encode a discrete state index x into a (1, l) tensor
    x = T.LongTensor([[x]])
    one_hot = T.FloatTensor(1,l)
    return one_hot.zero_().scatter_(1,x,1)

  def learn(self,state,action,reward,state_):
    self.Q.optimizer.zero_grad()
    states=Variable(self.OH(state,16)).to(self.Q.device)
    actions=T.tensor(action).to(self.Q.device)
    rewards=T.tensor(reward).to(self.Q.device)
    state_s=Variable(self.OH(state_,16)).to(self.Q.device)

    q_pred=self.Q.forward(states)[actions]
    
    q_next=self.Q.forward(state_s).max()

    q_target=reward+self.gamma*q_next
    loss=self.Q.loss(q_target,q_pred).to(self.Q.device)
    loss.backward()
    self.Q.optimizer.step()
    self.decrement_epsilon()

Now, the problem appears when I run the following code: it gives me an error during the learning phase, index 1 is out of bounds for dimension 0 with size 1.

env=gym.make('FrozenLake-v0')    
n_games=5000
scores=[]
eps_history=[]


agent=Agent(env.observation_space.n,env.action_space.n,0.0001)

for i in tqdm(range(n_games)):
  score=0
  done=False
  obs=env.reset()

  while not done:
    action=agent.choose_action(obs)
    obs_,reward,done,_=env.step(action)
    score+=reward
    
    agent.learn(obs,action,reward,obs_)
    obs=obs_
  scores.append(score)
  eps_history.append(agent.epslion)
  if i % 100 ==0:
    avg_score=np.mean(scores[-100:])
    print(f'score={score}  avg_score={avg_score} epsilon={agent.epslion} i={i}')

I think the problem is in the shape of the values passed between the NN and the Agent class, but I can't figure out what it is.
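
For reference, the shapes involved can be checked with a small snippet like the one below (a hypothetical debugging aid, not part of the original code):

import torch as T

# same construction as Agent.OH(0, 16): one-hot encode state index 0 over 16 states
x = T.LongTensor([[0]])
one_hot = T.FloatTensor(1, 16).zero_().scatter_(1, x, 1)
print(one_hot.shape)   # torch.Size([1, 16]) -- note the leading batch dimension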

Error traceback:

IndexError                                Traceback (most recent call last)
<ipython-input-10-2e279f658721> in <module>()
     17     score+=reward
     18 
---> 19     agent.learn(obs,action,reward,obs_)
     20     obs=obs_
     21   scores.append(score)

<ipython-input-8-5359b19ec4fa> in learn(self, state, action, reward, state_)
     39     state_s=Variable(self.OH(state_,16)).to(self.Q.device)
     40 
---> 41     q_pred=self.Q.forward(states)[actions]
     42 
     43     q_next=self.Q.forward(state_s).max()

IndexError: index 1 is out of bounds for dimension 0 with size 1

1 Answer


Since the tensor you are indexing holds a matrix (the one-hot state has a leading batch dimension, so forward returns a tensor of shape [1, n_action]), you need to specify which row you are indexing. In your case, simply adding [0] after the forward call fixes the problem; also, replace [actions] with [actions.item()]:

self.Q.forward(states)[0][actions.item()] 
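
For completeness, a minimal sketch of how learn could look with that fix applied is shown below. It is only a suggestion based on the code in the question (not a tested implementation), and it also drops the Variable wrapper, which is a no-op in recent PyTorch versions:

  def learn(self,state,action,reward,state_):
    self.Q.optimizer.zero_grad()
    # OH returns a tensor of shape [1, 16], i.e. with a leading batch dimension
    states=self.OH(state,16).to(self.Q.device)
    actions=T.tensor(action).to(self.Q.device)
    rewards=T.tensor(reward).to(self.Q.device)
    state_s=self.OH(state_,16).to(self.Q.device)

    # forward returns shape [1, n_action], so select row 0 before indexing the action
    q_pred=self.Q.forward(states)[0][actions.item()]
    q_next=self.Q.forward(state_s)[0].max()

    q_target=rewards+self.gamma*q_next
    loss=self.Q.loss(q_target,q_pred).to(self.Q.device)
    loss.backward()
    self.Q.optimizer.step()
    self.decrement_epsilon()

With the extra [0], q_pred and q_next are both scalar tensors, which is the shape MSELoss expects for this single-transition update.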
Answered 2021-04-23T20:14:29.137