python - Why does initializing a variable inside or outside a loop change the code's behavior?

Question

As part of my studies, I am implementing policy iteration in Python for a gridworld environment. I wrote the following code:

### POLICY ITERATION ###
def policy_iter(grid, policy):
    '''
        Perform policy iteration to find the best policy and its value
    '''
    i = 1
    while True:
        policy_converged = True  # assume convergence; cleared below if any action changes
        # evaluate the value function for the current policy
        old_v = value_eval(grid, policy)

        # improve the policy greedily with respect to old_v
        for s in states:
            new_a = ""
            best_v = float("-inf")
            if grid.is_terminal(s):
                continue
            old_a = policy[s]
            for a in ACTION_SPACE:
                v = 0
                for s2 in states:
                    env_prob = transition_probs.get((s,a,s2), 0)
                    reward = rewards.get((s,a,s2), 0)

                    v += env_prob * (reward + gamma*old_v[s2])
                if v > best_v:
                    new_a = a
                    best_v = v
            policy[s] = new_a
            if new_a != old_a:
                policy_converged = False
        print(i, "th iteration")
        i += 1
        if policy_converged == True:
            break

    return policy

This code works fine. However, when I change only where the policy_converged variable is declared, so that it sits outside the loop:

def policy_iter(grid, policy):
    '''
        Perform policy iteration to find the best policy and its value
    '''
    i = 1
    policy_converged = True
    while True:

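The rest of the code stays the same. In this case the program enters an infinite loop and never stops, even though I update the flag inside the main while loop based on the outcome of each iteration. Why does this happen?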

score 0 · Accepted Answer

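The loop exits (via if policy_converged == True: break) only if policy_converged is True. However, if you move the only line that sets this variable to True to before the loop, then once the variable becomes False in the first iteration there is no way to set it back to True, so the loop can never exit.

You should reconsider your loop termination logic and make sure there is a way for policy_converged to be set to True inside the loop.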
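One way to follow that advice is sketched below. It is a toy stand-in, not the asker's full algorithm: improve_until_stable, best_action, and max_passes are hypothetical names introduced here, and the lookup in best_action replaces the real greedy step over ACTION_SPACE. The point is only that the flag is reset to True at the top of every pass.

```python
def improve_until_stable(policy, best_action, max_passes=100):
    """Repeat improvement passes until one full pass changes nothing.

    best_action is a hypothetical lookup standing in for the greedy
    action computed from the value function in the original code.
    """
    passes = 0
    while passes < max_passes:  # safety cap, not part of the original code
        passes += 1
        policy_converged = True  # reset on every pass
        for s in policy:
            new_a = best_action[s]
            if new_a != policy[s]:
                policy[s] = new_a
                policy_converged = False  # this pass changed something
        if policy_converged:
            break
    return policy, passes


policy, passes = improve_until_stable(
    {"s0": "L", "s1": "U"},
    {"s0": "R", "s1": "U"},
)
print(policy, passes)
```

Here the first pass changes s0, so the flag ends that pass as False. The second pass starts by resetting it to True, changes nothing, and the loop exits.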

score 0 · Accepted Answer

On your first pass through the loop, policy_converged is set to False. After that, nothing ever sets it back to True, so the break is never reached and the loop runs forever.
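The stickiness of the flag can be seen by recording its value at the end of each pass. This is a minimal sketch with hypothetical names (trace_flag, changes_per_pass, reset_each_pass). Each entry in changes_per_pass plays the role of "some new_a != old_a during this pass", and the list is finite so the demonstration itself always terminates.

```python
def trace_flag(changes_per_pass, reset_each_pass):
    """Return the flag's value at the end of each simulated pass."""
    converged = True  # the initialization that was moved outside the loop
    history = []
    for changed in changes_per_pass:
        if reset_each_pass:
            converged = True  # the per-pass reset from the working version
        if changed:
            converged = False
        history.append(converged)
        if converged:
            break  # same exit condition as in the question
    return history


# Only the first pass changes the policy.
print(trace_flag([True, False, False], reset_each_pass=False))
print(trace_flag([True, False, False], reset_each_pass=True))
```

Without the reset, the flag stays False on every later pass even though those passes change nothing. With the reset, the second pass ends with the flag True and the loop exits.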
