The Bellman equation is a fundamental concept in reinforcement learning that relates the value of a state to the values of its successor states. In this problem, you will implement the Bellman equation for a given policy and MDP.
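For reference, one common way to write the Bellman expectation equation for a fixed policy is shown here. The notation is conventional rather than taken from this problem: $\pi(a \mid s)$ is the probability of taking action $a$ in state $s$, $P(s' \mid s, a)$ is the transition model, $R(s, a)$ is the expected immediate reward, and $\gamma \in [0, 1]$ is a discount factor, which this problem statement does not specify.

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \right]
$$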
**Example:** Suppose we have a robot that can move either up or down in a grid world. The policy is to move up with probability 0.8 and down with probability 0.2. The MDP is the same as in the previous problem.
**Constraints:** The robot can move up or down, but not left or right. The robot starts at position 0 and must reach position 10.
**Goal:** Implement the Bellman equation and use it to compute the value of every state under the given policy.
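As a rough illustration of the goal, the following is a minimal sketch of a single Bellman expectation backup in NumPy. It is not the reference solution: the function name `bellman_backup`, the array shapes, the discount factor `gamma`, and the toy up/down/stay dynamics are all assumptions made for this example, since the actual MDP is defined in the previous problem.

```python
import numpy as np


def bellman_backup(policy, transition, reward, value, gamma=0.9):
    """Apply one Bellman expectation backup for a fixed policy.

    Assumed shapes (hypothetical, not given in the problem statement):
      policy:     (A,)       action probabilities, identical in every state
      transition: (S, A, S)  transition[s, a, s2] = P(s2 | s, a)
      reward:     (S, A)     expected immediate reward R(s, a)
      value:      (S,)       current value estimate V(s)
    """
    policy = np.asarray(policy, dtype=float)
    transition = np.asarray(transition, dtype=float)
    reward = np.asarray(reward, dtype=float)
    value = np.asarray(value, dtype=float)

    # q[s, a] = R(s, a) + gamma * sum_s2 P(s2 | s, a) * V(s2)
    q = reward + gamma * (transition @ value)
    # new_value[s] = sum_a pi(a) * q[s, a]
    return q @ policy


# Toy MDP (an assumption): positions 0..10, actions up / down / stay.
n_states, n_actions = 11, 3
toy_transition = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    toy_transition[s, 0, min(s + 1, n_states - 1)] = 1.0  # up
    toy_transition[s, 1, max(s - 1, 0)] = 1.0             # down
    toy_transition[s, 2, s] = 1.0                         # stay

toy_reward = np.zeros((n_states, n_actions))  # all-zero rewards
toy_value = np.zeros(n_states)                # all-zero initial values

print(bellman_backup([0.8, 0.2, 0], toy_transition, toy_reward, toy_value))
# → [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
```

With all-zero rewards and an all-zero value function, every term in the backup is zero, so the result is a vector of zeros. Repeating the backup until the values stop changing would give an iterative policy evaluation under the same assumptions.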
Test Cases
Test Case 1
Input:
bellman_equation([0.8, 0.2, 0], mdp_transition_model, mdp_reward_function, mdp_value_function)
Expected:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Test Case 2
Input:
bellman_equation([0.8, 0.2, 0], mdp_transition_model, mdp_reward_function, mdp_value_function)
Expected:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
+ 3 hidden test cases