The value function is a fundamental concept in reinforcement learning that estimates the expected return or utility of taking a particular action in a given state. In this problem, you will implement a value function for a given policy and MDP.
**Example:** Suppose we have a robot that can move either up or down in a grid world. The policy is to move up with probability 0.8 and down with probability 0.2. The MDP is the same as in the previous problem.
**Constraints:** The robot can move up or down, but not left or right. The robot starts at position 0 and must reach position 10.
**Goal:** Implement the value function and compute its values for all states.
Test Cases
Test Case 1
Input:
value_function([0.8, 0.2, 0], mdp_transition_model, mdp_reward_function)Expected:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]Test Case 2
Input:
value_function([0.8, 0.2, 0], mdp_transition_model, mdp_reward_function)Expected:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]+ 3 hidden test cases