Click4Ai

402.

Medium

The value function is a fundamental concept in reinforcement learning that estimates the expected return or utility of taking a particular action in a given state. In this problem, you will implement a value function for a given policy and MDP.

**Example:** Suppose we have a robot that can move either up or down in a grid world. The policy is to move up with probability 0.8 and down with probability 0.2. The MDP is the same as in the previous problem.

**Constraints:** The robot can move up or down, but not left or right. The robot starts at position 0 and must reach position 10.

**Goal:** Implement the value function and compute its values for all states.

Test Cases

Test Case 1
Input: value_function([0.8, 0.2, 0], mdp_transition_model, mdp_reward_function)
Expected: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Test Case 2
Input: value_function([0.8, 0.2, 0], mdp_transition_model, mdp_reward_function)
Expected: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
+ 3 hidden test cases