
401. (Medium)

A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making in situations where outcomes are partially random and partially under the control of a decision maker. In this problem, you will implement an MDP with a given transition model and reward function.
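In standard notation (a convention, not something this problem statement specifies), an MDP is a tuple $(S, A, P, R, \gamma)$ of states, actions, a transition kernel, a reward function, and a discount factor, and the value function to be computed satisfies the Bellman optimality equation:

$$
V^*(s) = \max_{a \in A} \sum_{s'} P(s' \mid s, a)\,\bigl[\,R(s, a, s') + \gamma\, V^*(s')\,\bigr]
$$

The problem does not give a discount factor, so any $\gamma \in [0, 1)$ can be assumed when iterating this equation to convergence.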

**Example:** Suppose we have a robot that can move either up or down in a grid world. The transition model is as follows: if the robot chooses to move up, it moves up with probability 0.8 and down with probability 0.2; if it chooses to move down, it moves down with probability 0.8 and up with probability 0.2. The reward function is as follows: moving up gives a reward of 1, moving down gives a reward of -1, and staying in the same position (which can only occur when a move is blocked, e.g. at a grid boundary) gives a reward of 0.
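The example's dynamics can be sketched as two small Python functions. This is a minimal sketch under assumptions: the state is modeled as a single integer position on the line 0–10, moves off the grid leave the robot in place, and the function names and signatures here are illustrative rather than the exact ones the grader expects (the test cases below pass a two-element list as the state, whose meaning the problem does not define).

```python
import random

def transition(position, action, rng=random.random):
    """Sample the next position after taking `action` ('up' or 'down').

    The intended move succeeds with probability 0.8; otherwise the robot
    moves in the opposite direction. Positions are clamped to the 0..10 grid,
    so a blocked move leaves the robot where it is.
    """
    intended = 1 if action == 'up' else -1
    step = intended if rng() < 0.8 else -intended  # 20% chance of the opposite move
    return min(10, max(0, position + step))

def reward(old_position, new_position):
    """+1 for actually moving up, -1 for moving down, 0 for staying put."""
    if new_position > old_position:
        return 1
    if new_position < old_position:
        return -1
    return 0
```

Passing a stub for `rng` makes the stochastic transition deterministic for testing, e.g. `transition(5, 'up', rng=lambda: 0.0)` returns 6 (the move succeeds) while `transition(5, 'up', rng=lambda: 0.9)` returns 4 (the move fails).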

**Constraints:** The robot can move up or down, but not left or right. The robot starts at position 0 and must reach position 10.

**Goal:** Implement the MDP and compute the value function using the given transition model and reward function.
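The value function can be computed by value iteration over the eleven positions. The sketch below assumes the same illustrative model as above (reward +1/-1/0 for actually moving up/down/staying, 0.8/0.2 transition probabilities, moves clamped to the grid), treats position 10 as a zero-value terminal goal, and picks a discount factor of 0.9 since the problem does not specify one.

```python
def value_iteration(n_states=11, gamma=0.9, p_success=0.8, tol=1e-6):
    """Iterate the Bellman optimality update until the values converge.

    Returns a list V where V[s] is the optimal value of position s.
    Position n_states - 1 (the goal) is terminal and keeps value 0.
    """
    V = [0.0] * n_states

    def clamp(s):
        return min(n_states - 1, max(0, s))

    while True:
        delta = 0.0
        for s in range(n_states - 1):  # skip the terminal goal state
            action_values = []
            for intended in (1, -1):   # intended step for 'up' and 'down'
                q = 0.0
                for prob, step in ((p_success, intended), (1 - p_success, -intended)):
                    s2 = clamp(s + step)
                    r = 1 if s2 > s else (-1 if s2 < s else 0)
                    q += prob * (r + gamma * V[s2])
                action_values.append(q)
            new_v = max(action_values)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V
```

Under this model every non-terminal state ends up with a positive value (moving up has expected immediate reward 0.8 − 0.2 = 0.6), and states farther from the goal are worth more because more discounted reward remains to be collected on the way up.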

Test Cases

Test Case 1
Input: `mdp_transition_model([0, 0], 'up')`
Expected: `[0, 0]`
Test Case 2
Input: `mdp_reward_function([0, 0], 'up')`
Expected: `1`
+ 3 hidden test cases