Click4Ai

446.

Hard

Model-Based RL

**Example:** Consider a grid world where an agent can move up, down, left, or right. The agent receives a reward of +1 for reaching the goal and -1 for hitting a wall.

**Constraints:** The grid world is 5x5, the agent starts at the top-left corner, and the goal is at the bottom-right corner. The agent can only move in the four main directions.

Implement a model-based RL algorithm to find the optimal policy for this grid world.
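
One way to approach this task is to plan over an explicit model of the grid with value iteration. The sketch below is a minimal illustration, not a reference solution. It rests on several assumptions the problem statement does not confirm: "hitting a wall" is read as trying to move off the grid (the agent stays in place and receives -1), the goal is terminal, transitions are deterministic, and the discount factor is 0.9. The names `build_model`, `value_iteration`, and `greedy_policy` are hypothetical, and the model is built from the stated rules rather than learned from experience. Because the expected output format of the test cases is not specified, the sketch returns plain dictionaries.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # (row, column)
Model = Dict[State, Dict[str, Tuple[State, float]]]

# The four allowed moves as (row delta, column delta).
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def build_model(size: int, goal: State) -> Model:
    """Return model[state][action] = (next_state, reward).

    Assumption: moving off the grid counts as hitting a wall, so the
    agent stays where it is and receives -1. The goal is terminal and
    therefore has no outgoing transitions.
    """
    model: Model = {}
    for r in range(size):
        for c in range(size):
            if (r, c) == goal:
                continue
            model[(r, c)] = {}
            for name, (dr, dc) in ACTIONS.items():
                nr, nc = r + dr, c + dc
                if not (0 <= nr < size and 0 <= nc < size):
                    model[(r, c)][name] = ((r, c), -1.0)  # wall hit
                elif (nr, nc) == goal:
                    model[(r, c)][name] = ((nr, nc), 1.0)  # goal reached
                else:
                    model[(r, c)][name] = ((nr, nc), 0.0)  # ordinary step
    return model


def value_iteration(model: Model, gamma: float = 0.9,
                    tol: float = 1e-8) -> Dict[State, float]:
    """Plan over the model with in-place Bellman optimality backups."""
    values = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, transitions in model.items():
            # The terminal goal is absent from `values`, so it counts as 0.
            best = max(reward + gamma * values.get(nxt, 0.0)
                       for nxt, reward in transitions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(model: Model, values: Dict[State, float],
                  gamma: float = 0.9) -> Dict[State, str]:
    """Pick, in each state, the action with the highest one-step lookahead."""
    return {
        s: max(t, key=lambda a: t[a][1] + gamma * values.get(t[a][0], 0.0))
        for s, t in model.items()
    }


model = build_model(5, goal=(4, 4))
values = value_iteration(model)
policy = greedy_policy(model, values)
```

Under these assumptions every state's value depends only on its Manhattan distance to the goal, so several shortest paths are equally good and the policy breaks ties by the order of `ACTIONS`. A variant that estimates `model` from sampled transitions before planning would follow the same structure.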

Test Cases

Test Case 1
Input: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
Expected: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

Test Case 2
Input: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
Expected: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

+ 3 hidden test cases