# Model-Based RL
**Example:** Consider a grid world where an agent can move up, down, left, or right. The agent receives a reward of +1 for reaching the goal and -1 for hitting a wall.
**Constraints:** The grid world is 5x5, the agent starts at the top-left corner, and the goal is at the bottom-right corner. The agent can only move in the four cardinal directions (no diagonal moves).
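The environment dynamics described above can be sketched as a single transition function. This is a minimal sketch under one assumption not stated in the problem: "hitting a wall" means attempting to move off the grid boundary, in which case the agent stays in place and receives -1.

```python
def step(grid, state, action):
    """One transition of the grid-world model.

    grid:   2D list where 1 marks the goal cell.
    state:  (row, col) tuple.
    action: 0=up, 1=down, 2=left, 3=right.
    Returns (next_state, reward, done).
    """
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    r, c = state
    dr, dc = moves[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
        # Assumed wall semantics: bump the boundary, stay put, reward -1.
        return state, -1.0, False
    done = grid[nr][nc] == 1
    return (nr, nc), (1.0 if done else 0.0), done
```

Because the dynamics are deterministic and fully specified, this function serves both as the simulator and as the model the planner queries.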
Implement a model-based RL algorithm to find the optimal policy for this grid world.
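Since the transition model of this grid world is known exactly, the model-based approach reduces to planning on that model. One possible sketch uses value iteration followed by greedy policy extraction; the discount factor `gamma`, the boundary-bump penalty semantics, and the action encoding (0=up, 1=down, 2=left, 3=right) are assumptions, not part of the original problem statement.

```python
import numpy as np

def value_iteration(grid, gamma=0.9, theta=1e-6):
    """Plan on the (assumed known) grid-world model with value iteration.

    grid: 2D list where 1 marks the goal cell; all other cells are 0.
    Returns the converged state-value array V.
    """
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    V = np.zeros((rows, cols))
    while True:
        delta = 0.0
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1:              # goal is terminal: value 0
                    continue
                best = -np.inf
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        reward = 1.0 if grid[nr][nc] == 1 else 0.0
                        best = max(best, reward + gamma * V[nr, nc])
                    else:
                        # Assumed wall semantics: -1 reward, stay in place.
                        best = max(best, -1.0 + gamma * V[r, c])
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < theta:
            break
    return V

def greedy_policy(grid, V, gamma=0.9):
    """Extract the greedy action (0=up, 1=down, 2=left, 3=right) per cell."""
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    policy = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            qs = []
            for dr, dc in moves:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    reward = 1.0 if grid[nr][nc] == 1 else 0.0
                    qs.append(reward + gamma * V[nr, nc])
                else:
                    qs.append(-1.0 + gamma * V[r, c])
            policy[r, c] = int(np.argmax(qs))
    return policy
```

For the 5x5 grid with the goal at (4, 4), the cell just left of the goal converges to value 1.0 (one step, undiscounted reward), and the greedy policy there chooses "right". When the environment's dynamics had to be learned rather than assumed known, a model-based agent would first estimate the transition and reward functions from experience and then run the same planner on the learned model.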
## Test Cases

### Test Case 1

Input:
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
Expected:
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

### Test Case 2

Input:
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
Expected:
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

+ 3 hidden test cases