Click4Ai

447. Dyna-Q Algorithm

Difficulty: Hard

**Example:** Consider a robot arm that moves on a grid in one of four directions: up, down, left, or right. The robot receives a reward of +1 for reaching the goal and -1 for hitting a wall.

**Constraints:** The grid is 5x5, the robot starts at the top-left corner, and the goal is at the bottom-right corner. The robot can move only in the four cardinal directions (no diagonal moves).

Implement the Dyna-Q algorithm to find the optimal policy for this robot arm.
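
One possible approach is sketched below in Python. It is a minimal tabular Dyna-Q under several assumptions that the problem statement does not confirm: "hitting a wall" means trying to step off the grid edge (the robot stays in place), every other move has reward 0, an episode ends at the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`, `planning_steps`, `episodes`) are illustrative. The names `step`, `dyna_q`, and `greedy_path` are hypothetical helpers, and the function returns a Q-table rather than the format shown in the test cases, which the statement leaves unspecified.

```python
import random

# Action index -> (row delta, column delta): up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, size=5, goal=(4, 4)):
    """Assumed environment dynamics: returns (next_state, reward, done)."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < size and 0 <= new_col < size):
        return state, -1.0, False          # hit a wall: stay in place
    if (new_row, new_col) == goal:
        return (new_row, new_col), 1.0, True
    return (new_row, new_col), 0.0, False


def dyna_q(episodes=200, planning_steps=10, alpha=0.1, gamma=0.95,
           epsilon=0.1, size=5, seed=0):
    """Tabular Dyna-Q: direct Q-learning plus replay from a learned model."""
    rng = random.Random(seed)
    goal = (size - 1, size - 1)
    Q = {(r, c): [0.0] * 4 for r in range(size) for c in range(size)}
    model = {}  # (state, action) -> (next_state, reward, done)

    def update(s, a, s2, reward, done):
        # One-step Q-learning backup; no bootstrap from a terminal state.
        target = reward + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                best = max(Q[state])
                action = rng.choice(
                    [a for a in range(4) if Q[state][a] == best])
            next_state, reward, done = step(state, action, size, goal)
            update(state, action, next_state, reward, done)         # direct RL
            model[(state, action)] = (next_state, reward, done)     # model learning
            for _ in range(planning_steps):                         # planning
                (s, a), (s2, r, d) = rng.choice(list(model.items()))
                update(s, a, s2, r, d)
            state = next_state
    return Q


def greedy_path(Q, size=5, max_steps=25):
    """Follow the greedy policy from the top-left corner."""
    goal = (size - 1, size - 1)
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        if state == goal:
            break
        action = max(range(4), key=lambda a: Q[state][a])
        state, _reward, _done = step(state, action, size, goal)
        path.append(state)
    return path


if __name__ == "__main__":
    q_table = dyna_q()
    print(greedy_path(q_table))
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: after each real step, the agent replays `planning_steps` transitions sampled from its learned model, so reward information spreads through the table with far fewer real interactions. Setting `planning_steps=0` reduces the sketch to ordinary one-step Q-learning.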

Test Cases

Test Case 1
Input: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
Expected: [[[0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7]]]
Test Case 2
Input: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
Expected: [[[0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7]]]
+ 3 hidden test cases