Eligibility Traces
Example:
In a simple grid world, an agent can move up, down, left, or right. The agent receives a reward of +1 for reaching the goal state. We want to learn the value function using eligibility traces.
Constraints:
alpha = 0.1 (step size), gamma = 0.9 (discount factor), lambda = 0.5 (trace decay), epsilon = 0.01
Implement eligibility traces to learn the value function.
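One way to approach the task is tabular TD(lambda) with accumulating traces: each visited state has its trace incremented by 1, every state's value moves by alpha * delta * trace, and all traces then decay by gamma * lambda. The sketch below is a minimal illustration under stated assumptions, not a reference solution. The function names td_lambda_update and run_episode, the start cell (0, 0), the uniformly random policy, and the rule that off-grid moves leave the agent in place are all hypothetical choices, because the problem statement does not specify them. The role of epsilon is also unspecified, so it is left unused here, and the sketch is not expected to reproduce the expected outputs listed in the test cases.

```python
import random

# Values taken from the Constraints section; epsilon is omitted because
# the problem statement does not say how it is used.
ALPHA, GAMMA, LAMBDA = 0.1, 0.9, 0.5


def td_lambda_update(V, E, state, reward, next_state, done,
                     alpha=ALPHA, gamma=GAMMA, lam=LAMBDA):
    """Apply one TD(lambda) backup with accumulating traces, in place.

    V and E are lists of lists with the same shape (values and traces).
    Returns the TD error delta.
    """
    r, c = state
    nr, nc = next_state
    # TD error; the value of a terminal next state is treated as 0.
    target = reward + (0.0 if done else gamma * V[nr][nc])
    delta = target - V[r][c]
    # Accumulating trace: bump the state that was just visited.
    E[r][c] += 1.0
    # Update every state in proportion to its trace, then decay the trace.
    for i in range(len(V)):
        for j in range(len(V[0])):
            V[i][j] += alpha * delta * E[i][j]
            E[i][j] *= gamma * lam
    return delta


def run_episode(V, goal, rng, max_steps=100):
    """Hypothetical episode: random walk from (0, 0) until the goal is hit."""
    rows, cols = len(V), len(V[0])
    E = [[0.0] * cols for _ in range(rows)]  # traces are reset per episode
    state = (0, 0)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    for _ in range(max_steps):
        dr, dc = rng.choice(moves)
        # Assumption: a move that would leave the grid keeps the agent in place.
        nr = min(max(state[0] + dr, 0), rows - 1)
        nc = min(max(state[1] + dc, 0), cols - 1)
        next_state = (nr, nc)
        done = next_state == goal
        reward = 1.0 if done else 0.0  # +1 only for reaching the goal
        td_lambda_update(V, E, state, reward, next_state, done)
        if done:
            break
        state = next_state
    return V
```

As a hand-checked illustration, start from the Test Case 1 input [[0.5, 0.3], [0.2, 0.1]] with zero traces and assume the goal is the bottom-right cell. A step from (0, 0) to (0, 1) with reward 0 gives delta = 0.9 * 0.3 - 0.5 = -0.23, so V[0][0] drops to roughly 0.477 and its trace decays to 0.45. A following step from (0, 1) into the goal with reward +1 gives delta = 0.7, which raises V[0][1] to roughly 0.37 and, through the remaining trace, also raises V[0][0] to roughly 0.5085. That second adjustment is the point of eligibility traces: the earlier state receives credit for the later reward without being revisited.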
Test Cases
Test Case 1
Input:
[[0.5, 0.3], [0.2, 0.1]]
Expected:
[[0.51, 0.31], [0.21, 0.11]]
Test Case 2
Input:
[[0.1, 0.2], [0.3, 0.4]]
Expected:
[[0.11, 0.21], [0.31, 0.41]]
+ 3 hidden test cases