Click4Ai

426.

Medium

Advantage Function
=====================

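In this problem, you will implement the advantage function, a key concept in reinforcement learning. The advantage `A(s, a) = Q(s, a) - V(s)` measures how much better (or worse) taking action `a` in state `s` is than the expected return `V(s)` of following the current policy from that state.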
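To make the definition concrete, here is a minimal tabular sketch. It assumes action values `q` and a policy `pi`, both of shape `(num_states, num_actions)`. These names and the helper `advantage_from_q` are illustrative assumptions, not part of the problem's required interface. The state value is taken as the policy-weighted average of the action values, and the advantage is the action value minus that average.

```python
import numpy as np

def advantage_from_q(q, pi):
    """Return A(s, a) = Q(s, a) - V(s), with V(s) = sum_a pi(a|s) * Q(s, a)."""
    q = np.asarray(q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # keepdims=True keeps V as shape (num_states, 1) so it broadcasts over actions.
    v = np.sum(pi * q, axis=1, keepdims=True)
    return q - v

# Hypothetical values: two states, two actions, uniform policy.
q = np.array([[1.0, 3.0], [0.0, 0.0]])
pi = np.full((2, 2), 0.5)
adv = advantage_from_q(q, pi)  # [[-1., 1.], [0., 0.]]
```

Under this definition, the policy-weighted advantage in each state sums to zero, which is a useful sanity check for any implementation.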

**Example:** Consider a simple grid world where an agent can move up, down, left, or right. The agent receives a reward of +1 for reaching the goal and -1 for hitting a wall.
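As a rough illustration of how such rewards turn into advantages, the sketch below computes discounted returns for one hypothetical episode and subtracts baseline value estimates. The reward of 0 for an ordinary step, the episode itself, and the `values` array are assumptions made for this example only; the problem statement specifies just the +1 and -1 rewards.

```python
import numpy as np

def discounted_returns(rewards, discount_factor):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount_factor * running
        returns[t] = running
    return returns

# Hypothetical episode: hit a wall (-1), ordinary step (assumed 0), reach the goal (+1).
rewards = np.array([-1.0, 0.0, 1.0])
returns = discounted_returns(rewards, discount_factor=0.9)  # [-0.19, 0.9, 1.0]

# Hypothetical value estimates for the three visited states.
values = np.array([0.0, 0.5, 0.9])
advantages = returns - values  # Monte Carlo advantage estimate: G_t - V(s_t)
```

A positive entry means the episode went better from that step than the value estimate predicted; a negative entry means it went worse.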

**Constraints:** Use NumPy for efficient computation and implement the advantage function with a simple neural network policy and a value function.
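One possible shape for a solution is sketched below, under several assumptions: states are one-hot encoded, the "simple neural network policy" is a single linear layer followed by a softmax, the value function is linear, and the advantage is estimated with the one-step temporal-difference form `r + gamma * V(s') - V(s)`. The class name `AdvantageSketch` and its methods are hypothetical; only `num_states`, `num_actions`, and `discount_factor` come from the test-case inputs.

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

class AdvantageSketch:
    """Hypothetical container for a linear-softmax policy and a linear value function."""

    def __init__(self, num_states, num_actions, discount_factor, seed=0):
        rng = np.random.default_rng(seed)
        self.num_states = num_states
        self.num_actions = num_actions
        self.gamma = discount_factor
        # One weight per (state, action) pair for the policy, one per state for the value.
        self.policy_weights = rng.normal(scale=0.1, size=(num_states, num_actions))
        self.value_weights = np.zeros(num_states)

    def one_hot(self, state):
        x = np.zeros(self.num_states)
        x[state] = 1.0
        return x

    def policy(self, state):
        """Action probabilities pi(. | state)."""
        return softmax(self.one_hot(state) @ self.policy_weights)

    def value(self, state):
        """State-value estimate V(state)."""
        return float(self.one_hot(state) @ self.value_weights)

    def td_advantage(self, state, reward, next_state, done):
        """One-step estimate: r + gamma * V(s') - V(s), with no bootstrap at episode end."""
        bootstrap = 0.0 if done else self.gamma * self.value(next_state)
        return reward + bootstrap - self.value(state)

# Usage with the parameters of Test Case 1 and hand-set (hypothetical) values.
model = AdvantageSketch(num_states=4, num_actions=4, discount_factor=0.9)
model.value_weights[:] = [0.0, 0.0, 0.5, 1.0]
probs = model.policy(0)
adv = model.td_advantage(state=2, reward=1.0, next_state=3, done=False)  # 1 + 0.9*1.0 - 0.5
```

The one-step form is only one of several common estimators; a Monte Carlo return minus `V(s)` or a multi-step variant would fit the same interface.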

**Note:** This problem requires a good understanding of reinforcement learning concepts and NumPy.

Test Cases

Test Case 1
Input: {"num_states": 4, "num_actions": 4, "discount_factor": 0.9}
Expected: None
Test Case 2
Input: {"num_states": 10, "num_actions": 5, "discount_factor": 0.8}
Expected: None
+ 3 hidden test cases