Advantage Function
=====================
In this problem, you will implement the advantage function, a key concept in reinforcement learning that measures the difference between the expected return and the actual return.
**Example:** Consider a simple grid world where an agent can move up, down, left, or right. The agent receives a reward of +1 for reaching the goal and -1 for hitting a wall.
**Constraints:** Use NumPy for efficient computation and implement the advantage function with a simple neural network policy and a value function.
**Note:** This problem requires a good understanding of reinforcement learning concepts and NumPy.
Test Cases
Test Case 1
Input:
{"num_states": 4, "num_actions": 4, "discount_factor": 0.9}Expected:
NoneTest Case 2
Input:
{"num_states": 10, "num_actions": 5, "discount_factor": 0.8}Expected:
None+ 3 hidden test cases