Click4Ai

422.

Hard

In this problem, you will implement a Dueling Deep Q-Network (Dueling DQN) architecture. The Dueling DQN extends the standard DQN by splitting the Q-value estimate into two streams: a value stream and an advantage stream. The value stream estimates the value of the state, V(s). The advantage stream estimates, for each action, how much better or worse that action is compared with the state's overall value, A(s, a). The final Q-value is the sum of the value stream output and the advantage stream output.

Example:

Suppose we have a state s and an action a. The value stream estimates the value of the state as V(s), while the advantage stream estimates the relative benefit of taking action a in state s as A(s, a). The final Q-value is then calculated as Q(s, a) = V(s) + A(s, a).
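The aggregation step above can be illustrated with a small NumPy sketch. The function name `combine_streams` and the numbers are hypothetical and chosen only for illustration. The sketch assumes the value stream yields one number per state and the advantage stream yields one number per action.

```python
import numpy as np

def combine_streams(value, advantage):
    """Return Q(s, a) = V(s) + A(s, a) for a batch of states."""
    value = np.asarray(value, dtype=float)          # assumed shape: (batch, 1)
    advantage = np.asarray(advantage, dtype=float)  # assumed shape: (batch, n_actions)
    # Broadcasting adds each state's single V(s) to every action's A(s, a).
    return value + advantage

# Hypothetical stream outputs for 2 states and 3 actions.
value = [[2.0], [0.5]]
advantage = [[1.0, -1.0, 0.0],
             [0.25, 0.75, -0.5]]

q_values = combine_streams(value, advantage)
print(q_values)
# [[3.   1.   2.  ]
#  [0.75 1.25 0.  ]]
```

Because V(s) is shared across actions, the action ranking within a state is determined entirely by the advantage stream.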

Constraints:

Your solution must use the Dueling DQN architecture, with a separate value stream and advantage stream, and must use the NumPy library for all numerical computations.
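One possible shape for a NumPy-only solution is sketched below. Everything in it is an assumption rather than part of the problem statement: the class name `DuelingDQN`, the single shared ReLU layer, the layer sizes, and the random weight initialisation. It is not claimed to reproduce the expected outputs listed under Test Cases, since the statement does not specify how the inputs map onto the two streams.

```python
import numpy as np

class DuelingDQN:
    """Minimal forward-only sketch: shared layer, then value and advantage heads."""

    def __init__(self, state_dim, n_actions, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Shared feature layer (hypothetical size and initialisation).
        self.w_shared = rng.normal(0.0, 0.1, (state_dim, hidden_dim))
        self.b_shared = np.zeros(hidden_dim)
        # Value stream: one scalar V(s) per state.
        self.w_value = rng.normal(0.0, 0.1, (hidden_dim, 1))
        self.b_value = np.zeros(1)
        # Advantage stream: one A(s, a) per action.
        self.w_adv = rng.normal(0.0, 0.1, (hidden_dim, n_actions))
        self.b_adv = np.zeros(n_actions)

    def forward(self, states):
        states = np.atleast_2d(np.asarray(states, dtype=float))
        hidden = np.maximum(0.0, states @ self.w_shared + self.b_shared)  # ReLU
        value = hidden @ self.w_value + self.b_value      # shape (batch, 1)
        advantage = hidden @ self.w_adv + self.b_adv      # shape (batch, n_actions)
        # Plain sum, as defined in the problem statement.
        return value + advantage

net = DuelingDQN(state_dim=4, n_actions=3)
batch = [[0.1, -0.2, 0.3, 0.0], [1.0, 0.5, -0.5, 2.0]]
q = net.forward(batch)
print(q.shape)  # (2, 3)
```

Many Dueling DQN implementations subtract the per-state mean advantage before adding V(s). This sketch keeps the plain sum Q(s, a) = V(s) + A(s, a) used in the statement.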

Test Cases

Test Case 1
Input: [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
Expected: [[3.0, 3.0], [9.0, 9.0]]
Test Case 2
Input: [[[9, 8], [7, 6]], [[5, 4], [3, 2]]]
Expected: [[13.0, 13.0], [9.0, 9.0]]
+ 3 hidden test cases