Deep Q-Network (DQN)
=========================
In this problem, you will implement a Deep Q-Network (DQN), a deep reinforcement learning algorithm that uses a neural network to approximate the action-value function (Q-function), which estimates the expected return of taking a given action in a given state.
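To make "approximate the Q-function" concrete, here is a minimal sketch of a Q-network. It is an assumption for illustration, not the required design: the class name `QNetwork` echoes the test cases, but its constructor arguments (`n_features`, `n_actions`, `lr`, `seed`) and methods (`predict`, `update`) are hypothetical. It uses a single linear layer in pure Python instead of a deep network, so only the interface matters: map a state's features to one Q-value per action, and nudge the chosen action's Q-value toward a target by gradient descent on the squared error.

```python
import random


class QNetwork:
    """Hypothetical minimal Q-network: a single linear layer, no hidden layers.

    Maps a state feature vector of length n_features to one Q-value per action.
    """

    def __init__(self, n_features, n_actions, lr=0.1, seed=0):
        rng = random.Random(seed)
        # weights[a][i] is the weight from feature i to the Q-value of action a
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                        for _ in range(n_actions)]
        self.lr = lr

    def predict(self, features):
        """Return a list of Q-values, one per action."""
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]

    def update(self, features, action, target):
        """One gradient-descent step on 0.5 * (target - Q(s, a))**2 for the chosen action."""
        error = target - self.predict(features)[action]
        row = self.weights[action]
        for i, x in enumerate(features):
            # d/dw_i of the loss is -error * x_i, so descending adds lr * error * x_i
            row[i] += self.lr * error * x
        return error


# Usage: states A and B one-hot encoded as feature vectors (an assumption for illustration).
A, B = [1.0, 0.0], [0.0, 1.0]
net = QNetwork(n_features=2, n_actions=2)
net.update(A, action=1, target=10.0)  # action index 1 stands for "right"
print(net.predict(A))
```

A real DQN would replace the linear layer with a multi-layer network trained by a framework such as PyTorch; the predict/update split stays the same.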
**Example:**
Suppose we have a simple Markov Decision Process (MDP) with two states (A and B) and two actions (left and right). The agent starts at state A and receives a reward of -1 for each step, except for taking the right action in state A, which reaches state B and yields a reward of 10. The goal is to reach state B.
| State | Action | Next State | Reward |
| --- | --- | --- | --- |
| A | left | A | -1 |
| A | right | B | 10 |
| B | left | B | -1 |
| B | right | A | -1 |
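The transition table can be encoded directly as a lookup from (state, action) to (next state, reward). The sketch below is illustrative: the names `TRANSITIONS`, `GOAL`, and `step` are hypothetical, and it assumes an episode ends as soon as the agent reaches state B, since reaching B is the stated goal.

```python
# Transition table from the example: (state, action) -> (next_state, reward)
TRANSITIONS = {
    ("A", "left"): ("A", -1),
    ("A", "right"): ("B", 10),
    ("B", "left"): ("B", -1),
    ("B", "right"): ("A", -1),
}

GOAL = "B"  # assumption: reaching the goal terminates the episode


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    next_state, reward = TRANSITIONS[(state, action)]
    done = next_state == GOAL
    return next_state, reward, done


print(step("A", "right"))
```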
Starting at state A, the agent chooses the right action and reaches state B. It receives a reward of 10, which is higher than the -1 it receives on every other transition. This experience (state A, action right, reward 10, next state B) should be used to update the Q-function.
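A DQN uses such an experience by regressing Q(s, a) toward the target y = r + γ · max over a' of Q_target(s', a'), where Q_target is a periodically refreshed copy of the online network, and y = r alone when s' is terminal. The sketch below applies one such update to the example's experience. It is a simplification under stated assumptions: lookup tables stand in for both networks, the discount `GAMMA = 0.9` and step size `LR = 0.5` are arbitrary choices (the problem does not give them), and reaching B is treated as terminal. The buffer shows where experience replay fits; with one stored transition, sampling is trivial.

```python
import random
from collections import deque

GAMMA = 0.9  # discount factor (assumed; the example does not specify one)
LR = 0.5     # step size toward the target (assumed)
ACTIONS = ["left", "right"]


def td_target(reward, next_state, done, q_target):
    """DQN target: r + gamma * max_a' Q_target(s', a'), or just r at a terminal state."""
    if done:
        return reward
    return reward + GAMMA * max(q_target[(next_state, a)] for a in ACTIONS)


# Stand-in "networks": lookup tables initialised to zero. In a real DQN these are
# neural networks, and q_target is a periodically synced copy of q_online.
q_online = {(s, a): 0.0 for s in "AB" for a in ACTIONS}
q_target = dict(q_online)

replay = deque(maxlen=1000)                   # experience replay buffer
replay.append(("A", "right", 10, "B", True))  # (s, a, r, s', done) from the example

for state, action, reward, next_state, done in random.sample(list(replay), k=1):
    y = td_target(reward, next_state, done, q_target)
    # Move Q(s, a) part of the way toward the target (a squared-error step for a table)
    q_online[(state, action)] += LR * (y - q_online[(state, action)])

print(q_online[("A", "right")])  # 5.0
```

Because B is treated as terminal, the target is just the reward 10, and one step with `LR = 0.5` moves Q(A, right) from 0.0 to 5.0. Keeping `q_target` fixed between syncs is what stops the target from shifting with every update.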
Constraints:
Test Cases
| Input | Expected Output |
| --- | --- |
| [[1, 2, 3], [4, 5, 6], [7, 8, 9]] | QNetwork object |
| [[10, 20, 30], [40, 50, 60], [70, 80, 90]] | QNetwork object |