In this problem, you will be implementing the Deep Deterministic Policy Gradient (DDPG) algorithm. The DDPG algorithm is a type of model-free reinforcement learning algorithm that uses a neural network to learn the policy and a neural network to learn the critic. The goal is to implement the DDPG algorithm to learn a policy that maximizes the cumulative reward. **Example:** If the environment is a simple grid world, the policy could be to move up, down, left, or right with a probability of 0.25 each. **Constraints:** The algorithm should use a neural network to learn the policy and a neural network to learn the critic. The policy should be a deterministic policy.
Test Cases
Test Case 1
Input:
[[1, 2], 3]Expected:
[0.5, 0.5]Test Case 2
Input:
[[4, 5], 6]Expected:
[0.8, 0.8]+ 3 hidden test cases