Multi-Armed Bandit
====================
In reinforcement learning, the multi-armed bandit problem is a classic problem in which an agent repeatedly chooses among several actions (arms) to maximize its cumulative reward. The agent receives a reward for each action it takes, but the reward distributions are not known in advance, so it has to estimate them from the rewards it observes.
**Example:** Consider a slot machine with three arms. Each arm has a different probability of paying out a reward. The agent needs to choose which arm to pull to maximize its reward.
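The slot machine in this example could be simulated roughly as follows. This is only a sketch: the class name `BernoulliBandit`, the 0/1 payout, and the probabilities `[0.1, 0.2, 0.3]` are assumptions chosen for illustration, not part of the problem statement.

```python
import random


class BernoulliBandit:
    """Hypothetical slot machine: each arm pays 1 with a fixed, hidden probability."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.rng = random.Random(seed)  # private generator so runs are reproducible

    def pull(self, arm):
        # Pay out 1 with probability probs[arm], otherwise 0.
        return 1 if self.rng.random() < self.probs[arm] else 0


if __name__ == "__main__":
    # Assumed payout probabilities for the three arms.
    bandit = BernoulliBandit([0.1, 0.2, 0.3], seed=0)
    pulls = 10_000
    for arm in range(3):
        rate = sum(bandit.pull(arm) for _ in range(pulls)) / pulls
        print(f"arm {arm}: empirical payout rate {rate:.3f}")
```

With enough pulls, each empirical payout rate should land close to the hidden probability of its arm, which is the quantity the agent is trying to learn.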
**Constraints:** The agent should choose the arm with the highest expected reward.
**Your Task:** Implement an epsilon-greedy algorithm to solve the multi-armed bandit problem.
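One possible shape for an epsilon-greedy solution is sketched below. It assumes Bernoulli (0/1) rewards, a fixed `epsilon` of 0.1, a fixed number of steps, and a 0-based arm index as the return value. The problem statement does not specify the function signature or the output format the tests expect, so the name `epsilon_greedy` and all of its parameters are hypothetical.

```python
import random


def epsilon_greedy(probs, epsilon=0.1, steps=10_000, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit.

    Returns (best_arm, values): the 0-based index of the arm with the
    highest estimated reward, and the list of per-arm estimates.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms    # number of times each arm was pulled
    values = [0.0] * n_arms  # running mean reward of each arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: values[a])

        # Simulated payout (assumption: reward is 1 with probability probs[arm]).
        reward = 1 if rng.random() < probs[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    best_arm = max(range(n_arms), key=lambda a: values[a])
    return best_arm, values


if __name__ == "__main__":
    best, estimates = epsilon_greedy([0.1, 0.2, 0.3])
    print("estimated best arm:", best)
    print("estimates:", [round(v, 3) for v in estimates])
```

A larger `epsilon` spends more pulls on exploration and a smaller one commits to the current best estimate sooner. When the arms have similar payout probabilities, more steps are needed before the estimates separate reliably.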
Test Cases
----------

Test Case 1
Input:
[0.1, 0.2, 0.3]
Expected:
1

Test Case 2
Input:
[0.4, 0.5, 0.6]
Expected:
2

Plus 3 hidden test cases.