Click4Ai

433.

Medium

Multi-Armed Bandit

====================

In reinforcement learning, the multi-armed bandit is a classic problem in which an agent repeatedly chooses among several actions (arms) to maximize its cumulative reward. Each action yields a reward, but the reward distributions are not known in advance, so the agent must balance exploring arms to estimate their payoffs against exploiting the arm that currently looks best.

**Example:** Consider a slot machine with three arms. Each arm has a different probability of paying out a reward. The agent needs to choose which arm to pull to maximize its reward.
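To make the slot-machine example concrete, here is a minimal simulation sketch. It assumes each arm pays a Bernoulli reward (1 on a payout, 0 otherwise); the helper name `pull` and the probabilities `[0.1, 0.2, 0.3]` are illustrative choices, not part of the problem statement.

```python
import random


def pull(arm_probs, arm, rng):
    """Return 1 with probability arm_probs[arm], else 0 (assumed Bernoulli reward)."""
    return 1 if rng.random() < arm_probs[arm] else 0


rng = random.Random(0)        # seeded so the run is repeatable
arm_probs = [0.1, 0.2, 0.3]   # hypothetical payout probability of each arm
pulls_per_arm = 10_000

# Pull every arm many times and record the observed payout rate.
empirical = [
    sum(pull(arm_probs, arm, rng) for _ in range(pulls_per_arm)) / pulls_per_arm
    for arm in range(len(arm_probs))
]
print(empirical)
```

With enough pulls, the observed payout rates approach the true probabilities. The agent in the actual problem cannot afford to sample every arm this heavily, which is why it needs a strategy for deciding where to spend its pulls.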

**Constraints:** The agent should choose the arm with the highest expected reward.

**Your Task:** Implement an epsilon-greedy algorithm to solve the multi-armed bandit problem.
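One possible shape for an epsilon-greedy solution is sketched below. The problem does not specify a function signature or return format, so the name `epsilon_greedy`, the parameters `epsilon`, `steps`, and `seed`, the Bernoulli reward model, and the returned tuple are all assumptions made for illustration.

```python
import random


def epsilon_greedy(arm_probs, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on simulated Bernoulli arms (assumed reward model).

    Returns (best_arm, values, counts), where best_arm is the index of the
    arm with the highest estimated reward after `steps` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: values[a])

        # Simulated payout: 1 with probability arm_probs[arm], else 0.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0

        # Incremental update of the sample mean for the pulled arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    best_arm = max(range(n_arms), key=lambda a: values[a])
    return best_arm, values, counts


best_arm, values, counts = epsilon_greedy([0.1, 0.9, 0.2])
print(best_arm, counts)
```

The incremental mean update avoids storing every reward. A small `epsilon` keeps most pulls on the best-looking arm while still sampling the others often enough to correct a poor early estimate. When the arms have similar payout probabilities, the estimates can remain close, so the returned index may vary with the seed and the number of steps.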

Test Cases

Test Case 1
Input: [0.1, 0.2, 0.3]
Expected: 1
Test Case 2
Input: [0.4, 0.5, 0.6]
Expected: 2
+ 3 hidden test cases