Click4Ai

417.

Easy

In this problem, we will implement the Softmax exploration strategy for reinforcement learning. The Softmax algorithm chooses the action with the highest probability based on the estimated rewards.

**Example:** Consider a simple grid world where an agent can move up, down, left, or right. The agent receives a reward of 1 for reaching a goal state and -1 for hitting a wall.

**Constraints:** Implement the Softmax algorithm with a temperature parameter that controls the exploration-exploitation trade-off.

import numpy as np

def softmax_exploration(rewards, temperature):

# Your code here

pass

Test Cases

Test Case 1
Input: [[1, 2], [3, 4]]
Expected: 0
Test Case 2
Input: [[10, 20], [30, 40]]
Expected: 0
+ 3 hidden test cases