Click4Ai

404.

Medium

Policy Evaluation
=================

In reinforcement learning, a policy is a mapping from states to actions. Policy evaluation is the process of estimating the value function of a given policy. The value function gives, for each state, the expected return (the cumulative reward, usually discounted) obtained by starting in that state and following the policy thereafter.
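For reference, the definition above can be written in standard textbook notation. The symbols here are assumptions and do not appear in the problem statement: $\pi$ is the policy, $\gamma \in [0, 1)$ a discount factor, $P$ the transition probabilities, and $R$ the reward.

$$
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \,\middle|\, S_0 = s\right]
$$

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma V^{\pi}(s')\bigr]
$$

The second line is the Bellman expectation equation. When both the policy and the transitions are deterministic, the sums collapse to a single term, $V^{\pi}(s) = R + \gamma V^{\pi}(s')$, where $s'$ is the state reached by taking the policy's action in $s$.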

**Example:** We have a simple grid world with four states (A, B, C, D) and two actions (left, right). The policy is to move right from states A and B, and left from states C and D. The reward function is as follows:

| State | Reward |
| --- | --- |
| A | 0 |
| B | 0 |
| C | 10 |
| D | 0 |

The goal is to compute the value function for this policy.

**Constraints:** The policy is stationary, and the reward function is deterministic.
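The example can be sketched with iterative policy evaluation. The problem statement does not give the grid layout, the transition rule, the discount factor, or when the reward is collected, so everything below beyond the reward table and the policy is an assumption: the states sit in a row A-B-C-D, moves are deterministic, the reward is received in the current state, and `gamma` is a free parameter. The helper names `next_state` and `evaluate_policy` are hypothetical.

```python
# Assumed layout: A - B - C - D in a row, with deterministic moves.
STATES = ["A", "B", "C", "D"]
REWARD = {"A": 0.0, "B": 0.0, "C": 10.0, "D": 0.0}  # from the reward table
POLICY = {"A": "right", "B": "right", "C": "left", "D": "left"}  # from the example


def next_state(state, action):
    """Hypothetical transition rule: step one cell, staying put at the edges."""
    i = STATES.index(state)
    j = i + 1 if action == "right" else i - 1
    j = min(max(j, 0), len(STATES) - 1)
    return STATES[j]


def evaluate_policy(gamma=0.9, tol=1e-10, max_iters=10_000):
    """Repeat V(s) <- R(s) + gamma * V(s') until the largest change is below tol.

    Assumes gamma < 1, since the policy loops between B and C forever.
    """
    values = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_values = {
            s: REWARD[s] + gamma * values[next_state(s, POLICY[s])]
            for s in STATES
        }
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    return values


if __name__ == "__main__":
    for g in (0.0, 0.9):
        v = evaluate_policy(gamma=g)
        print(g, [round(v[s], 4) for s in STATES])
```

Under these assumptions, `gamma=0.0` makes each value equal to the state's immediate reward, which coincides with the expected output listed for Test Case 1. Whether that is the intended convention is not stated in the problem.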

Test Cases

Test Case 1
Input: [[0, 1], [1, 0], [2, 3], [3, 2]]
Expected: [0.0, 0.0, 10.0, 0.0]
+ 4 hidden test cases