GRU Cell (Gated Recurrent Unit)
Implement a GRU (Gated Recurrent Unit) cell. GRU is a simplified variant of LSTM that uses only two gates -- an **update gate** and a **reset gate** -- instead of three. Despite having fewer parameters, GRU often achieves comparable performance to LSTM while being faster to train.
Formula:
combined = [x_t ; h_{t-1}] (concatenate input and previous hidden state)
Update gate: z_t = sigmoid(W_z @ combined)
Reset gate: r_t = sigmoid(W_r @ combined)
Candidate: h_hat = tanh(W_h @ [x_t ; r_t * h_{t-1}])
New hidden: h_t = (1 - z_t) * h_{t-1} + z_t * h_hat
Where sigmoid(x) = 1 / (1 + exp(-x))
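The formulas above translate almost line-for-line into NumPy. A minimal sketch of the forward step, assuming no bias terms (as in the formulas) and with `gru_cell` as an illustrative function name:

```python
import numpy as np

def sigmoid(x):
    # Logistic function: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, W_z, W_r, W_h):
    """One GRU step following the formulas above (no bias terms)."""
    combined = np.concatenate([x, h_prev])            # [x_t ; h_{t-1}]
    z = sigmoid(W_z @ combined)                       # update gate z_t
    r = sigmoid(W_r @ combined)                       # reset gate r_t
    combined_reset = np.concatenate([x, r * h_prev])  # [x_t ; r_t * h_{t-1}]
    h_hat = np.tanh(W_h @ combined_reset)             # candidate state
    return (1 - z) * h_prev + z * h_hat               # new hidden state h_t
```

Note that each weight matrix has shape `(hidden_size, input_size + hidden_size)` because it acts on the concatenated vector.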
Example:
input x = [1.0, 0.5], prev_hidden h = [0.0, 0.0]
W_z = [[0.3, 0.2, 0.1, 0.4], [0.1, 0.5, 0.3, 0.2]] (update gate)
W_r = [[0.2, 0.4, 0.3, 0.1], [0.4, 0.1, 0.2, 0.3]] (reset gate)
W_h = [[0.5, 0.1, 0.2, 0.3], [0.2, 0.3, 0.4, 0.1]] (candidate)
Step 1: combined = [1.0, 0.5, 0.0, 0.0]
Step 2: z_t = sigmoid(W_z @ combined) = sigmoid([0.40, 0.35]) = [0.599, 0.587]
Step 3: r_t = sigmoid(W_r @ combined) = sigmoid([0.40, 0.45]) = [0.599, 0.611]
Step 4: reset_hidden = r_t * h = [0, 0]
h_hat = tanh(W_h @ [1.0, 0.5, 0, 0]) = tanh([0.55, 0.35]) = [0.500, 0.336]
Step 5: h_t = (1-z_t) * h + z_t * h_hat = [0.300, 0.197]
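The five steps above can be reproduced directly with NumPy (intermediate values shown to more decimals than the rounded figures above):

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

x = np.array([1.0, 0.5])
h = np.array([0.0, 0.0])
W_z = np.array([[0.3, 0.2, 0.1, 0.4], [0.1, 0.5, 0.3, 0.2]])
W_r = np.array([[0.2, 0.4, 0.3, 0.1], [0.4, 0.1, 0.2, 0.3]])
W_h = np.array([[0.5, 0.1, 0.2, 0.3], [0.2, 0.3, 0.4, 0.1]])

combined = np.concatenate([x, h])                   # Step 1: [1.0, 0.5, 0.0, 0.0]
z = sigmoid(W_z @ combined)                         # Step 2: ~[0.5987, 0.5866]
r = sigmoid(W_r @ combined)                         # Step 3: ~[0.5987, 0.6106]
h_hat = np.tanh(W_h @ np.concatenate([x, r * h]))   # Step 4: ~[0.5005, 0.3364]
h_new = (1 - z) * h + z * h_hat                     # Step 5: ~[0.2997, 0.1973]
```

Since h is all zeros here, the reset gate has no effect and the result reduces to z * h_hat.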
**Explanation:** The GRU simplifies the LSTM architecture by combining the forget and input gates into a single **update gate** (z_t). The update gate controls how much of the previous hidden state to keep vs. how much to replace with new information. The **reset gate** (r_t) determines how much of the previous hidden state to consider when computing the candidate. When z=0, the hidden state is copied unchanged; when z=1, it is fully replaced by the candidate. This elegant design makes GRU a popular choice for sequence modeling.
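The two extremes of the update gate can be checked directly. A tiny illustration with hypothetical values for h_{t-1} and the candidate:

```python
import numpy as np

# Hypothetical values chosen only to illustrate the gate's extremes
h_prev = np.array([1.0, -2.0])   # previous hidden state
h_hat = np.array([0.3, 0.7])     # candidate state

# z = 0: h_t = h_prev (state copied unchanged)
kept = (1 - 0.0) * h_prev + 0.0 * h_hat
# z = 1: h_t = h_hat (state fully replaced by the candidate)
replaced = (1 - 1.0) * h_prev + 1.0 * h_hat
```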
Test Cases
1. x=[1.0, 0.5], h=[0.0, 0.0], W_z/W_r/W_h as (2,4) matrices → h_new has shape (2,)
2. x=[0.0, 0.0], h=[0.0, 0.0], any weights → h_new = [0.0, 0.0]