Click4Ai

131.

Hard

GRU Cell (Gated Recurrent Unit)

Implement a single forward step of a GRU (Gated Recurrent Unit) cell. GRU is a simplified variant of LSTM that uses only two gates, an **update gate** and a **reset gate**, instead of three. Despite having fewer parameters, GRU often achieves comparable performance to LSTM while being faster to train.

Formula:

combined = [x_t ; h_{t-1}] (concatenate input and previous hidden state)

Update gate: z_t = sigmoid(W_z @ combined)

Reset gate: r_t = sigmoid(W_r @ combined)

Candidate: h_hat = tanh(W_h @ [x_t ; r_t * h_{t-1}])

New hidden: h_t = (1 - z_t) * h_{t-1} + z_t * h_hat

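Where sigmoid(x) = 1 / (1 + exp(-x)), `@` is matrix-vector multiplication, and `*` is element-wise multiplication. This formulation has no bias terms.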
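The formulas above translate almost line for line into NumPy. The following is a minimal sketch, not a reference solution: the function names `sigmoid` and `gru_cell` are assumptions (the problem only names the arguments `x`, `h_prev`, and `weights`), and the sizes and random values in the usage lines are arbitrary.

```python
import numpy as np


def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))


def gru_cell(x, h_prev, weights):
    """One bias-free GRU step (function name is an assumption)."""
    W_z, W_r, W_h = weights
    combined = np.concatenate([x, h_prev])          # [x_t ; h_{t-1}]
    z = sigmoid(W_z @ combined)                     # update gate
    r = sigmoid(W_r @ combined)                     # reset gate
    candidate_in = np.concatenate([x, r * h_prev])  # [x_t ; r_t * h_{t-1}]
    h_hat = np.tanh(W_h @ candidate_in)             # candidate state
    return (1.0 - z) * h_prev + z * h_hat           # new hidden state


# Usage with arbitrary sizes: input_size = 2, hidden_size = 3.
rng = np.random.default_rng(0)
weights = tuple(rng.normal(size=(3, 2 + 3)) for _ in range(3))
h_new = gru_cell(np.array([1.0, 0.5]), np.array([0.1, -0.2, 0.3]), weights)
print(h_new.shape)
```

Because every element of `h_new` is a convex combination of a previous hidden value and a `tanh` output, it stays inside [-1, 1] whenever `h_prev` does.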

Example:

input x = [1.0, 0.5], prev_hidden h = [0.0, 0.0]

W_z = [[0.3, 0.2, 0.1, 0.4], [0.1, 0.5, 0.3, 0.2]] (update gate)

W_r = [[0.2, 0.4, 0.3, 0.1], [0.4, 0.1, 0.2, 0.3]] (reset gate)

W_h = [[0.5, 0.1, 0.2, 0.3], [0.2, 0.3, 0.4, 0.1]] (candidate)

Step 1: combined = [1.0, 0.5, 0.0, 0.0]

Step 2: z_t = sigmoid(W_z @ combined) = sigmoid([0.40, 0.35]) = [0.599, 0.587]

Step 3: r_t = sigmoid(W_r @ combined) = sigmoid([0.40, 0.45]) = [0.599, 0.611]

Step 4: reset_hidden = r_t * h = [0, 0]

h_hat = tanh(W_h @ [1.0, 0.5, 0, 0]) = tanh([0.55, 0.35]) = [0.500, 0.336]

Step 5: h_t = (1-z_t) * h + z_t * h_hat = z_t * h_hat (since h = 0) = [0.300, 0.197]
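To check the arithmetic, the five steps can be replayed in NumPy with every intermediate value kept in its own variable. This is only a sketch for verification; the variable names (`z_pre`, `r_pre`, `h_hat_pre`, and so on) are illustrative and not part of the problem statement.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Values taken from the example.
x = np.array([1.0, 0.5])
h = np.array([0.0, 0.0])
W_z = np.array([[0.3, 0.2, 0.1, 0.4], [0.1, 0.5, 0.3, 0.2]])
W_r = np.array([[0.2, 0.4, 0.3, 0.1], [0.4, 0.1, 0.2, 0.3]])
W_h = np.array([[0.5, 0.1, 0.2, 0.3], [0.2, 0.3, 0.4, 0.1]])

combined = np.concatenate([x, h])              # Step 1
z_pre = W_z @ combined                         # Step 2, before sigmoid
z = sigmoid(z_pre)
r_pre = W_r @ combined                         # Step 3, before sigmoid
r = sigmoid(r_pre)
reset_hidden = r * h                           # Step 4
h_hat_pre = W_h @ np.concatenate([x, reset_hidden])
h_hat = np.tanh(h_hat_pre)
h_new = (1.0 - z) * h + z * h_hat              # Step 5

for name, value in [("combined", combined), ("z_t", z), ("r_t", r),
                    ("h_hat", h_hat), ("h_t", h_new)]:
    print(f"{name:>8}: {np.round(value, 3)}")
```

Since the previous hidden state is all zeros here, the reset gate has no effect on the candidate and the final state reduces to `z * h_hat`.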

**Explanation:** The GRU simplifies the LSTM architecture by combining the forget and input gates into a single **update gate** (z_t) and by dropping the separate cell state. The update gate controls how much of the previous hidden state to keep versus how much to replace with new information: when z_t = 0, the hidden state is copied unchanged; when z_t = 1, it is fully replaced by the candidate h_hat. The **reset gate** (r_t) determines how much of the previous hidden state feeds into the candidate; when r_t = 0, the candidate depends on x_t alone. With one fewer gate and no cell state, a GRU has fewer parameters than an LSTM of the same hidden size.
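The role of the update gate is easiest to see by isolating the last line of the formula and forcing z to its extreme values. The snippet below is a small illustration under assumed inputs: the helper name `blend` and the vectors `h_prev` and `h_hat` are made up for the demonstration.

```python
import numpy as np


def blend(z, h_prev, h_hat):
    """Final GRU interpolation: (1 - z) * h_prev + z * h_hat."""
    return (1.0 - z) * h_prev + z * h_hat


h_prev = np.array([0.8, -0.4])   # arbitrary previous hidden state
h_hat = np.array([0.1, 0.6])     # arbitrary candidate state

keep = blend(np.zeros(2), h_prev, h_hat)        # z = 0: previous state kept
replace = blend(np.ones(2), h_prev, h_hat)      # z = 1: candidate taken
half = blend(np.full(2, 0.5), h_prev, h_hat)    # z = 0.5: element-wise midpoint

print(keep, replace, half)
```

In a trained network z_t is a vector, so each hidden unit can sit at a different point between these extremes at every time step.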

Constraints:

  • `x` is a 1D numpy array (input at current time step)
  • `h_prev` is a 1D numpy array (previous hidden state)
  • `weights` is a tuple of three 2D numpy arrays (W_z, W_r, W_h)
  • Each weight matrix has shape (hidden_size, input_size + hidden_size)
  • Return the new hidden state as a 1D numpy array

Test Cases:

    Test Case 1
    Input: x=[1.0, 0.5], h=[0.0, 0.0], W_z/W_r/W_h = matrices of shape (2, 4)
    Expected: h_new has shape (2,)
    Test Case 2
    Input: x=[0.0, 0.0], h=[0.0, 0.0], W_z/W_r/W_h = any matrices of shape (2, 4)
    Expected: h_new = [0.0, 0.0]
    + 3 hidden test cases
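The two visible test cases can be run locally before submitting. The sketch below pairs them with a compact stand-in implementation; the name `gru_cell`, the shape check that raises `ValueError`, and the random weights are all assumptions made for illustration, and the hidden test cases are not reproduced. Test Case 2 holds for any weights because, without bias terms, zero inputs give a candidate of tanh(0) = 0, and blending two zero vectors yields zero whatever z_t is.

```python
import numpy as np


def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_cell(x, h_prev, weights):
    """Compact, bias-free stand-in for a solution (hypothetical name)."""
    expected = (h_prev.size, x.size + h_prev.size)
    for W in weights:
        # Constraint: each matrix is (hidden_size, input_size + hidden_size).
        if W.shape != expected:
            raise ValueError(f"expected shape {expected}, got {W.shape}")
    W_z, W_r, W_h = weights
    xh = np.concatenate([x, h_prev])
    z, r = _sigmoid(W_z @ xh), _sigmoid(W_r @ xh)
    h_hat = np.tanh(W_h @ np.concatenate([x, r * h_prev]))
    return (1.0 - z) * h_prev + z * h_hat


rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
weights = tuple(rng.normal(size=(2, 4)) for _ in range(3))

# Test Case 1: only the output shape is specified.
h1 = gru_cell(np.array([1.0, 0.5]), np.zeros(2), weights)
assert h1.shape == (2,)

# Test Case 2: zero input and zero state give a zero state for any weights.
h2 = gru_cell(np.zeros(2), np.zeros(2), weights)
assert np.allclose(h2, 0.0)
```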