Click4Ai

132.

Medium

Bidirectional RNN

Implement a Bidirectional RNN that processes an input sequence in both forward (left-to-right) and backward (right-to-left) directions. By combining information from both directions, a Bidirectional RNN can capture context from both past and future time steps, which is crucial for tasks like named entity recognition and sentiment analysis.

Algorithm:

1. Forward pass: Process the sequence from t=0 to t=T-1

h_fwd[t] = tanh(W @ x[t] + h_fwd[t-1])

2. Backward pass: Process the sequence from t=T-1 down to t=0

h_bwd[t] = tanh(W @ x[t] + h_bwd[t+1])

Both recurrences start from the initial hidden state, i.e., h_fwd[-1] = h0 and h_bwd[T] = h0. (In a full bidirectional RNN each direction has its own input and recurrent weight matrices; in this simplified version a single W is shared by both directions and the previous hidden state is added directly, as stated in the constraints.)

3. Concatenate: output[t] = [h_fwd[t] ; h_bwd[t]]

Output shape: (seq_length, 2 * hidden_size)

-- double the hidden size due to concatenation of both directions
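The three steps above can be sketched in NumPy as follows. This is a minimal sketch for the unbatched case; the function name `bidirectional_rnn` is illustrative, and h0 is reused as the starting state for both directions, as in the example below.

```python
import numpy as np

def bidirectional_rnn(x, h0, W):
    """Simplified bidirectional RNN: h[t] = tanh(W @ x[t] + h_prev),
    with a single weight matrix W shared by both directions."""
    x = np.asarray(x, dtype=float)
    h0 = np.asarray(h0, dtype=float)
    W = np.asarray(W, dtype=float)
    T = x.shape[0]
    hidden_size = h0.shape[-1]

    # Forward pass: t = 0 .. T-1
    h_fwd = np.zeros((T, hidden_size))
    h = h0
    for t in range(T):
        h = np.tanh(W @ x[t] + h)
        h_fwd[t] = h

    # Backward pass: t = T-1 .. 0, starting again from h0
    h_bwd = np.zeros((T, hidden_size))
    h = h0
    for t in reversed(range(T)):
        h = np.tanh(W @ x[t] + h)
        h_bwd[t] = h

    # Concatenate along the feature axis -> (T, 2 * hidden_size)
    return np.concatenate([h_fwd, h_bwd], axis=-1)
```

Running this on Test Case 1 below produces an output of shape (3, 4): three time steps, each with two forward and two backward hidden units.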

Example:

Input sequence (3 time steps, 2 features):

x = [[1, 2], [3, 4], [5, 6]]

h0 = [0, 0] (initial hidden state, used as the starting state for both directions)

Forward pass (left to right):

h_fwd[0] = tanh(W @ [1,2] + [0,0])

h_fwd[1] = tanh(W @ [3,4] + h_fwd[0])

h_fwd[2] = tanh(W @ [5,6] + h_fwd[1])

Backward pass (right to left):

h_bwd[2] = tanh(W @ [5,6] + [0,0])

h_bwd[1] = tanh(W @ [3,4] + h_bwd[2])

h_bwd[0] = tanh(W @ [1,2] + h_bwd[1])

Output: [[h_fwd[0]; h_bwd[0]], [h_fwd[1]; h_bwd[1]], [h_fwd[2]; h_bwd[2]]]
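To make the walkthrough concrete, the same recurrences can be executed step by step. The weight matrix below is borrowed from Test Case 1 (any 2x2 W would do); the variable names mirror the hand computation above.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
h0 = np.zeros(2)
W = np.array([[0.1, 0.1], [0.1, 0.1]])  # assumed: the Test Case 1 weights

# Forward pass (left to right)
h_fwd = [np.tanh(W @ x[0] + h0)]
h_fwd.append(np.tanh(W @ x[1] + h_fwd[0]))
h_fwd.append(np.tanh(W @ x[2] + h_fwd[1]))

# Backward pass (right to left), starting again from h0
h_bwd_2 = np.tanh(W @ x[2] + h0)
h_bwd_1 = np.tanh(W @ x[1] + h_bwd_2)
h_bwd_0 = np.tanh(W @ x[0] + h_bwd_1)

# Concatenate per time step -> shape (3, 4)
output = np.stack([
    np.concatenate([h_fwd[0], h_bwd_0]),
    np.concatenate([h_fwd[1], h_bwd_1]),
    np.concatenate([h_fwd[2], h_bwd_2]),
])
print(output.shape)  # (3, 4)
```

Note that h_fwd[0] = tanh(W @ [1,2]) = tanh([0.3, 0.3]) and h_bwd[2] = tanh(W @ [5,6]) = tanh([1.1, 1.1]): the two directions see the endpoints of the sequence first from opposite sides.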

**Explanation:** A standard (unidirectional) RNN can only use past context when making predictions at each time step. A Bidirectional RNN runs two independent RNNs -- one processing the sequence forward and another processing it backward -- then concatenates their hidden states at each time step. This provides the model with complete context from both directions, significantly improving performance on tasks where future context matters (e.g., "He said he would ___ tomorrow" requires both left and right context to fill the blank).

Constraints:

  • Input sequence `x` has shape (seq_length, input_size) or (seq_length, batch_size, input_size)
  • Initial hidden state `h0` has shape (hidden_size,) or (batch_size, hidden_size)
  • `weights` is a 2D numpy array shared by both directions (for simplicity)
  • Output is the concatenation of forward and backward hidden states along the feature axis
  • Use np.tanh as the activation function
Test Cases:

    Test Case 1
    Input: x=[[1,2],[3,4],[5,6]], h0=[0,0], W=[[0.1,0.1],[0.1,0.1]]
    Expected: output of shape (3, 4) -- 3 time steps, 2 * hidden_size
    Test Case 2
    Input: x=[[0,0],[0,0]], h0=[0,0], W=any
    Expected: all zeros (tanh(0) = 0, so both directions stay at zero)
    + 3 hidden test cases
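The constraints also allow a batched input of shape (seq_length, batch_size, input_size). One way to support both layouts in the same loop, sketched below under the same shared-W simplification, is to multiply as x[t] @ W.T, which broadcasts over an optional batch dimension (the function name is again illustrative).

```python
import numpy as np

def bidirectional_rnn(x, h0, W):
    """Simplified bidirectional RNN that accepts x of shape
    (seq_length, input_size) or (seq_length, batch_size, input_size)."""
    x = np.asarray(x, dtype=float)
    h0 = np.asarray(h0, dtype=float)
    W = np.asarray(W, dtype=float)
    T = x.shape[0]

    # x[t] @ W.T works for x[t] of shape (input_size,)
    # and (batch_size, input_size) alike.
    h_fwd, h = [], h0
    for t in range(T):
        h = np.tanh(x[t] @ W.T + h)
        h_fwd.append(h)

    h_bwd, h = [None] * T, h0
    for t in reversed(range(T)):
        h = np.tanh(x[t] @ W.T + h)
        h_bwd[t] = h

    # Stack over time, then concatenate forward and backward
    # states along the last (feature) axis.
    return np.concatenate([np.stack(h_fwd), np.stack(h_bwd)], axis=-1)
```

With unbatched input the output has shape (seq_length, 2 * hidden_size); with batched input it becomes (seq_length, batch_size, 2 * hidden_size).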