Click4Ai

Problem 133 (Hard)

Sequence to Sequence (Seq2Seq)

Implement a basic Sequence-to-Sequence (Seq2Seq) model using an encoder-decoder architecture with RNN cells. Seq2Seq models transform an input sequence of arbitrary length into an output sequence of potentially different length. They are the foundation of machine translation, text summarization, and chatbots.

Architecture:

Encoder: Reads the input sequence and compresses it into a context vector

h_t = tanh(W_enc @ x_t + h_{t-1})

context = h_T (final encoder hidden state)

Decoder: Generates the output sequence from the context vector

h_dec[0] = context

For each output step t:

h_dec[t] = tanh(W_dec @ h_dec[t-1])

y[t] = W_out @ h_dec[t]

Output shape: (output_seq_length, hidden_size)
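The encoder and decoder recurrences above can be sketched end-to-end in NumPy. The dict keys `W_enc`, `W_dec`, and `W_out` are illustrative assumptions; the problem only states that `weights` holds the weight matrices:

```python
import numpy as np

def seq2seq_forward(x, h0, weights):
    """Minimal encoder-decoder forward pass.

    x:       (seq_length, input_size) input sequence
    h0:      (hidden_size,) initial encoder hidden state
    weights: dict with 'W_enc', 'W_dec', 'W_out' (assumed key names)
    Returns: (seq_length, hidden_size) output sequence
    """
    W_enc, W_dec, W_out = weights["W_enc"], weights["W_dec"], weights["W_out"]

    # Encoder: fold the whole input sequence into one context vector.
    h = np.asarray(h0, dtype=float)
    for x_t in np.asarray(x, dtype=float):
        h = np.tanh(W_enc @ x_t + h)
    context = h  # final encoder hidden state

    # Decoder: start from the context and emit one output per input step.
    outputs = []
    h_dec = context
    for _ in range(len(x)):
        outputs.append(W_out @ h_dec)      # y[t] = W_out @ h_dec[t]
        h_dec = np.tanh(W_dec @ h_dec)     # h_dec[t+1] = tanh(W_dec @ h_dec[t])
    return np.vstack(outputs)
```

Note the ordering inside the decoder loop: each output y[t] is computed from the current decoder state before that state is advanced, matching the step equations above.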

Example:

Input sequence (3 steps): x = [[1,2], [3,4], [5,6]]

h0 = [0, 0], W_enc = [[0.1,0.1],[0.1,0.1]]

Encoder:

h1 = tanh(W_enc @ [1,2] + [0,0]) = tanh([0.3, 0.3]) = [0.291, 0.291]

h2 = tanh(W_enc @ [3,4] + h1) = tanh([0.991, 0.991]) = [0.758, 0.758]

h3 = tanh(W_enc @ [5,6] + h2) = tanh([1.858, 1.858]) = [0.952, 0.952]

context = [0.952, 0.952]

Decoder (generates 3 output steps):

h_dec[0] = context = [0.952, 0.952]

y[0] = W_out @ h_dec[0]

h_dec[1] = tanh(W_dec @ h_dec[0])

y[1] = W_out @ h_dec[1]

...
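The encoder arithmetic above can be reproduced with NumPy, carrying full precision through each step and rounding only for display:

```python
import numpy as np

W_enc = np.array([[0.1, 0.1], [0.1, 0.1]])
x = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

h = np.zeros(2)                    # h0 = [0, 0]
for x_t in x:
    h = np.tanh(W_enc @ x_t + h)   # one encoder step
    print(np.round(h, 3))
# [0.291 0.291]
# [0.758 0.758]
# [0.952 0.952]
```

The final hidden state is the context vector handed to the decoder.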

**Explanation:** The Seq2Seq model works in two phases. The **encoder** processes the entire input sequence step-by-step, building up a fixed-size context vector that summarizes the input. The **decoder** then uses this context vector to generate the output sequence one step at a time. The context vector acts as a "bottleneck" that forces the encoder to compress all relevant information. This architecture was originally proposed for machine translation (Sutskever et al., 2014) and later improved with attention mechanisms.

Constraints:

  • Input sequence `x` has shape (seq_length, input_size)
  • Initial hidden state `h0` has shape (hidden_size,)
  • `weights` contains the encoder, decoder, and output weight matrices
  • The decoder generates the same number of output steps as input steps
  • Use np.tanh as the activation function
  • Return the output sequence as a 2D numpy array
Test Cases:

    Test Case 1
    Input: x=[[1,2],[3,4],[5,6]], h0=[0,0], W=[[0.1,0.1],[0.1,0.1]]
    Expected: output sequence of shape (3, 2)

    Test Case 2
    Input: x=[[0,0],[0,0]], h0=[0,0], W=any
    Expected: all zeros

    + 3 hidden test cases
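Test Case 2 follows from tanh(0) = 0: with a zero input and zero initial state, every encoder step stays at zero, so the context and all decoder outputs are zero regardless of the weights. A quick check (reusing one matrix for every weight role, as the test input suggests):

```python
import numpy as np

W = np.array([[0.1, 0.1], [0.1, 0.1]])  # arbitrary; any W gives zeros here
x = np.zeros((2, 2))                    # x = [[0,0],[0,0]]
h = np.zeros(2)                         # h0 = [0, 0]

for x_t in x:                           # encoder: tanh(W @ 0 + 0) = 0
    h = np.tanh(W @ x_t + h)

ys = []
h_dec = h                               # context = [0, 0]
for _ in range(len(x)):                 # decoder: stays at zero
    ys.append(W @ h_dec)
    h_dec = np.tanh(W @ h_dec)

print(np.vstack(ys))                    # all zeros, shape (2, 2)
```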