Sequence to Sequence (Seq2Seq)
Implement a basic Sequence-to-Sequence (Seq2Seq) model using an encoder-decoder architecture with RNN cells. Seq2Seq models transform an input sequence of arbitrary length into an output sequence of potentially different length. They are the foundation of machine translation, text summarization, and chatbots.
Architecture:
Encoder: Reads the input sequence and compresses it into a context vector
h_t = tanh(W_enc @ x_t + h_{t-1})
context = h_T (final encoder hidden state)
Decoder: Generates the output sequence from the context vector
h_dec[0] = context
For each output step t:
h_dec[t] = tanh(W_dec @ h_dec[t-1])
y[t] = W_out @ h_dec[t]
Output shape: (output_seq_length, hidden_size)
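The encoder and decoder recurrences above can be sketched directly in NumPy. This is a minimal sketch, not a reference solution: the function name `seq2seq_forward` and the convention of passing `W_dec`, `W_out`, and the number of decoder steps as explicit arguments are illustrative assumptions. Note that, per the equations, the decoder emits y[0] from the context vector itself before the first state update.

```python
import numpy as np

def seq2seq_forward(x, h0, W_enc, W_dec, W_out, out_steps):
    # Encoder: h_t = tanh(W_enc @ x_t + h_{t-1})
    h = np.array(h0, dtype=float)
    for x_t in x:
        h = np.tanh(W_enc @ np.array(x_t, dtype=float) + h)
    context = h  # final encoder hidden state

    # Decoder: h_dec[0] = context, then
    # h_dec[t] = tanh(W_dec @ h_dec[t-1]), y[t] = W_out @ h_dec[t]
    h_dec = context
    ys = []
    for _ in range(out_steps):
        ys.append(W_out @ h_dec)        # y[t] = W_out @ h_dec[t]
        h_dec = np.tanh(W_dec @ h_dec)  # next decoder state
    return np.stack(ys)  # shape (out_steps, hidden_size)
```

With identity `W_dec` and `W_out`, the first decoder output equals the context vector, which makes the example below easy to verify.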
Example:
Input sequence (3 steps): x = [[1,2], [3,4], [5,6]]
h0 = [0, 0], W_enc = [[0.1,0.1],[0.1,0.1]]
Encoder:
h1 = tanh(W_enc @ [1,2] + [0,0]) = tanh([0.3, 0.3]) ≈ [0.291, 0.291]
h2 = tanh(W_enc @ [3,4] + h1) = tanh([0.991, 0.991]) ≈ [0.758, 0.758]
h3 = tanh(W_enc @ [5,6] + h2) = tanh([1.858, 1.858]) ≈ [0.952, 0.952]
context ≈ [0.952, 0.952]
Decoder (generates 3 output steps):
h_dec[0] = context = [0.954, 0.954]
y[0] = W_out @ h_dec[0]
h_dec[1] = tanh(W_dec @ h_dec[0])
y[1] = W_out @ h_dec[1]
...
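The encoder walk-through above can be checked numerically with a few lines of NumPy (a sketch for verification only; variable names are illustrative):

```python
import numpy as np

W_enc = np.array([[0.1, 0.1], [0.1, 0.1]])
x = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

h = np.zeros(2)  # h0 = [0, 0]
for x_t in x:
    h = np.tanh(W_enc @ x_t + h)
    print(h)
# The final h is the context vector, ≈ [0.952, 0.952]
```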
**Explanation:** The Seq2Seq model works in two phases. The **encoder** processes the entire input sequence step-by-step, building up a fixed-size context vector that summarizes the input. The **decoder** then uses this context vector to generate the output sequence one step at a time. The context vector acts as a "bottleneck" that forces the encoder to compress all relevant information. This architecture was originally proposed for machine translation (Sutskever et al., 2014) and later improved with attention mechanisms.
Test Cases:
- x=[[1,2],[3,4],[5,6]], h0=[0,0], W=[[0.1,0.1],[0.1,0.1]] → output sequence of shape (3, 2)
- x=[[0,0],[0,0]], h0=[0,0], W=any → all zeros
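The all-zeros case follows directly from tanh(0) = 0: with a zero input sequence and zero initial state, every hidden state (and hence every output) stays zero regardless of the weights. A quick sketch of the encoder side, with weights chosen arbitrarily:

```python
import numpy as np

W = np.array([[0.5, -0.3], [0.2, 0.7]])  # arbitrary weights
x = np.zeros((2, 2))  # zero input sequence, 2 steps
h = np.zeros(2)       # h0 = [0, 0]
for x_t in x:
    h = np.tanh(W @ x_t + h)  # tanh(0) = 0 at every step
print(h)  # → [0. 0.]
```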