Implement a Transformer Encoder Block. **Example:** Given a sequence of token embeddings, apply self-attention followed by a position-wise feed-forward network. **Constraints:** The input sequence length must be less than 512.
Test Cases
Test Case 1
Input: [[1,2],[3,4]]
Expected: [[1.5,2.5],[3.5,4.5]]

Test Case 2
Input: [[5,6],[7,8]]
Expected: [[5.5,6.5],[7.5,8.5]]

+ 3 hidden test cases
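A minimal sketch of one possible solution, using single-head attention in NumPy. The weight matrices (`Wq`, `Wk`, `Wv`, `Wo`, `W1`, `b1`, `W2`, `b2`) are assumed inputs here; the expected outputs above depend on a specific weight initialization that the problem does not specify, so this sketch only illustrates the block's structure (attention and feed-forward sublayers, each with a residual connection and layer normalization).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(x, Wq, Wk, Wv, Wo, W1, b1, W2, b2):
    """x: (seq_len, d_model). Weight shapes are hypothetical:
    Wq/Wk/Wv/Wo: (d_model, d_model); W1: (d_model, d_ff),
    b1: (d_ff,); W2: (d_ff, d_model); b2: (d_model,)."""
    # Self-attention sublayer: scaled dot-product attention
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_k = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d_k)) @ v
    x = layer_norm(x + attn @ Wo)          # residual + layer norm
    # Position-wise feed-forward sublayer (ReLU MLP)
    ff = np.maximum(0.0, x @ W1 + b1) @ W2 + b2
    return layer_norm(x + ff)              # residual + layer norm
```

The output has the same shape as the input, so blocks can be stacked; a sequence-length check (`assert len(x) < 512`) would enforce the stated constraint.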