Implement a Transformer Decoder Block. **Example:** Given a sequence of token embeddings and the encoder output, apply masked self-attention followed by encoder-decoder (cross-) attention and a position-wise feed-forward layer. **Constraints:** Input sequence length must be less than 512.
Test Cases
Test Case 1
Input:
[[1,2],[3,4]]
Expected:
[[1.5,2.5,3.5,4.5]]
Test Case 2
Input:
[[5,6],[7,8]]
Expected:
[[5.5,6.5,7.5,8.5]]
+ 3 hidden test cases