Implement the **GELU (Gaussian Error Linear Unit)** activation function.
GELU Approximation (tanh form):

```
gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
```
Examples:
gelu(0) → 0.0
gelu(1) → 0.8412
gelu(-1) → -0.1588
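A minimal reference implementation of the tanh approximation above, using only the standard library (one possible solution sketch, not the only accepted one):

```python
import math

def gelu(x):
    # Tanh approximation of GELU:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

print(round(gelu(0), 4))   # 0.0
print(round(gelu(1), 4))   # 0.8412
print(round(gelu(-1), 4))  # -0.1588
```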
**Explanation:** GELU is the activation function used in GPT, BERT, and most modern transformers. It smoothly gates the input: positive values pass through nearly unchanged, while negative values are gradually suppressed (not hard-zeroed like ReLU). GELU can be viewed as the expected value of a stochastic regularizer that randomly zeroes inputs according to the standard Gaussian CDF; this smooth gating is thought to help training.
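For context, the exact GELU is `x * Phi(x)`, where `Phi` is the standard normal CDF, computable via `math.erf`. The sketch below (illustrative only, not part of the exercise) checks that the tanh approximation stays close to the exact form over a sample range:

```python
import math

def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation from the problem statement
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# Maximum absolute error over [-5, 5] in steps of 0.1
max_err = max(abs(gelu_exact(i / 10) - gelu_tanh(i / 10)) for i in range(-50, 51))
print(max_err < 1e-3)  # True
```

The approximation is accurate to roughly 3e-4 in absolute terms, which is why the expected outputs above match to four decimal places.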
Test Cases

Test Case 1
Input: x = 0
Expected: 0.0

Test Case 2
Input: x = 1
Expected: 0.8412

Test Case 3
Input: x = -1
Expected: -0.1588

+ 2 hidden test cases