Implement the **Swish** activation function and its derivative.
Swish:
swish(x) = x * sigmoid(beta * x)
Derivative:
swish'(x) = sigmoid(beta*x) + beta * x * sigmoid(beta*x) * (1 - sigmoid(beta*x))
         = beta * swish(x) + sigmoid(beta*x) * (1 - beta * swish(x))
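The formulas above can be sketched directly in Python (a minimal reference implementation, not the required solution; the `beta=1.0` default is an assumption). The analytic derivative is checked against a central finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def swish(x, beta=1.0):
    # swish(x) = x * sigmoid(beta * x)
    return x * sigmoid(beta * x)

def swish_derivative(x, beta=1.0):
    # swish'(x) = s + beta * x * s * (1 - s), where s = sigmoid(beta * x)
    s = sigmoid(beta * x)
    return s + beta * x * s * (1.0 - s)

# Sanity check: analytic derivative vs. central finite difference at x = 0.5.
h = 1e-6
x = 0.5
numeric = (swish(x + h) - swish(x - h)) / (2 * h)
assert abs(swish_derivative(x) - numeric) < 1e-6
```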
Example:
swish(0) → 0.0
swish(1) → 0.7311
swish(-1) → -0.2689
**Explanation:** Swish was discovered by Google Brain using automated search. Unlike ReLU, it's smooth and non-monotonic (slightly negative for small negative inputs). It often outperforms ReLU in deep networks. With beta=1, it's equivalent to SiLU.
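For inputs given as a list (as in Test Case 2 below), swish can be applied element-wise; a short sketch, again assuming beta defaults to 1:

```python
import math

def swish_list(xs, beta=1.0):
    # Element-wise swish: x * sigmoid(beta * x) = x / (1 + exp(-beta * x))
    return [x / (1.0 + math.exp(-beta * x)) for x in xs]

values = swish_list([-2, -1, 0, 1, 2])
print([round(v, 4) for v in values])
# → [-0.2384, -0.2689, 0.0, 0.7311, 1.7616]
```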
Test Cases

Test Case 1
Input: x=0, beta=1
Expected: swish=0.0

Test Case 2
Input: x=[-2,-1,0,1,2], beta=1
Expected: swish=[-0.2384,-0.2689,0,0.7311,1.7616]

Test Case 3
Input: x=1, beta=1
Expected: swish=0.7311

+ 2 hidden test cases