Swish Activation Function

The Swish activation function multiplies an input by its sigmoid:

swish(x) = x * sigmoid(x) = x / (1 + e^(-x))

where x is an input data point. It was inspired by the use of the sigmoid function for gating in LSTMs and highway networks.

Reparameterized Swish:

swish_β(x) = x * sigmoid(β·x) = x / (1 + e^(-β·x))

where β is a constant or trainable parameter. β = 1 recovers the standard Swish (also known as SiLU), and the function approaches ReLU as β → ∞ (a sketch with a tunable β follows the code below).

Range: approximately [-0.28, ∞); Swish is unbounded above and bounded below (its minimum is roughly -0.28, reached near x ≈ -1.28).

import numpy as np

# Swish function: x * sigmoid(x)
def swish(x):
    return x * (1 / (1 + np.exp(-x)))

x = np.linspace(-10, 10, 100)  # sample inputs for plotting
y = swish(x)
plot_graph(x, y, 'Swish')
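
The reparameterized form can be sketched in the same way. The helper name swish_beta and the β value of 5 below are illustrative choices, not part of the original; in practice β can also be a trainable parameter.

# Reparameterized Swish: x * sigmoid(beta * x)
def swish_beta(x, beta=1.0):
    return x * (1 / (1 + np.exp(-beta * x)))

# Larger beta pushes the curve towards ReLU; beta = 1 is the standard Swish.
y_sharp = swish_beta(x, beta=5.0)
plot_graph(x, y_sharp, 'Swish (beta = 5)')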

Use cases:

Swish can be used as a drop-in alternative to ReLU in deep networks, as in the sketch below.
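
For example, assuming a PyTorch model, Swish with β = 1 is available as nn.SiLU and can replace ReLU directly; the layer sizes below are arbitrary.

import torch.nn as nn

# A small feed-forward block where Swish (SiLU) stands in for ReLU.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.SiLU(),   # drop-in replacement for nn.ReLU()
    nn.Linear(64, 10),
)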

Pros:        

Its smoothness, non-monotonicity, and unboundedness above (which helps avoid saturation) often make Swish perform better than the widely used ReLU.

Unlike monotonic activation functions (those that never decrease or never increase), Swish is non-monotonic, which can improve gradient flow (a quick numerical check appears after these points).

Its smoothness also aids optimization and generalization by reducing sensitivity to weight initialization and learning rates.
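
To illustrate the non-monotonicity, evaluating the swish function defined earlier on a dense grid shows a single minimum of roughly -0.28 near x ≈ -1.28, after which the function increases without bound; this is a quick numerical sketch, not a formal argument.

# Locate the minimum of Swish numerically to show its non-monotonic shape.
xs = np.linspace(-5, 5, 10001)
ys = swish(xs)
i = np.argmin(ys)
print(f"minimum of {ys[i]:.3f} at x = {xs[i]:.3f}")  # roughly -0.278 at x = -1.278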

Cons:

Swish is more expensive to compute than ReLU because it requires evaluating an exponential for the sigmoid term.
