The Swish activation function combines the sigmoid activation function with the input itself: swish(x) = x * sigmoid(x). It was inspired by the use of the sigmoid function for gating in LSTMs and highway networks.
x: an input data point (here, an array of sample inputs)

# Swish Function
import numpy as np

x = np.linspace(-10, 10, 100)  # sample inputs to visualize the function

def swish(x):
    return x * (1 / (1 + np.exp(-x)))  # x * sigmoid(x)

y = swish(x)
plot_graph(x, y, 'Swish')  # plot_graph: plotting helper defined elsewhere in this tutorial
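A quick numeric sanity check of the definition above (a sketch assuming only NumPy): Swish passes through zero at the origin, behaves like the identity for large positive inputs, and suppresses large negative inputs toward zero.

```python
import numpy as np

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x * (1 / (1 + np.exp(-x)))

print(swish(0.0))    # exactly 0, since sigmoid(0) = 0.5 and 0 * 0.5 = 0
print(swish(10.0))   # ~10: nearly the identity for large positive x
print(swish(-10.0))  # ~0: large negative inputs are squashed toward zero
```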
It can be used as a drop-in alternative to ReLU.
Its smoothness, non-monotonicity, and unboundedness above (which helps avoid saturation) make Swish often outperform the widely used ReLU function.
Unlike monotonic activation functions (those that never decrease or never increase), Swish is non-monotonic, which improves gradient flow.
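The non-monotonicity can be seen numerically (a small sketch, assuming NumPy): for negative inputs, Swish first decreases, dips below zero to a global minimum near x ≈ -1.278, and then rises back toward zero.

```python
import numpy as np

def swish(x):
    return x * (1 / (1 + np.exp(-x)))

# Scan a dense grid of inputs and locate the dip.
xs = np.linspace(-5, 5, 2001)
ys = swish(xs)
print(xs[ys.argmin()])  # ~ -1.278: the minimizer, so Swish is not monotonic
print(ys.min())         # ~ -0.278: Swish goes below zero, unlike ReLU
```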
Its smoothness aids optimization and generalization by reducing sensitivity to initialization and learning rates.
Its computational cost is higher than ReLU's, since it requires evaluating an exponential rather than a simple max.
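The cost difference is easy to measure with a rough micro-benchmark (a sketch assuming NumPy and the standard-library timeit; exact timings depend on hardware and array size):

```python
import timeit
import numpy as np

x = np.random.randn(1_000_000)

# ReLU is a single elementwise max; Swish needs exp, add, divide, multiply.
relu_t = timeit.timeit(lambda: np.maximum(x, 0), number=50)
swish_t = timeit.timeit(lambda: x * (1 / (1 + np.exp(-x))), number=50)

print(f"ReLU : {relu_t:.3f}s")
print(f"Swish: {swish_t:.3f}s")  # typically noticeably slower than ReLU
```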