The rectified linear unit (ReLU) has been the most widely used activation function in deep learning architectures.
x: a list of input data points
i: an individual data point in x
It rectifies input data points that are less than 0 by forcing them to 0.
# ReLU
def re_lu(x):
    result = []
    for i in x:
        if i < 0:
            i = 0
        result.append(i)
    return result

y = re_lu(x)
plot_graph(x, y, 'ReLU')
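As a quick sanity check, the function maps every negative input to 0 and leaves non-negative inputs unchanged. A minimal standalone sketch (the sample values are illustrative, and the function is repeated here so the snippet runs on its own without the plotting helper):

```python
# ReLU: force negative inputs to 0, pass non-negatives through
def re_lu(x):
    result = []
    for i in x:
        if i < 0:
            i = 0
        result.append(i)
    return result

# Sample inputs spanning negative, zero, and positive values
x = [-3, -1.5, 0, 2, 4.5]
y = re_lu(x)
print(y)  # negatives rectified to 0, non-negatives unchanged
```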
Used as a default activation function in deep neural networks.
Used in speech recognition and classification problems. Not desirable in RNNs because it is unbounded in the upper limit.
A fast-learning activation function that is easy to optimize due to its almost-linear (piecewise linear) form.
Being unbounded in the upper limit avoids saturation of gradients while training, and being bounded below induces regularization effects.
Though it is the most used activation function, ReLU sometimes causes gradients to die, leading to dead neurons (the dying-ReLU problem). It also overfits the data more easily than the sigmoid function.
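The dying-ReLU problem can be seen from the gradient: for any negative pre-activation the derivative of ReLU is 0, so no error signal flows back to a neuron stuck in the negative regime. A minimal sketch (relu_grad and the sample pre-activations are illustrative, not part of the original code):

```python
def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if z > 0 else 0.0

# A neuron whose pre-activation is negative for every input
# receives zero gradient everywhere and never updates: it is "dead"
pre_activations = [-2.3, -0.7, -5.1, -0.1]
grads = [relu_grad(z) for z in pre_activations]
print(grads)  # all zeros: no learning signal flows back
```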
Although the lower boundedness induces regularization effects and provides one-sided saturation, it is constant at zero and does not vary at all.