Softmax is another activation function, used mostly in the output layer of a neural network. The main difference between the Sigmoid and Softmax activation functions is that Sigmoid is used for binary classification, whereas Softmax is used for multiclass classification, where each input belongs to exactly one of several classes.

Range: (0, 1) for each output, with all outputs summing to 1
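
For reference, the standard definition can be written as follows for an input vector $x$ with $K$ components. The symbols $x_i$ and $K$ are notation introduced here for illustration, not names used in the article's code.

```latex
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \qquad i = 1, \dots, K
```

Every numerator is positive and smaller than the denominator, which is why each output falls strictly between 0 and 1.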

```
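# Softmax
import numpy as np

def softmax(x):
    # Exponentiate each element, then normalize so the outputs sum to 1
    return np.exp(x) / np.sum(np.exp(x))

# x and plot_graph are assumed to be defined earlier in the article
y = softmax(x)
print('Sum of output values from Softmax function: ', np.sum(y))
plot_graph(x, y, 'Softmax')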
```

Use cases:

Used in the output layer of a neural network to turn raw scores into probabilities for the predicted classes, ideally with the target class receiving the highest probability. The predicted probabilities sum to 1.

It is used in multiclass classification problems.

Used in game theory applications and in reinforcement learning to balance the trade-off between exploitation and exploration.
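
The reinforcement learning use case can be sketched as softmax action selection with a temperature parameter. This is a minimal illustration, not a full agent: the function name `softmax_policy`, the `temperature` parameter, and the action-value estimates are all assumptions made for the example.

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Turn estimated action values into action probabilities."""
    scaled = np.asarray(q_values, dtype=float) / temperature
    scaled -= scaled.max()  # shift for numerical stability; the result is unchanged
    exp = np.exp(scaled)
    return exp / exp.sum()

q_values = [1.0, 2.0, 3.0]  # hypothetical action-value estimates

# Low temperature: almost all probability goes to the best action (exploitation)
greedy = softmax_policy(q_values, temperature=0.1)

# High temperature: probabilities approach uniform (exploration)
exploratory = softmax_policy(q_values, temperature=10.0)

# Sample an action according to the exploratory probabilities
rng = np.random.default_rng(0)
action = rng.choice(len(q_values), p=exploratory)

print(greedy, exploratory, action)
```

Lowering the temperature pushes the policy toward the action with the highest estimate, while raising it spreads probability more evenly across actions.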

Pros:

A generalization of the Sigmoid function that can represent a probability distribution over more than two classes.

Useful for predicting classes in multiclass classification problems.

Has applications in attention models.
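
The first point, that Softmax generalizes Sigmoid, can be checked numerically: a two-class Softmax over the scores `[z, 0]` gives the same probability for the first class as Sigmoid applied to `z`. The sketch below assumes an arbitrary example value for `z`, and the helper names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(x):
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return e / e.sum()

z = 1.5  # arbitrary example score
two_class = softmax(np.array([z, 0.0]))

# The first Softmax output matches Sigmoid(z); the second is 1 - Sigmoid(z)
print(two_class[0], sigmoid(z))
```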

Cons:

Unlike other activation functions, which produce a single output for a single input, Softmax operates on a whole input array and produces multiple outputs, each of which depends on every input.

Not desirable in hidden layers, because the function is prone to underflow (when numbers near 0 are rounded to 0) and overflow (when numbers with large magnitude are approximated as infinity).
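
The overflow problem can be reproduced, and avoided, with a common trick: subtract the largest input before exponentiating, which leaves the result mathematically unchanged. The sketch below is one way to do this; the names `softmax_naive` and `softmax_stable` and the sample input are assumptions for illustration.

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # After the shift the largest exponent is 0, so exp() cannot overflow
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()

big = np.array([1000.0, 1001.0, 1002.0])

# exp(1000) overflows to inf in float64, and inf / inf produces nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(big)

stable = softmax_stable(big)

print(naive)
print(stable)
```

The stable version returns the same probabilities as it would for `[0, 1, 2]`, since Softmax only depends on the differences between inputs.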

Example:

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Softmax(x) = [7.8013e-05, 2.1206e-04, 5.7644e-04, 1.5669e-03, 4.2593e-03, 1.1578e-02, 3.1472e-02, 8.5552e-02, 2.3255e-01, 6.3214e-01]
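
The values above can be reproduced with a few lines of NumPy. This is a small check rather than part of the original example; it uses the max-shifted form of Softmax, which gives the same result as the plain formula.

```python
import numpy as np

x = np.arange(10)          # [0, 1, ..., 9]
e = np.exp(x - x.max())    # shift by the max for numerical stability
y = e / e.sum()

print(y)                   # the ten probabilities
print(y.sum())             # total probability
print(y.argmax())          # index of the most probable class
```

The largest input receives the largest share of the probability, and the outputs add up to 1.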