Why use softmax as opposed to standard normalization?

IT Nursery

May 30, 2022

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K$$

This is expensive to compute because of the exponentials. Why not simply perform a Z transform so that all outputs are positive, and then normalise by dividing each output by the sum of all outputs?
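To make the comparison concrete, here is a minimal sketch of the two normalisations in plain Python. The function names are hypothetical, and `shift_and_divide` is only one possible reading of the proposal above (it assumes "make all outputs positive" means subtracting the minimum); the question itself does not pin down the exact transform.

```python
import math

def softmax(z):
    # Subtracting the maximum before exponentiating leaves the result
    # unchanged mathematically but avoids overflow for large inputs.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def shift_and_divide(z):
    # ASSUMPTION: one reading of the proposed alternative. Shift so the
    # smallest output becomes 0, then divide by the sum of the shifted values.
    # This is undefined when all outputs are equal (the sum is 0).
    m = min(z)
    shifted = [v - m for v in z]
    total = sum(shifted)
    return [s / total for s in shifted]

outputs = [1.0, 2.0, 3.0]
print(softmax(outputs))
print(shift_and_divide(outputs))
```

Both functions return values that sum to 1. Under this particular reading, the smallest output is always mapped to exactly 0, whereas softmax assigns every output a strictly positive value.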

10 Answers

Tags: math neural-network softmax
