Extremely small or NaN values appear when training a neural network

I’m trying to implement a neural network architecture in Haskell and train it on MNIST. I’m using the hmatrix package for linear algebra, and my training framework is built with the pipes package. The code compiles and doesn’t crash, but certain combinations of layer size (say, 1000), minibatch size, and learning rate give … Read more
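A way to narrow this kind of divergence down is to check each layer for NaN or infinite entries between updates and to cap gradient norms so a wide layer or large learning rate can’t blow values up. Below is a minimal sketch assuming hmatrix’s Numeric.LinearAlgebra; frobNorm, hasBadValues, and clipGradient are illustrative names, not functions from the question’s code.

```haskell
import Numeric.LinearAlgebra (Matrix, cmap, matrix, scale, sumElements)

-- Frobenius norm computed by hand via sumElements, relying only on
-- basic Container operations.
frobNorm :: Matrix Double -> Double
frobNorm m = sqrt (sumElements (cmap (\x -> x * x) m))

-- True if any entry is NaN or infinite; a cheap check to run between
-- updates to find the first layer/step where values blow up.
hasBadValues :: Matrix Double -> Bool
hasBadValues m = sumElements (cmap flag m) > 0
  where
    flag :: Double -> Double
    flag x = if isNaN x || isInfinite x then 1 else 0

-- Rescale a gradient so its Frobenius norm is at most maxNorm; a common
-- workaround when a wide layer plus a large learning rate diverges.
clipGradient :: Double -> Matrix Double -> Matrix Double
clipGradient maxNorm g
  | n > maxNorm = scale (maxNorm / n) g
  | otherwise   = g
  where
    n = frobNorm g

main :: IO ()
main = do
  let g = matrix 2 [3, 4, 0, 0]          -- 2x2 gradient with Frobenius norm 5
  print (frobNorm g)                     -- 5.0
  print (hasBadValues (cmap (/ 0) g))    -- True: division by zero gives Infinity/NaN
  print (frobNorm (clipGradient 1 g))    -- 1.0 (up to rounding)
```

Running the checks inside the pipes training loop (one assertion per minibatch) is usually enough to see whether the bad values first appear in the weights, the activations, or the gradients.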

What is the role of the bias in neural networks? [closed]

I’m aware of gradient descent and the back-propagation algorithm. What I don’t get is: when is using a … Read more
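The role of the bias is easiest to see on a single neuron. Here is a minimal sketch, again assuming hmatrix; sigmoid and neuron are illustrative names. Without a bias term the pre-activation w <.> x is zero whenever the input is the zero vector, so the neuron’s threshold is pinned at the origin; the bias is the extra parameter that lets the threshold shift, which no choice of weights alone can do.

```haskell
import Numeric.LinearAlgebra (Vector, vector, (<.>))

-- Logistic activation.
sigmoid :: Double -> Double
sigmoid z = 1 / (1 + exp (negate z))

-- A single neuron: weighted sum of the inputs plus a bias, then squashed.
neuron :: Vector Double -> Double -> Vector Double -> Double
neuron w b x = sigmoid (w <.> x + b)

main :: IO ()
main = do
  let w      = vector [0.8, -0.4]
      origin = vector [0, 0]
  -- Without a bias the output at the origin is always sigmoid 0 = 0.5;
  -- adding a bias shifts that threshold.
  print (neuron w 0.0 origin)     -- 0.5
  print (neuron w (-2.0) origin)  -- ~0.12
```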