Extremely small or NaN values appear when training a neural network
I’m trying to implement a neural network architecture in Haskell and use it on MNIST. I’m using the hmatrix package for linear algebra, and my training framework is built on the pipes package. My code compiles and doesn’t crash, but certain combinations of layer size (say, 1000), minibatch size, and learning rate give …
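A common cause of this symptom (independent of hmatrix or pipes) is a learning rate that is too large for the layer sizes involved: the weight updates diverge, intermediate values overflow to `Infinity`, and subtracting infinities then yields `NaN`. A minimal plain-Haskell sketch of this failure mode, using scalar gradient descent on f(w) = w² as a stand-in for the real network:

```haskell
-- Sketch (plain Haskell, no hmatrix): gradient descent on f(w) = w^2.
-- With a small learning rate the iterate shrinks toward 0; with a large
-- one it diverges, overflows to Infinity, and w - lr*2*w then becomes
-- Inf - Inf = NaN, which silently poisons every later computation.

step :: Double -> Double -> Double
step lr w = w - lr * 2 * w   -- gradient of w^2 is 2*w

runSteps :: Double -> Double -> Int -> Double
runSteps lr w0 n = iterate (step lr) w0 !! n

main :: IO ()
main = do
  print (runSteps 0.1  1.0 100)   -- stable: converges toward 0
  print (runSteps 10.0 1.0 1000)  -- unstable: overflows, then NaN
```

In the real training loop, the same check can be done by testing `isNaN` or `isInfinite` on a few entries of the weight matrices after each minibatch, and by scaling the learning rate down as the layer size grows (larger layers produce larger gradient sums per unit, hypothetically requiring a smaller step).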