Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, do I need to call it only once, at the beginning? I read the documentation for it at http://keras.io/layers/normalization/ but I don’t see where I’m supposed to call it. Below is my code attempting to use it: model = Sequential() keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None) model.add(Dense(64, input_dim=14, …

Why do binary_crossentropy and categorical_crossentropy give different performance for the same problem?

I’m trying to train a CNN to categorize text by topic. When I use binary cross-entropy I get ~80% accuracy; with categorical cross-entropy I get ~50% accuracy. I don’t understand why this is. It’s a multiclass problem; doesn’t that mean that I have to use categorical cross-entropy and that the results with binary cross-entropy are …
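
A commonly cited explanation for a gap like this, which the truncated excerpt itself does not confirm, is that the two setups report different accuracy metrics. With binary cross-entropy, Keras's generic "accuracy" metric is typically resolved to an element-wise binary accuracy, which scores every output unit separately; with categorical cross-entropy it is resolved to an argmax-based categorical accuracy. The sketch below is plain Python, not Keras code; the data and both function names are hypothetical and chosen only to illustrate how the same predictions can score very differently under the two definitions.

```python
# Hypothetical toy data: 4 samples, 5 classes, one-hot targets.
y_true = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
# Made-up predicted probabilities; only the first two rows have the right argmax.
y_pred = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
]


def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class is the target class."""
    hits = 0
    for t, p in zip(y_true, y_pred):
        hits += t.index(max(t)) == p.index(max(p))
    return hits / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual outputs that match the target after thresholding."""
    hits = 0
    total = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            hits += ti == (1 if pi > threshold else 0)
            total += 1
    return hits / total


print(categorical_accuracy(y_true, y_pred))  # 2 of 4 samples are correct
print(binary_accuracy(y_true, y_pred))       # 16 of 20 individual outputs match
```

Because most entries of a one-hot target are zero, the element-wise score rewards the many "correctly off" outputs even when the predicted class is wrong, so it is not a meaningful accuracy for a single-label multiclass problem.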

Ordering of batch normalization and dropout?

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow. When using batch normalization and dropout in TensorFlow (specifically using the contrib.layers) do I need to be worried about the ordering? It seems possible that if I …

How to interpret loss and accuracy for a machine learning model [closed]

When I train my neural network with Theano or TensorFlow, they report a variable called “loss” per epoch. How …

Extremely small or NaN values appear in training neural network

I’m trying to implement a neural network architecture in Haskell, and use it on MNIST. I’m using the hmatrix package for linear algebra. My training framework is built using the pipes package. My code compiles and doesn’t crash. But the problem is, certain combinations of layer size (say, 1000), minibatch size, and learning rate give …

Keras input explanation: input_shape, units, batch_size, dim, etc

For any Keras layer (Layer class), can someone explain how to understand the difference between input_shape, units, dim, etc.? For example, the docs say units specifies the output shape of a layer. In the image of the neural net below, hidden layer1 has 4 units. Does this directly translate to the units attribute of the …
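
As a rough illustration of how these quantities relate in a fully connected layer, here is a minimal plain-Python sketch (not the Keras implementation). It assumes a hypothetical input with 3 features feeding a hidden layer with 4 units, as in the excerpt; the helper names `make_dense` and `dense_forward` are invented for this example. The point is that the input dimension fixes the number of weight rows, `units` fixes the number of weight columns and the output width, and the batch size only changes how many rows are processed.

```python
import random


def make_dense(input_dim, units, seed=0):
    """Create weights of shape (input_dim, units) and a bias of length units."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(units)] for _ in range(input_dim)]
    bias = [0.0] * units
    return weights, bias


def dense_forward(batch, weights, bias):
    """Apply y = x @ W + b to every row of the batch (no activation)."""
    units = len(bias)
    return [
        [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j] for j in range(units)]
        for x in batch
    ]


# Assumed sizes: 3 input features, 4 units, and a batch of 2 samples.
weights, bias = make_dense(input_dim=3, units=4)
batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
out = dense_forward(batch, weights, bias)

print(len(weights), len(weights[0]))  # weight shape: (input_dim, units)
print(len(out), len(out[0]))          # output shape: (batch_size, units)
```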

What is the meaning of the word logits in TensorFlow? [duplicate]

This question already has answers here: “What are logits? What is the difference between softmax and softmax_cross_entropy_with_logits?” (7 answers). In the following TensorFlow function, we must feed the activation of artificial neurons in the final layer. That I understand. But I don’t understand why it is called logits. Isn’t that a …
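
In this context, "logits" usually refers to the raw, unnormalized scores produced by the final layer before any softmax is applied; functions with `_with_logits` in their name apply the softmax internally. The plain-Python sketch below is not TensorFlow's implementation; the values and the function names `softmax` and `cross_entropy_from_logits` are made up for illustration. It shows the relationship between logits, probabilities, and the cross-entropy loss.

```python
import math


def softmax(logits):
    """Turn unnormalized scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target_index):
    """Negative log-probability of the target class, computed directly from logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


logits = [2.0, 1.0, 0.1]  # made-up final-layer outputs for 3 classes
probs = softmax(logits)

print(probs)                                 # probabilities, largest for class 0
print(cross_entropy_from_logits(logits, 0))  # equals -log(probs[0])
```

Working from logits rather than from already-normalized probabilities lets the loss use the log-sum-exp form, which avoids taking the logarithm of very small numbers.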