Why do binary_crossentropy and categorical_crossentropy give different performances for the same problem?

I’m trying to train a CNN to categorize text by topic. When I use binary cross-entropy I get ~80% accuracy; with categorical cross-entropy I get ~50% accuracy.

I don’t understand why this is. Since it’s a multiclass problem, doesn’t that mean I have to use categorical cross-entropy, and that the results with binary cross-entropy are meaningless?
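
For context, the labels are one-hot encoded before training. This is a minimal sketch of that step (the names and values of labels and class_id_index here are illustrative, not my real data):

import numpy as np
from keras.utils import to_categorical

# class_id_index maps each topic to an integer id (illustrative values)
class_id_index = {'politics': 0, 'sports': 1, 'tech': 2}

# one integer class id per document (illustrative values)
labels = np.array([0, 2, 1, 0])

# one-hot matrix of shape (num_samples, num_classes),
# which is the target format categorical_crossentropy expects
y = to_categorical(labels, num_classes=len(class_id_index))

The model itself: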

from keras.models import Sequential
from keras.layers import (Activation, Conv1D, Dense, Dropout,
                          Flatten, MaxPooling1D)

model = Sequential()
model.add(embedding_layer)  # embedding_layer is defined earlier
model.add(Dropout(0.25))
# convolution layers
model.add(Conv1D(filters=32,
                 kernel_size=4,
                 padding='valid',
                 activation='relu'))
model.add(MaxPooling1D(pool_size=2))
# dense layers
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.25))
model.add(Activation('relu'))
# output layer: one unit per class, softmax for multiclass
model.add(Dense(len(class_id_index)))
model.add(Activation('softmax'))

Then I compile it either like this, using categorical_crossentropy as the loss function:

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])

or

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])

Intuitively it makes sense why I’d want to use categorical cross-entropy; what I don’t understand is why I get good results with binary and poor results with categorical.
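
One thing I’m unsure about is whether 'accuracy' even means the same thing in the two runs. I’ve read that when the loss is binary_crossentropy, Keras resolves the generic 'accuracy' string to binary_accuracy, so a way to make the comparison fair (a sketch, not something I’ve verified) would be to request the metric explicitly:

# request categorical accuracy by name so both runs report the same metric
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['categorical_accuracy'])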
