Difference between classification and clustering in data mining? [closed]

Can someone explain what the difference is between classification and clustering in data mining? If … Read more

Why do binary_crossentropy and categorical_crossentropy give different performance for the same problem?

I’m trying to train a CNN to categorize text by topic. With binary cross-entropy I get ~80% accuracy; with categorical cross-entropy I get ~50%. I don’t understand why. It’s a multiclass problem, so doesn’t that mean that I have to use categorical cross-entropy and that the results with binary cross-entropy are … Read more
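
One commonly cited explanation for a gap like this is that the two losses end up being scored with different accuracy definitions: a per-output (binary) accuracy counts every correctly predicted 0 in a one-hot target, while a per-sample (categorical) accuracy only counts a sample when the top-scoring class is right. The sketch below illustrates that effect in plain Python. The toy targets, predictions, and function names are assumptions made up for illustration, not the asker's data or any library's API.

```python
# Hypothetical toy data: 4 samples, 3 classes, one-hot targets.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [
    [0.60, 0.30, 0.10],
    [0.45, 0.35, 0.20],
    [0.40, 0.35, 0.25],
    [0.20, 0.45, 0.35],
]

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class is the true class."""
    hits = sum(
        t.index(max(t)) == p.index(max(p)) for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual outputs that land on the correct side of the threshold."""
    correct = 0
    total = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            correct += int((p > threshold) == bool(t))
            total += 1
    return correct / total

cat_acc = categorical_accuracy(y_true, y_pred)
bin_acc = binary_accuracy(y_true, y_pred)
print("categorical accuracy:", cat_acc)
print("binary accuracy:", bin_acc)
```

On this toy data the per-output score is much higher than the per-sample score even though the predictions are identical, so a higher reported accuracy under binary cross-entropy does not by itself mean a better multiclass model.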

How can I one hot encode in Python?

I have a machine learning classification problem with 80% categorical variables. Must I use one hot encoding if I want to use some classifier for the classification? Can I pass the data to a classifier without the encoding? I am trying to do the following for feature selection: I read the train file: num_rows_to_read = … Read more
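
As a minimal sketch of what one hot encoding does, the snippet below turns a single categorical column into 0/1 indicator rows using only the standard library. The function name `one_hot_encode` and the sample column are hypothetical; in practice a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would usually be the more convenient choice.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 list over the sorted categories."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows

# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
for row in encoded:
    print(row)
```

Sorting the categories keeps the column order deterministic, which matters when the same encoding has to be applied to both training and test data.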

Which machine learning classifier to choose, in general? [closed]

Suppose I’m working on some classification problem. (Fraud detection and comment spam are two problems I’m … Read more

How to split data into 3 sets (train, validation and test)?

I have a pandas dataframe and I wish to divide it into 3 separate sets. I know that using train_test_split from sklearn.cross_validation, one can divide the data into two sets (train and test). However, I couldn’t find any solution about splitting the data into three sets. Preferably, I’d like to have the indices of the … Read more
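
One possible approach, sketched here with the standard library only, is to shuffle the row positions once and slice them into three disjoint index lists. The function name `train_val_test_indices`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration.

```python
import random

def train_val_test_indices(n_rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row positions and slice them into train, validation, and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]  # whatever remains goes to training
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = train_val_test_indices(10)
print(len(train_idx), len(val_idx), len(test_idx))
```

With a pandas dataframe `df`, the resulting positions could then be used as `df.iloc[train_idx]`, `df.iloc[val_idx]`, and `df.iloc[test_idx]`.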

Save classifier to disk in scikit-learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data? I have the following sample program from the scikit-learn website: from sklearn import datasets iris = datasets.load_iris() from sklearn.naive_bayes import GaussianNB gnb = GaussianNB() y_pred = gnb.fit(iris.data, iris.target).predict(iris.data) print “Number of mislabeled points : %d” % (iris.target != … Read more
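
A minimal sketch of the save-and-reload round trip using Python's built-in `pickle` module follows. The dictionary here is a hypothetical stand-in for a fitted model so that the example runs without extra dependencies; a fitted scikit-learn estimator can generally be pickled the same way, with the usual caveat that pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained classifier object.
model = {"classes": [0, 1, 2], "priors": [0.3, 0.3, 0.4]}

# Write to a temporary directory so the example leaves the working tree alone.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

with open(path, "wb") as f:  # binary mode is required for pickle
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

The restored object can then be used exactly like the original, for example by calling its prediction method when it is a real classifier.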

Is there a rule-of-thumb for how to divide a dataset into training and validation sets? [closed]

Is there a rule-of-thumb for how to best divide data into training and validation sets? Is an even 50/50 split … Read more