Which machine learning classifier to choose, in general? [closed]

Suppose I’m working on some classification problem. (Fraud detection and comment spam are two problems I’m working on right now, but I’m curious about any classification task in general.)

How do I know which classifier I should use?

  1. Decision tree
  2. SVM
  3. Bayesian
  4. Neural network
  5. K-nearest neighbors
  6. Q-learning
  7. Genetic algorithm
  8. Markov decision processes
  9. Convolutional neural networks
  10. Linear regression or logistic regression
  11. Boosting, bagging, ensembling
  12. Random hill climbing or simulated annealing

In which cases is one of these the “natural” first choice, and what are the principles for choosing that one?

Examples of the type of answers I’m looking for (from Manning et al.’s book Introduction to Information Retrieval):

a. If your data is labeled, but you only have a limited amount, you should use a classifier with high bias (for example, Naive Bayes).

I’m guessing this is because a higher-bias classifier has lower variance, which helps when you only have a small amount of training data.
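For instance, here is the kind of thing I have in mind, as a rough sketch (assuming scikit-learn and a made-up tiny labeled set, not my actual data): on few examples, a high-bias Naive Bayes model often cross-validates better than a high-variance one like an unpruned decision tree.

```python
# Rough sketch: with only ~60 labeled examples, compare a high-bias
# classifier (Gaussian Naive Bayes) against a high-variance one
# (an unpruned decision tree) via cross-validation.
# The dataset here is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=60, n_features=20, random_state=0)

for name, clf in [("Naive Bayes (high bias)", GaussianNB()),
                  ("Unpruned tree (high variance)", DecisionTreeClassifier(random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```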

b. If you have a ton of data, then the classifier doesn’t really matter so much, so you should probably just choose a classifier with good scalability.
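Again just as an illustrative sketch (assuming scikit-learn; the streamed mini-batches below are synthetic stand-ins for a dataset too large to fit in memory), a linear model trained incrementally with SGD is the sort of “scalable” classifier I mean:

```python
# Rough sketch: logistic regression trained incrementally with SGD,
# consuming the data one mini-batch at a time instead of all at once.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss")  # "log_loss" needs scikit-learn >= 1.1; older versions use loss="log"
classes = np.array([0, 1])
rng = np.random.default_rng(0)

for _ in range(1000):                     # pretend each loop reads one chunk from disk
    X_batch = rng.normal(size=(100, 50))
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_test = rng.normal(size=(1000, 50))
y_test = (X_test[:, 0] > 0).astype(int)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```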

  1. What are other guidelines? Even answers like “if you’ll have to explain your model to some upper management person, then maybe you should use a decision tree, since the decision rules are fairly transparent” are good (see the sketch after this list). I care less about implementation/library issues, though.

  2. Also, for a somewhat separate question, besides standard Bayesian classifiers, are there ‘standard state-of-the-art’ methods for comment spam detection (as opposed to email spam)?
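To make the decision-tree point from item 1 concrete, here is a small sketch (assuming scikit-learn and the built-in iris data purely for illustration) that prints a shallow tree’s learned rules as readable if/else branches, which is the kind of transparency I mean:

```python
# Rough sketch: a shallow decision tree's rules printed as plain text.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Produces human-readable branches such as
# "|--- petal width (cm) <= 0.80" that are easy to walk through with non-specialists.
print(export_text(tree, feature_names=list(iris.feature_names)))
```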

9 Answers