A column-vector y was passed when a 1d array was expected

I need to fit a RandomForestRegressor from sklearn.ensemble:

    forest = ensemble.RandomForestRegressor(**RF_tuned_parameters)
    model = forest.fit(train_fold, train_y)
    yhat = model.predict(test_fold)

This code always worked until I added some preprocessing of the data (train_y). The error message says:

    DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example …
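The warning in this question typically appears when train_y has shape (n_samples, 1), for example a 2D array or single-column frame left over from preprocessing. A minimal sketch, assuming that is the cause here (the array below is made-up data standing in for train_y), flattens it with NumPy before fitting:

```python
import numpy as np

# Hypothetical stand-in for train_y after preprocessing: a column vector.
train_y = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
print(train_y.shape)  # (5, 1)

# np.ravel flattens the column vector into the 1d shape (n_samples,).
train_y_1d = np.ravel(train_y)
print(train_y_1d.shape)  # (5,)
```

The flattened array would then be passed as forest.fit(train_fold, train_y_1d). If train_y is a pandas object, train_y.values.ravel() should give the same result.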

How to split data into 3 sets (train, validation and test)?

I have a pandas dataframe and I wish to divide it into 3 separate sets. I know that using train_test_split from sklearn.cross_validation, one can divide the data into two sets (train and test). However, I couldn’t find any solution for splitting the data into three sets. Preferably, I’d like to have the indices of the …
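One possible approach, sketched here with NumPy only, is to shuffle the row positions once and cut them at two points. The function name, the 60/20/20 fractions, and the seed are assumptions for illustration, not something stated in the question:

```python
import numpy as np

def three_way_split_indices(n_rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Return shuffled, non-overlapping train/validation/test row positions."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)

    # Cut points: train takes the first block, validation the next,
    # and test receives whatever remains.
    train_end = int(train_frac * n_rows)
    val_end = train_end + int(val_frac * n_rows)

    train_idx, val_idx, test_idx = np.split(shuffled, [train_end, val_end])
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split_indices(10)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the function returns positions rather than data, each set of a dataframe df can be selected with df.iloc[train_idx], df.iloc[val_idx], and df.iloc[test_idx].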

sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am using sklearn and having a problem with affinity propagation. I have built an input matrix and I keep getting the following error:

    ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I have run:

    np.isnan(mat.any())     # and gets False
    np.isfinite(mat.all())  # and gets True

I tried using mat[np.isfinite(mat) == True] = …
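The two checks quoted in the question call any() and all() first, which collapses the matrix to a single boolean before np.isnan or np.isfinite ever sees it. A short sketch with a made-up matrix (not the asker's data) shows the difference from the element-wise form:

```python
import numpy as np

# Hypothetical matrix containing both a NaN and an infinity.
mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Checks as written in the question: they test one boolean, not the matrix.
print(np.isnan(mat.any()))     # False
print(np.isfinite(mat.all()))  # True

# Element-wise checks: apply the test to every entry, then reduce.
has_nan = np.isnan(mat).any()
all_finite = np.isfinite(mat).all()
print(has_nan, all_finite)     # True False
```

With the parentheses moved, the checks report the NaN and the infinity that the original versions miss.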

Save classifier to disk in scikit-learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data? I have the following sample program from the scikit-learn website:

    from sklearn import datasets
    iris = datasets.load_iris()
    from sklearn.naive_bayes import GaussianNB
    gnb = GaussianNB()
    y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
    print "Number of mislabeled points : %d" % (iris.target != …
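One common option is Python's standard pickle module. The sketch below assumes scikit-learn is installed, and the file name gnb_classifier.pkl is a hypothetical choice; it writes the fitted classifier to a temporary directory, loads it back, and compares predictions:

```python
import os
import pickle
import tempfile

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; any writable path would do.
    model_path = os.path.join(tmp_dir, "gnb_classifier.pkl")

    # Save the fitted classifier to disk.
    with open(model_path, "wb") as f:
        pickle.dump(gnb, f)

    # Load it back, as a later session would.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should predict exactly what the original does.
same_predictions = (restored.predict(iris.data) == gnb.predict(iris.data)).all()
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and a model is best reloaded with the same scikit-learn version that saved it.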