ImportError: No module named sklearn.cross_validation

I am using Python 2.7 on Ubuntu 14.04. I installed scikit-learn, numpy and matplotlib with these commands:

    sudo apt-get install build-essential python-dev python-numpy \
        python-numpy-dev python-scipy libatlas-dev g++ python-matplotlib \
        ipython

But when I import these packages:

    from sklearn.cross_validation import train_test_split

I get this error:

    ImportError: No module named sklearn.cross_validation

What I need to …
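Two things are worth checking for this error. The install command quoted above does not list a scikit-learn package, so the library may simply be missing. Separately, `sklearn.cross_validation` was deprecated in scikit-learn 0.18 and removed in 0.20, with `train_test_split` now living in `sklearn.model_selection`. The following is a minimal sketch of an import that tolerates both layouts; the helper name `import_first_available` is hypothetical and not part of scikit-learn.

```python
import importlib


def import_first_available(names):
    """Return the first module in `names` that imports successfully.

    Hypothetical helper for illustration; raises ImportError if none work.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %s" % ", ".join(names))


try:
    # Newer releases use model_selection; older ones use cross_validation.
    _ms = import_first_available(["sklearn.model_selection", "sklearn.cross_validation"])
    train_test_split = _ms.train_test_split
except ImportError:
    # Neither module exists, which suggests scikit-learn is not installed.
    train_test_split = None
```

If `train_test_split` ends up as `None`, installing scikit-learn for the interpreter in use is the likely next step.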

How to normalize a NumPy array to a unit vector?

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:

    def normalize(v):
        norm = np.linalg.norm(v)
        if norm == 0:
            return v
        return v / norm

This function handles the case where the vector v has a norm of 0. Is …
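As a rough sketch of the same idea extended to several vectors at once, the version below normalizes along one axis and leaves zero-norm vectors unchanged, as the function in the question does. The name `normalize_along_axis` and the `axis` parameter are assumptions made for illustration, not an existing NumPy function.

```python
import numpy as np


def normalize_along_axis(a, axis=-1):
    """Scale each vector along `axis` to unit L2 norm; zero vectors stay zero."""
    a = np.asarray(a, dtype=float)
    norms = np.linalg.norm(a, axis=axis, keepdims=True)
    # Replace zero norms with 1 so the division leaves those vectors untouched.
    safe_norms = np.where(norms == 0, 1.0, norms)
    return a / safe_norms


unit = normalize_along_axis([3.0, 4.0])
rows = normalize_along_axis([[0.0, 0.0], [3.0, 4.0]])
```

Here `unit` has length 1, and the all-zero first row of `rows` is returned as zeros rather than producing a division-by-zero warning.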

Label encoding across multiple columns in scikit-learn

I’m trying to use scikit-learn’s LabelEncoder to encode a pandas DataFrame of string labels. As the DataFrame has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I’d rather just have one big LabelEncoder object that works across all my columns of data. Throwing the entire DataFrame into LabelEncoder creates …
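One possible approach, sketched below under the assumption that every column holds string labels, is to build the encoders in a dictionary comprehension so that none has to be created by hand. It still keeps one fitted `LabelEncoder` per column, which makes `inverse_transform` available later. The sample DataFrame and the names `encoders` and `encoded` are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample data standing in for the 50+ column DataFrame.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "apple"],
    "color": ["red", "green", "green"],
})

# Fit one encoder per column without writing them out individually.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

# Transform each column with its own fitted encoder.
encoded = df.copy()
for col, enc in encoders.items():
    encoded[col] = enc.transform(df[col])
```

Because each encoder is kept, `encoders["fruit"].inverse_transform(...)` can map the integer codes back to the original labels.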