Replacing column values in a pandas DataFrame

I’m trying to replace the values in one column of a dataframe. The column ('female') only contains the values 'female' and 'male'. I have tried the following:
    w['female']['female']='1'
    w['female']['male']='0'
But I receive the exact same copy of the previous results. I would ideally like to get some output which resembles the following loop element-wise. if w['female'] …
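One common way to approach this kind of replacement is `Series.map` with a dictionary. The sketch below uses a made-up four-row frame named `w` (the asker's real data is not shown) and maps to integers rather than the strings '1' and '0', which is an assumption about the intended result.

```python
import pandas as pd

# Hypothetical sample data; the original contents of `w` are not shown.
w = pd.DataFrame({"female": ["female", "male", "male", "female"]})

# Map each label to an integer. Any label missing from the dict would become NaN.
w["female"] = w["female"].map({"female": 1, "male": 0})

print(w)
```

The chained form `w['female']['female']` looks up an index label named 'female' rather than rows whose value is 'female', which is likely why the attempt in the question did not change the column.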

Add a string prefix to each value in a string column using Pandas

I would like to prepend a string to each value in a given column of a pandas dataframe (elegantly). I already figured out how to kind-of do this and I am currently using:
    df.ix[(df['col'] != False), 'col'] = 'str'+df[(df['col'] != False), 'col']
This seems one hell of an inelegant thing to do …
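A minimal sketch of one simpler form, assuming every value in the column should receive the prefix and that the column can safely be cast to strings. The three-row frame is illustrative only; the column name 'col' and the prefix 'str' come from the question.

```python
import pandas as pd

# Illustrative data; the asker's real frame is not shown.
df = pd.DataFrame({"col": ["a", "b", "c"]})

# String concatenation is vectorised over the whole column.
# astype(str) guards against non-string values (it would also turn NaN into "nan").
df["col"] = "str" + df["col"].astype(str)

print(df)
```

If only some rows should be prefixed, the same expression can be applied through a boolean mask with `df.loc[mask, 'col']`.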

How to test if a string contains one of the substrings in a list, in pandas?

Is there any function that would be the equivalent of a combination of df.isin() and df[col].str.contains()? For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']), and I want to find all places where s contains any of ['og', 'at'], I would want to get everything but 'pet'. I have a solution, but it’s rather …
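One way this is often done is to join the substrings into a single regular expression and pass it to `Series.str.contains`. The sketch below reuses the series and substring list from the question; the names `substrings`, `pattern`, `mask`, and `matches` are introduced here for illustration.

```python
import re

import pandas as pd

s = pd.Series(["cat", "hat", "dog", "fog", "pet"])
substrings = ["og", "at"]

# Escape each substring so regex metacharacters are treated literally,
# then join with "|" to mean "any of these".
pattern = "|".join(re.escape(sub) for sub in substrings)

mask = s.str.contains(pattern)
matches = s[mask]

print(matches)
```

Escaping matters only when a substring contains characters such as `.` or `+`; for the plain substrings here the pattern is simply `og|at`.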

What are the differences between Pandas and NumPy+SciPy in Python? [closed]

This question was closed as opinion-based 6 years ago and is not accepting answers. They both seem exceedingly similar, and I’m curious as to which package would be more beneficial …

Counting unique values in a column in pandas dataframe like in Qlik?

If I have a table like this:
    df = pd.DataFrame({ 'hID': [101, 102, 103, 101, 102, 104, 105, 101], 'dID': [10, 11, 12, 10, 11, 10, 12, 10], 'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'], 'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C'] })
I can do count(distinct hID) in Qlik to …
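In pandas, a distinct count is usually expressed with `nunique`. The sketch below uses the table from the question. Since the excerpt is cut off, the per-group count by 'mID' is only a guess at what the asker might want next.

```python
import pandas as pd

df = pd.DataFrame({
    "hID": [101, 102, 103, 101, 102, 104, 105, 101],
    "dID": [10, 11, 12, 10, 11, 10, 12, 10],
    "uID": ["James", "Henry", "Abe", "James", "Henry", "Brian", "Claude", "James"],
    "mID": ["A", "B", "A", "B", "A", "A", "A", "C"],
})

# Number of distinct hID values in the whole column.
distinct_hids = df["hID"].nunique()

# Assumed follow-up: distinct hID values within each mID group.
distinct_hids_per_mid = df.groupby("mID")["hID"].nunique()

print(distinct_hids)
print(distinct_hids_per_mid)
```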

A column-vector y was passed when a 1d array was expected

I need to fit RandomForestRegressor from sklearn.ensemble.
    forest = ensemble.RandomForestRegressor(**RF_tuned_parameters)
    model = forest.fit(train_fold, train_y)
    yhat = model.predict(test_fold)
This code always worked until I did some preprocessing of the data (train_y). The error message says: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example …
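The warning asks for a target of shape (n_samples,). A minimal NumPy-only sketch of that reshaping follows. The three-value array stands in for the asker's `train_y`, whose real contents are not shown, and `train_y_1d` is a name introduced here.

```python
import numpy as np

# Stand-in for a preprocessed target that ended up as a column vector.
train_y = np.array([[1.0], [2.0], [3.0]])
print(train_y.shape)  # (3, 1)

# Flatten to one dimension, giving the (n_samples,) shape the warning asks for.
train_y_1d = train_y.ravel()
print(train_y_1d.shape)  # (3,)
```

The flattened array would then presumably be passed in place of `train_y` in the `forest.fit(train_fold, train_y)` call. If `train_y` is a single-column DataFrame, selecting that column as a Series or calling `.to_numpy().ravel()` should give the same shape.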

NumPy or Pandas: Keeping array type as integer while having a NaN value

Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside listed as numpy.NaN? In particular, I am converting an in-house data structure to a Pandas DataFrame. In our structure, we have integer-type columns that still have NaNs (but …
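Plain NumPy integer arrays cannot hold NaN, but pandas provides a nullable integer extension dtype, written "Int64" with a capital I, that may fit this case. The sketch below assumes a pandas version that includes it (0.24 or later); the sample values are made up.

```python
import pandas as pd

# Nullable integer column: missing entries are stored as pd.NA
# and the remaining values stay integers instead of becoming floats.
s = pd.Series([1, 2, None], dtype="Int64")

print(s.dtype)
print(s)
```

With the default dtype, the same data would be upcast to float64 to make room for NaN.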