pandas GroupBy columns with NaN (missing) values

I have a DataFrame with many missing values in the columns I want to group by:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.nan, '6']})

    In [4]: df.groupby('b').groups
    Out[4]: {'4': [0], '6': [2]}

Notice that pandas has dropped the rows with NaN target values. (I want to …
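One way to keep the NaN keys is the `dropna=False` argument of `groupby`. The sketch below assumes pandas 1.1 or later, where that argument was introduced; the variable names `default_sizes` and `with_nan_sizes` are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question; np.nan marks the missing key.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4", np.nan, "6"]})

# Default behaviour: rows whose group key is NaN are left out entirely.
default_sizes = df.groupby("b").size()

# Assumption: pandas >= 1.1, where dropna=False keeps NaN as its own group.
with_nan_sizes = df.groupby("b", dropna=False).size()

print(len(default_sizes))   # number of groups without the NaN key
print(len(with_nan_sizes))  # number of groups including the NaN key
```

On older pandas versions, a common workaround is to replace the missing keys with a placeholder value via `fillna` before grouping, at the cost of choosing a placeholder that cannot collide with real data.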

How to select rows with one or more nulls from a pandas DataFrame without listing columns explicitly?

I have a DataFrame with ~300K rows and ~40 columns. I want to find out whether any rows contain null values, and put these ‘null’ rows into a separate DataFrame so that I can explore them easily. I can create a mask explicitly:

    mask = False
    for col in df.columns:
        mask = mask | df[col].isnull()

…
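The column-by-column loop can usually be collapsed into one vectorised expression. The following is a minimal sketch on a small made-up frame; the column names `x`, `y`, `z` and the names `null_mask` and `null_rows` are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real ~300K-row frame.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": ["a", "b", None],
    "z": [1, 2, 3],
})

# isnull() gives a boolean frame; any(axis=1) reduces it to one flag per row,
# True when at least one column in that row is null.
null_mask = df.isnull().any(axis=1)

# Boolean indexing selects only the rows that contain a null.
null_rows = df[null_mask]

print(null_rows)
```

Because no column is named explicitly, the same expression works unchanged when columns are added or removed.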

What is the rationale for all comparisons returning false for IEEE754 NaN values?

Why do comparisons of NaN values behave differently from all other values? That is, every comparison with the operators ==, <=, >=, <, > in which one or both operands are NaN returns false, contrary to the behaviour of all other values. I suppose this simplifies numerical computations in some way, but I couldn’t find an …
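The behaviour described above is easy to observe in Python, whose floats follow IEEE 754 on common platforms. This is only an illustration of the comparison results, not an explanation of the rationale; the names `ordered`, `not_equal` and `is_nan` are chosen here for the example.

```python
import math

nan = math.nan

# Each of the five operators listed in the question yields False with a NaN operand.
ordered = [nan == nan, nan < 1.0, nan > 1.0, nan <= nan, nan >= nan]

# The one exception is !=, which yields True, so x != x is True only for NaN.
not_equal = nan != nan

# math.isnan is the explicit way to test for NaN instead of relying on x != x.
is_nan = math.isnan(nan)

print(ordered, not_equal, is_nan)
```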