How to access pandas groupby dataframe by key

How do I access the corresponding groupby dataframe in a groupby object by the key? With the following groupby:

    rand = np.random.RandomState(1)
    df = pd.DataFrame({'A': ['foo', 'bar'] * 3, 'B': rand.randn(6), 'C': rand.randint(0, 20, 6)})
    gb = df.groupby(['A'])

I can iterate through it to get the keys and groups:

    In [11]: for k, gp in …
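One way to fetch a single group by its key is `GroupBy.get_group`. The sketch below rebuilds the frame from the excerpt but groups by the scalar `"A"` rather than the list `['A']`; that is a deliberate change, since how `get_group` accepts keys for a length-1 list grouping has varied between pandas versions. The names `foo_group` and `group_keys` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same construction as in the excerpt.
rand = np.random.RandomState(1)
df = pd.DataFrame({
    "A": ["foo", "bar"] * 3,
    "B": rand.randn(6),
    "C": rand.randint(0, 20, 6),
})

# Assumption: group by the scalar column name instead of a one-element list,
# so the group key is the plain string "foo".
gb = df.groupby("A")

# get_group returns the sub-DataFrame whose rows belong to that key.
foo_group = gb.get_group("foo")

# gb.groups maps each key to the row labels in that group.
group_keys = sorted(gb.groups)
```

`foo_group` keeps the original row labels, so it can still be aligned against `df` afterwards.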

Multiple aggregations of the same column using pandas GroupBy.agg()

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times? Example dataframe:

    import pandas as pd
    import datetime as dt
    import numpy as np

    np.random.seed(0)
    df = pd.DataFrame({
        "date": [dt.date(2012, x, 1) for x in range(1, 11)],
        "returns" …
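A minimal sketch of two ways this is commonly done: passing a list of functions to `agg` on the selected column, or using named aggregation (available in pandas 0.25 and later). Because the excerpt's dataframe is cut off, the data here is made up: the grouping column `dummy` is hypothetical, and `"sum"` and `"mean"` stand in for the unspecified `f1` and `f2`.

```python
import pandas as pd

# Hypothetical data; only the "returns" column name comes from the excerpt.
df = pd.DataFrame({
    "dummy": [1, 1, 2, 2],
    "returns": [0.1, 0.3, 0.2, 0.6],
})

# Option 1: a list of functions applied to one column in a single agg() call.
summary = df.groupby("dummy")["returns"].agg(["sum", "mean"])

# Option 2: named aggregation, which also lets you choose the output column names.
named = df.groupby("dummy").agg(
    total=("returns", "sum"),
    average=("returns", "mean"),
)
```

Option 1 names the result columns after the functions; option 2 is handier when the same function names would collide or when clearer labels are wanted.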

pandas GroupBy columns with NaN (missing) values

I have a DataFrame with many missing values in columns which I wish to groupby:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.nan, '6']})

    In [4]: df.groupby('b').groups
    Out[4]: {'4': [0], '6': [2]}

See that pandas has dropped the rows with NaN target values. (I want to …
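For reference, pandas 1.1 and later accept a `dropna` argument on `groupby`; setting it to `False` keeps rows whose key is missing as their own group. The sketch below assumes such a pandas version and reuses the frame from the excerpt; the name `sizes` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4", np.nan, "6"]})

# Assumption: pandas >= 1.1, where groupby takes dropna.
# dropna=False keeps the row with a missing key as a separate NaN group.
sizes = df.groupby("b", dropna=False).size()
```

On older versions, a common workaround is to replace missing keys with a placeholder value before grouping.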

How to loop over grouped Pandas dataframe?

DataFrame:

      c_os_family_ss  c_os_major_is  l_customer_id_i
    0        Windows              7            90418
    1        Windows              7            90418
    2        Windows              7            90418

Code:

    print df
    for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
        print name
        print group

I’m trying to just loop over the aggregated data, but I get the error:

    ValueError: too many values to unpack

@EdChum, here’s the expected output: …
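The error is consistent with how iteration works: `agg` returns a DataFrame, and iterating over a DataFrame yields its column labels, so each label string is what gets unpacked into `name, group`. A hedged sketch of two alternatives follows: loop over the GroupBy object itself, or aggregate first and walk the result with `iterrows()`. It assumes `c_os_major_is` holds strings so that `','.join` works; the names `seen` and `rows` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "c_os_family_ss": ["Windows"] * 3,
    "c_os_major_is": ["7"] * 3,  # assumption: stored as strings so join() works
    "l_customer_id_i": [90418] * 3,
})

# Option 1: iterate over the GroupBy object, which yields (key, sub-DataFrame).
seen = []
for name, group in df.groupby("l_customer_id_i"):
    seen.append((name, len(group)))

# Option 2: aggregate, then iterate over the rows of the aggregated frame.
aggregated = df.groupby("l_customer_id_i").agg(lambda x: ",".join(x))
rows = [(idx, row["c_os_family_ss"]) for idx, row in aggregated.iterrows()]
```

Which option fits depends on whether the loop needs the raw rows of each group or the already aggregated values.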

Count unique values per groups with Pandas [duplicate]

This question already has answers here: Pandas ‘count(distinct)’ equivalent (10 answers). Closed 3 years ago.

I need to count unique ID values in every domain. I have data:

    ID, domain
    123, 'vk.com'
    123, 'vk.com'
    123, 'twitter.com'
    456, 'vk.com'
    456, 'facebook.com'
    456, 'vk.com'
    456, 'google.com'
    789, 'twitter.com'
    789, 'vk.com'

I tried

    df.groupby(['domain', 'ID']).count()

but I want …
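Counting distinct values per group is what `nunique` does. A minimal sketch using the data listed in the excerpt; the name `unique_ids` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": [
        "vk.com", "vk.com", "twitter.com",
        "vk.com", "facebook.com", "vk.com", "google.com",
        "twitter.com", "vk.com",
    ],
})

# Group by domain only, then count the distinct IDs within each group.
unique_ids = df.groupby("domain")["ID"].nunique()
```

Grouping by both `domain` and `ID`, as in the attempt above, produces one row per pair instead of one count per domain.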

Converting a Pandas GroupBy output from Series to DataFrame

I’m starting with input data like this:

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    })

Which when printed appears as this:

           City     Name
    0   Seattle    Alice
    1   Seattle      Bob
    2  Portland  Mallory
    3   Seattle  Mallory
    4   Seattle      Bob
    5 …
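The excerpt is cut off before it states the desired result, so the following is only a guess at the typical case: a groupby that returns a Series with a MultiIndex, turned back into a flat DataFrame with `reset_index`. The column name `count` is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() returns a Series indexed by (Name, City).
sizes = df1.groupby(["Name", "City"]).size()

# reset_index(name=...) moves the index levels into columns and names the values.
counts = sizes.reset_index(name="count")
```

`Series.to_frame()` is an alternative when the group keys should stay in the index.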

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a data frame df and I use several columns from it to groupby:

    df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).mean()

In this way I almost get the table (data frame) that I need. What is missing is an additional column that contains the number of rows in each group. In other words, I have the mean but I also would …
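One possible way to get the group means together with a row count is to compute both from the same GroupBy object and attach the sizes as a new column. The data below is invented for illustration; only the column names `col1` to `col4` come from the excerpt, and `n_rows` is a hypothetical name.

```python
import pandas as pd

# Hypothetical values; the excerpt does not show the actual data.
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1.0, 3.0, 5.0],
    "col4": [10.0, 20.0, 30.0],
})

grouped = df[["col1", "col2", "col3", "col4"]].groupby(["col1", "col2"])

# Means of the remaining columns, indexed by (col1, col2).
stats = grouped.mean()

# size() counts rows per group (including rows with NaN) and aligns on the same index.
stats["n_rows"] = grouped.size()
```

`size()` counts every row in a group, whereas `count()` would skip missing values column by column.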