Apply vs transform on a group object

Consider the following dataframe:

columns = ['A', 'B', 'C', 'D']
records = [
    ['foo', 'one', 0.162003, 0.087469],
    ['bar', 'one', -1.156319, -1.5262719999999999],
    ['foo', 'two', 0.833892, -1.666304],     
    ['bar', 'three', -2.026673, -0.32205700000000004],
    ['foo', 'two', 0.41145200000000004, -0.9543709999999999],
    ['bar', 'two', 0.765878, -0.095968],
    ['foo', 'one', -0.65489, 0.678091],
    ['foo', 'three', -1.789842, -1.130922]
]
df = pd.DataFrame.from_records(records, columns=columns)

"""
     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.833892 -1.666304
3  bar  three -2.026673 -0.322057
4  foo    two  0.411452 -0.954371
5  bar    two  0.765878 -0.095968
6  foo    one -0.654890  0.678091
7  foo  three -1.789842 -1.130922
"""

The following commands work:

df.groupby('A').apply(lambda x: (x['C'] - x['D']))
df.groupby('A').apply(lambda x: (x['C'] - x['D']).mean())

but none of the following work:

df.groupby('A').transform(lambda x: (x['C'] - x['D']))
# KeyError or ValueError: could not broadcast input array from shape (5) into shape (5,3)

df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())
# KeyError or TypeError: cannot concatenate a non-NDFrame object

Why? The example on the documentation seems to suggest that calling transform on a group allows one to do row-wise operation processing:

# Note that the following suggests row-wise operation (x.mean is the column mean)
zscore = lambda x: (x - x.mean()) / x.std()
transformed = ts.groupby(key).transform(zscore)

In other words, I thought that transform is essentially a specific type of apply (the one that does not aggregate). Where am I wrong?

For reference, below is the construction of the original dataframe above:

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C' : randn(8), 'D' : randn(8)})

5 Answers
5

Leave a Comment