I have a large DataFrame df (about 12 million rows) with, say:
df.columns = ['word','documents','frequency']
So the following ran in a timely fashion:
word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']
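For reference, here is a small self-contained toy version of the setup (same column names as above; the data values are made up purely for illustration):

```python
import pandas as pd

# Toy stand-in for the real ~12M-row DataFrame
df = pd.DataFrame({
    'word': ['apple', 'apple', 'banana', 'apple', 'banana'],
    'documents': [1, 2, 2, 3, 3],
    'frequency': [5, 3, 7, 2, 4],
})

word_grouping = df[['word', 'frequency']].groupby('word')

# This part runs quickly even on the full data
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word', 'MaxFrequency']
print(MaxFrequency_perWord)
```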
However, this is taking an unexpectedly long time to run:
Occurrences_of_Words = word_grouping[['word']].count().reset_index()
What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame?
df.word.describe()
ran quickly, so I really did not expect the Occurrences_of_Words
DataFrame to take very long to build.
P.S.: If the answer is obvious and you feel the need to penalize me for asking this question, please include the answer as well.