What are the most common pandas ways to select/filter rows of a dataframe whose index is a MultiIndex?
- Slicing based on a single value/label
- Slicing based on multiple labels from one or more levels
- Filtering on boolean conditions and expressions
- Which methods are applicable in what circumstances
Assumptions for simplicity:
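- input dataframe does not have duplicate index keys (although note that the example below does repeat the keys ('b', 't') and ('d', 'w'), and the expected outputs keep both copies)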
- the input dataframe below has only two levels (most solutions shown here generalize to N levels)
Example input:
```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw')
], names=['one', 'two'])

df = pd.DataFrame({'col': np.arange(len(mux))}, mux)
df
```

```
         col
one two
a   t      0
    u      1
    v      2
    w      3
b   t      4
    u      5
    v      6
    w      7
    t      8
c   u      9
    v     10
d   w     11
    t     12
    u     13
    v     14
    w     15
```
Question 1: Selecting a Single Item
How do I select rows having “a” in level “one”?
```
         col
one two
a   t      0
    u      1
    v      2
    w      3
```
Additionally, how would I be able to drop level “one” in the output?
```
     col
two
t      0
u      1
v      2
w      3
```
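A minimal sketch of common options, assuming a reasonably recent pandas. The frame is rebuilt so the snippet runs on its own, and the result variable names are only illustrative. Passing a list (`df.loc[['a']]`) keeps level “one”, while `df.xs('a', level='one')` and a scalar `df.loc['a']` drop it:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# A list of labels selects on the outermost level and keeps it in the result.
keep = df.loc[['a']]

# xs() selects on a named level and drops that level by default.
dropped = df.xs('a', level='one')

# A scalar .loc key addresses the outermost level and also drops it.
dropped_loc = df.loc['a']

print(keep)
print(dropped)
```

`xs` accepts a `level` argument, so the same call works for inner levels, whereas a scalar `.loc` key only matches against the outermost level.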
Question 1b
How do I slice all rows with value “t” on level “two”?
```
         col
one two
a   t      0
b   t      4
    t      8
d   t     12
```
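For an inner level, here are three roughly equivalent sketches, again assuming a reasonably recent pandas: `xs` with `drop_level=False`, `pd.IndexSlice` inside `.loc`, and a boolean mask built from `Index.get_level_values`:

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# xs() on the inner level; drop_level=False keeps both levels in the output.
by_xs = df.xs('t', level='two', drop_level=False)

# IndexSlice: ':' (everything) on level "one", the label 't' on level "two".
idx = pd.IndexSlice
by_slice = df.loc[idx[:, 't'], :]

# Boolean mask built from the values of level "two".
by_mask = df[df.index.get_level_values('two') == 't']

print(by_xs)
```

If you later slice a level with a real range (for example `'a':'b'`), pandas may require a sorted index (`df.sort_index()`) first. The `:` used here is a full slice, so the unsorted inner level of this example does not get in the way.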
Question 2: Selecting Multiple Values in a Level
How can I select rows corresponding to items “b” and “d” in level “one”?
```
         col
one two
b   t      4
    u      5
    v      6
    w      7
    t      8
d   w     11
    t     12
    u     13
    v     14
    w     15
```
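These sketches select several labels from one level, assuming a reasonably recent pandas. The first passes a list to `.loc`, which works on the outermost level only. The second applies `isin` to the level values and works on any level. The third uses `DataFrame.query`, which can refer to named index levels:

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# A list of outer-level labels selects every row under each label.
by_loc = df.loc[['b', 'd']]

# isin() on the level values builds a mask; works for any named level.
by_isin = df[df.index.get_level_values('one').isin(['b', 'd'])]

# query() resolves "one" to the index level with that name.
by_query = df.query("one in ['b', 'd']")

print(by_loc)
```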
Question 2b
How would I get all values corresponding to “t” and “w” in level “two”?
```
         col
one two
a   t      0
    w      3
b   t      4
    w      7
    t      8
d   w     11
    t     12
    w     15
```
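For several labels on an inner level, a boolean mask is a simple option that keeps the rows in their original order. The `pd.IndexSlice` form selects the same rows. Its result is compared only as a sorted list in the check below, because the row order it returns for a list on an unsorted inner level may differ between pandas versions (rows can come back grouped by label):

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# Boolean mask on the inner level; rows keep their original order.
by_mask = df[df.index.get_level_values('two').isin(['t', 'w'])]

# Same rows via query(); "two" resolves to the index level of that name.
by_query = df.query("two in ['t', 'w']")

# IndexSlice form: same rows, but the order may vary with the pandas version
# when the inner level is not sorted.
idx = pd.IndexSlice
by_slice = df.loc[idx[:, ['t', 'w']], :]

print(by_mask)
```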
Question 3: Slicing a Single Cross Section (x, y)
How do I retrieve a cross section, i.e., a single row having specific values for the index, from df? Specifically, how do I retrieve the cross section of ('c', 'u'), given by
```
         col
one two
c   u      9
```
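Here is a minimal sketch for a single full key, assuming a reasonably recent pandas. Wrapping the tuple in a list makes `.loc` return a one-row DataFrame with both levels, and a mask that combines one condition per level does the same:

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# A list containing one full-key tuple returns a DataFrame with both levels.
by_loc = df.loc[[('c', 'u')]]

# Equivalent boolean mask: one condition per level, combined with "&".
one = df.index.get_level_values('one')
two = df.index.get_level_values('two')
by_mask = df[(one == 'c') & (two == 'u')]

print(by_loc)
```

The list form always returns a DataFrame, whereas a bare tuple key (`df.loc[('c', 'u')]`) can return a Series when the index is unique. Because this example index is not fully sorted and repeats some keys, the bare tuple may also emit a `PerformanceWarning`.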
Question 4: Slicing Multiple Cross Sections [(a, b), (c, d), ...]
How do I select the two rows corresponding to ('c', 'u') and ('a', 'w')?
```
         col
one two
c   u      9
a   w      3
```
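Here are two sketches for a list of full keys, assuming a reasonably recent pandas. They differ in row order. `.loc` follows the order of the key list, which matches the expected output above. A `MultiIndex.isin` mask keeps the frame's own order instead:

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

keys = [('c', 'u'), ('a', 'w')]

# .loc with a list of full-key tuples: rows come back in the order of `keys`.
by_loc = df.loc[keys]

# MultiIndex.isin builds a boolean mask: rows keep the frame's original order.
by_isin = df[df.index.isin(keys)]

print(by_loc)
```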
Question 5: One Item Sliced per Level
How can I retrieve all rows corresponding to “a” in level “one” or “t” in level “two”?
```
         col
one two
a   t      0
    u      1
    v      2
    w      3
b   t      4
    t      8
d   t     12
```
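An “or” across levels is awkward to express with `.loc`, because `.loc` combines keys given for different levels with “and”. The sketch below builds a boolean mask instead and shows the equivalent `query` string:

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

one = df.index.get_level_values('one')
two = df.index.get_level_values('two')

# "|" is an element-wise OR of the two per-level conditions.
by_mask = df[(one == 'a') | (two == 't')]

# Same condition as a query string; level names resolve like columns.
by_query = df.query("one == 'a' or two == 't'")

print(by_mask)
```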
Question 6: Arbitrary Slicing
How can I slice specific cross sections? For “a” and “b”, I would like to select all rows with sub-levels “u” and “v”, and for “d”, I would like to select rows with sub-level “w”.
```
         col
one two
a   u      1
    v      2
b   u      5
    v      6
d   w     11
    w     15
```
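One way to express per-label sub-selections is to OR together one mask per group. The second form drives the same selection from a dictionary. `wanted` and `pairs` are hypothetical names used only for illustration, and this form relies on `MultiIndex.isin` accepting a list of tuples:

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), list('tuvwtuvwtuvwtuvw')], names=['one', 'two']
)
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

one = df.index.get_level_values('one')
two = df.index.get_level_values('two')

# One mask per group, OR-ed together.
mask = (one.isin(['a', 'b']) & two.isin(['u', 'v'])) | ((one == 'd') & (two == 'w'))
by_mask = df[mask]

# Hypothetical spec: outer label -> inner labels to keep.
wanted = {'a': ['u', 'v'], 'b': ['u', 'v'], 'd': ['w']}
# Expand the spec into full (one, two) keys and match them with isin().
pairs = [(outer, inner) for outer, inners in wanted.items() for inner in inners]
by_pairs = df[df.index.isin(pairs)]

print(by_mask)
```

Both forms keep repeated keys, so both ('d', 'w') rows (11 and 15) are returned.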
Question 7 uses a different setup, in which level “two” is numeric:
```python
import numpy as np
import pandas as pd

np.random.seed(0)
mux2 = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    np.random.choice(10, size=16)
], names=['one', 'two'])

df2 = pd.DataFrame({'col': np.arange(len(mux2))}, mux2)
df2
```

```
         col
one two
a   5      0
    0      1
    3      2
    3      3
b   7      4
    9      5
    3      6
    5      7
    2      8
c   4      9
    7     10
d   6     11
    8     12
    8     13
    1     14
    6     15
```
Question 7: Filtering by Numeric Inequality on Individual Levels of the MultiIndex
How do I get all rows where values in level “two” are greater than 5?
```
         col
one two
b   7      4
    9      5
c   7     10
d   6     11
    8     12
    8     13
    6     15
```
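For a numeric level, the comparison can be made directly on the level values. This minimal sketch rebuilds `df2` with the same seed so the level values match the table above, then filters with a mask and with the equivalent `query` string:

```python
import numpy as np
import pandas as pd

# Rebuild df2 exactly as in the question (same seed, same calls).
np.random.seed(0)
mux2 = pd.MultiIndex.from_arrays(
    [list('aaaabbbbbccddddd'), np.random.choice(10, size=16)], names=['one', 'two']
)
df2 = pd.DataFrame({'col': np.arange(len(mux2))}, mux2)

# Compare the numeric level values directly to build a boolean mask.
by_mask = df2[df2.index.get_level_values('two') > 5]

# query() resolves "two" to the index level, so the same filter reads naturally.
by_query = df2.query('two > 5')

print(by_mask)
```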
Note: This post will not go through how to create MultiIndexes, how to perform assignment operations on them, or any performance related discussions (these are separate topics for another time).