pandas – Page 5 – IT Nursery

Getting list of lists into pandas DataFrame

May 29, 2022 by IT Nursery

I am reading the contents of a spreadsheet into pandas. DataNitro has a method that returns a rectangular selection of cells as a list of lists. So table = Cell("A1").table gives table = [['Heading1', 'Heading2'], [1, 2], [3, 4]], and headers = table.pop(0) gives the headers as a list and leaves the data. I am busy writing …
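A minimal sketch of one common approach, not taken from the original post: treat the first inner list as the column names and pass the rest to the DataFrame constructor. The DataNitro call is replaced here by the literal list from the excerpt.

```python
import pandas as pd

# Sample data copied from the excerpt; in the post it comes from DataNitro.
table = [["Heading1", "Heading2"], [1, 2], [3, 4]]

headers = table[0]   # first row holds the column names
rows = table[1:]     # remaining rows hold the data

df = pd.DataFrame(rows, columns=headers)
print(df)
```

Slicing instead of pop(0) leaves the original list unchanged, which may or may not matter in the spreadsheet workflow.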

Efficient way to apply multiple filters to pandas DataFrame or Series

May 29, 2022 by IT Nursery

I have a scenario where a user wants to apply several filters to a pandas DataFrame or Series object. Essentially, I want to efficiently chain a bunch of filters (comparison operations) that are specified at run-time by the user. The filters should be additive (that is, each one applied should narrow the results). I'm currently using …
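One possible way to combine filters chosen at run-time, sketched under assumptions: the filter specification format (column, comparison function, value) and the sample columns col1 and col2 are hypothetical and not from the original post. Each filter becomes a boolean mask, and the masks are AND-ed together so every filter narrows the result.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": [0, 1, 2, 3, 4], "col2": [10, 11, 12, 13, 14]})

# Hypothetical run-time filter spec: (column, comparison function, value).
filters = [("col1", operator.ge, 1), ("col2", operator.le, 13)]

# Build one boolean mask per filter.
masks = [compare(df[column], value) for column, compare, value in filters]

# AND all masks together; the all-True start value handles an empty filter list.
combined = reduce(operator.and_, masks, pd.Series(True, index=df.index))

filtered = df[combined]
print(filtered)
```

Indexing once with the combined mask avoids creating an intermediate DataFrame per filter.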

Convert columns into rows with Pandas

May 29, 2022 by IT Nursery

So my dataset has some information by location for n dates. The problem is that each date is actually a different column header. For example, the CSV looks like:

    location  name    Jan-2010  Feb-2010  March-2010
    A         "test"  12        20        30
    B         "foo"   18        20        25

What I would like is for it to look like location name …
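A sketch of how wide date columns are often turned into rows with DataFrame.melt. The output column names Date and Value are assumptions, since the excerpt is cut off before the desired layout is shown.

```python
import pandas as pd

# Data reconstructed from the CSV sample in the excerpt.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

# Keep location and name as identifiers; every other column becomes a row.
long_df = df.melt(
    id_vars=["location", "name"],
    var_name="Date",     # assumed name for the former column headers
    value_name="Value",  # assumed name for the cell values
)
print(long_df)
```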

pandas loc vs. iloc vs. at vs. iat?

May 29, 2022 by IT Nursery

I recently began branching out from my safe place (R) into Python and am a bit confused by the cell localization/selection in pandas. I've read the documentation but I'm struggling to understand the practical implications of the various localization/selection options. Is there a reason why I should ever use .loc or .iloc over at, and …
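For orientation, a small sketch of the four accessors on a hypothetical frame (the data and labels are made up for illustration). Broadly, .loc and .at select by label, .iloc and .iat select by integer position, and .at/.iat only return a single scalar.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

# Label-based selection; label slices include the end label.
block_by_label = df.loc["a":"b", ["x"]]

# Position-based selection; positional slices exclude the end position.
block_by_pos = df.iloc[0:2, 0:1]

# Scalar access by label and by position.
scalar_by_label = df.at["b", "x"]
scalar_by_pos = df.iat[1, 0]

print(block_by_label)
print(scalar_by_label, scalar_by_pos)
```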

Find column whose name contains a specific string

May 29, 2022 by IT Nursery

I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I’m searching for ‘spike’ in column names like ‘spike-2’, ‘hey spike’, ‘spiked-in’ (the ‘spike’ part is always continuous). I want the column name to be returned as a string or …
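A minimal sketch using a plain substring test over the column labels. The three spike column names come from the excerpt; the extra column "other" is an assumption added to show a non-match.

```python
import pandas as pd

# Empty frame; only the column labels matter for this example.
df = pd.DataFrame(columns=["spike-2", "hey spike", "spiked-in", "other"])

# Collect every column whose name contains the substring.
matching = [column for column in df.columns if "spike" in column]

# If exactly one match is expected, take the first element as a string.
first_match = matching[0] if matching else None

print(matching)
```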

pandas: multiple conditions while indexing data frame – unexpected behavior

May 28, 2022 by IT Nursery

I am filtering rows in a dataframe by values in two columns. For some reason the OR operator behaves like I would expect the AND operator to behave, and vice versa. My test code:

    import pandas as pd
    df = pd.DataFrame({'a': range(5), 'b': range(5)})
    # let's insert some -1 values
    df['a'][1] = -1
    df['b'][1] = …
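A sketch of how the two operators behave on boolean masks. The positions of the -1 values are assumptions because the excerpt is truncated. A row is kept by & only if both conditions hold for it, and by | if at least one holds, which can feel reversed when the conditions are phrased as "not equal to".

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5)})

# Assumed positions for the -1 values; .loc avoids chained assignment.
df.loc[1, "a"] = -1
df.loc[3, "b"] = -1

# Keep rows where BOTH columns differ from -1.
both_valid = df[(df["a"] != -1) & (df["b"] != -1)]

# Keep rows where AT LEAST ONE column differs from -1.
either_valid = df[(df["a"] != -1) | (df["b"] != -1)]

print(both_valid)
print(either_valid)
```

With this data no row has -1 in both columns, so the | version keeps every row.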

How can I one hot encode in Python?

May 28, 2022 by IT Nursery

I have a machine learning classification problem with 80% categorical variables. Must I use one hot encoding if I want to use some classifier for the classification? Can I pass the data to a classifier without the encoding? I am trying to do the following for feature selection: I read the train file: num_rows_to_read = …
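As an illustration of the encoding step only, here is a sketch using pandas.get_dummies on a hypothetical frame. The column names and values are assumptions, and this is just one of several ways to one hot encode in Python.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})

# Replace the categorical column with one indicator column per category.
encoded = pd.get_dummies(df, columns=["color"])

print(encoded)
```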

Turn Pandas Multi-Index into column

May 28, 2022 by IT Nursery

I have a dataframe with 2 index levels:

                       value
    Trial measurement
    1     0               13
          1                3
          2                4
    2     0              NaN
          1               12
    3     0               34

Which I want to turn into this:

    Trial  measurement  value
    1      0               13
    1      1                3
    1      2                4
    2      0              NaN
    2      1               12
    3      0               34

How …
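A sketch that rebuilds the frame from the excerpt and flattens it with reset_index, which moves index levels into ordinary columns. This is one standard way to do it, not necessarily the answer given in the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the two-level index shown in the excerpt.
index = pd.MultiIndex.from_tuples(
    [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (3, 0)],
    names=["Trial", "measurement"],
)
df = pd.DataFrame({"value": [13, 3, 4, np.nan, 12, 34]}, index=index)

# Move both index levels into regular columns.
flat = df.reset_index()

print(flat)
```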

Multiple aggregations of the same column using pandas GroupBy.agg()

May 28, 2022 by IT Nursery

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times? Example dataframe:

    import pandas as pd
    import datetime as dt
    import numpy as np
    np.random.seed(0)
    df = pd.DataFrame({
        "date": [dt.date(2012, x, 1) for x in range(1, 11)],
        "returns" …
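A sketch of two ways to aggregate one column several times in a single agg() call. The group column and the sample values are assumptions, since the example frame in the excerpt is truncated. The named-aggregation form assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical data with an assumed grouping column.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "returns": [1.0, 3.0, 2.0, 6.0],
})

# Pass a list of functions to get one output column per function.
summary = df.groupby("group")["returns"].agg(["sum", "mean"])

# Named aggregation lets you choose the output column names.
named = df.groupby("group").agg(
    total=("returns", "sum"),
    average=("returns", "mean"),
)

print(summary)
print(named)
```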

python dataframe pandas drop column using int

May 28, 2022 by IT Nursery

I understand that to drop a column you use df.drop('column name', axis=1). Is there a way to drop a column using a numerical index instead of the column name?
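One way this is often done, sketched on a hypothetical frame: look the label up by position in df.columns and pass it to drop. Note that this still drops by label, so if column names are duplicated, every column sharing that label would be removed.

```python
import pandas as pd

# Hypothetical frame with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Drop the column at position 1.
dropped_one = df.drop(columns=df.columns[1])

# Drop the columns at positions 0 and 2.
dropped_many = df.drop(columns=df.columns[[0, 2]])

print(dropped_one)
print(dropped_many)
```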