Determine the data types of a data frame’s columns
I’m using R and have loaded data into a dataframe using read.csv(). How do I determine the data type of each column in the data frame?
I have a pandas dataframe with mixed type columns, and I’d like to apply sklearn’s min_max_scaler to some of the columns. Ideally, I’d like to do these transformations in place, but haven’t figured out a way to do that yet. I’ve written the following code that works: import pandas as pd import numpy as np …
I’m reading in a CSV file with multiple datetime columns. I need to set the data types when reading in the file, but datetimes appear to be a problem. For instance: headers = ['col1', 'col2', 'col3', 'col4'] dtypes = ['datetime', 'datetime', 'str', 'float'] pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes) When run, this gives an error: TypeError: data …
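One common way to approach the datetime question above is to leave the datetime columns out of `dtype` and list them in `parse_dates` instead. The following is a minimal sketch under that assumption; the sample data in `raw` is made up for illustration and stands in for the file from the question.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for the file in the question.
raw = (
    "2020-01-01\t2020-02-01 12:30:00\tabc\t1.5\n"
    "2020-01-02\t2020-02-02 08:00:00\tdef\t2.5\n"
)

headers = ["col1", "col2", "col3", "col4"]

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=headers,
    dtype={"col3": str, "col4": float},  # only the non-datetime columns
    parse_dates=["col1", "col2"],        # datetime columns are parsed here
)

print(df.dtypes)
```

Passing `dtype` as a dict keyed by column name, rather than as a list, lets each column be typed independently.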
I have two data frames df1 and df2, where df2 is a subset of df1. How do I get a new data frame (df3) which is the difference between the two data frames? In other words, a data frame that has all the rows/columns in df1 that are not in df2?
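A possible way to compute such a difference is a left merge with `indicator=True`, keeping only the rows that appear in df1 alone. This is a sketch, not the only approach; the column names `id` and `val` and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical example frames; df2 is a subset of df1.
df1 = pd.DataFrame({"id": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"id": [2, 4], "val": ["b", "d"]})

# Merge on all shared columns and record where each row came from.
merged = df1.merge(df2, how="left", indicator=True)

# Keep rows found only in df1, then drop the helper column.
df3 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(df3)
```

Because the merge matches on every shared column, a row counts as "in df2" only when all of its values agree.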
I have a pandas dataframe. I want to print the unique values of one of its columns in ascending order. This is how I am doing it: import pandas as pd df = pd.DataFrame({'A':[1,1,3,2,6,2,8]}) a = df['A'].unique() print(a.sort()) The problem is that I am getting None as the output.
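The None in the question above comes from `ndarray.sort()`, which sorts in place and returns None. A minimal sketch of one fix is to use `np.sort`, which returns a sorted copy; the name `sorted_unique` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 3, 2, 6, 2, 8]})

# np.sort returns a new sorted array instead of sorting in place.
sorted_unique = np.sort(df["A"].unique())

print(sorted_unique)  # [1 2 3 6 8]
```

The built-in `sorted(df["A"].unique())` would also work and returns a plain list.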
Today I was pleasantly surprised to find that, when reading data from a file (for example), pandas is able to recognize the types of the values: df = pandas.read_csv('test.dat', delimiter=r"\s+", names=['col1','col2','col3']) For example, this can be checked as follows: for i, r in df.iterrows(): print(type(r['col1']), type(r['col2']), type(r['col3'])) In particular, integers, floats and strings …
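The inferred types can also be inspected per column through the `dtypes` attribute, without iterating over rows. The sketch below assumes a small whitespace-separated sample (the contents of `raw` are made up) in place of test.dat.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated data standing in for test.dat.
raw = "1 2.5 abc\n2 3.5 def\n"

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=["col1", "col2", "col3"])

# One inferred dtype per column.
print(df.dtypes)
```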
I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I’m searching for ‘spike’ in column names like ‘spike-2’, ‘hey spike’, ‘spiked-in’ (the ‘spike’ part is always continuous). I want the column name to be returned as a string or …
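A substring search over column names can be sketched with a list comprehension. The extra column `other` is an assumption added for contrast, and since the question is truncated, the exact return form the asker wants is not known; this version returns a list of matching names.

```python
import pandas as pd

# Column names from the question, plus a hypothetical non-matching one.
df = pd.DataFrame(columns=["spike-2", "hey spike", "spiked-in", "other"])

# Keep every column whose name contains the substring.
spike_cols = [col for col in df.columns if "spike" in col]

print(spike_cols)
```

If exactly one match is expected, `spike_cols[0]` gives it as a plain string.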
I have a dataframe with 2 index levels:

                       value
    Trial measurement
    1     0               13
          1                3
          2                4
    2     0              NaN
          1               12
    3     0               34

Which I want to turn into this:

    Trial measurement  value
    1     0               13
    1     1                3
    1     2                4
    2     0              NaN
    2     1               12
    3     0               34

How …
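Turning index levels into ordinary columns, as in the target layout above, is typically done with `reset_index()`. The sketch below rebuilds the example frame from the values shown in the question; the variable name `flat` is introduced for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the two-level example from the question.
index = pd.MultiIndex.from_tuples(
    [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (3, 0)],
    names=["Trial", "measurement"],
)
df = pd.DataFrame({"value": [13, 3, 4, np.nan, 12, 34]}, index=index)

# Move both index levels into regular columns.
flat = df.reset_index()

print(flat)
```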
Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times? Example dataframe: import pandas as pd import datetime as dt import numpy as np np.random.seed(0) df = pd.DataFrame({ "date" : [dt.date(2012, x, 1) for x in range(1, 11)], "returns" …
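One way to apply several aggregations in a single call is to pass a list of functions to `agg()`. Since the example dataframe in the question is truncated, the sketch below makes assumptions: the `group` column and the random `returns` values are hypothetical, and `"mean"` and `"sum"` stand in for f1 and f2.

```python
import datetime as dt

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

df = pd.DataFrame({
    "date": [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns": rng.normal(size=10),  # hypothetical values
    "group": ["A", "B"] * 5,         # hypothetical grouping column
})

# A list of functions produces one output column per function.
summary = df.groupby("group")["returns"].agg(["mean", "sum"])

print(summary)
```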
I understand that to drop a column you use df.drop('column name', axis=1). Is there a way to drop a column using a numerical index instead of the column name?
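A positional drop can be sketched by looking up the label through `df.columns` and passing it to `drop`. The frame and the name `dropped` below are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Translate position 1 into its label, then drop by label.
dropped = df.drop(df.columns[1], axis=1)

print(dropped.columns.tolist())
```

Note that if column names are duplicated, dropping by label removes every column with that name, not just the one at the given position.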