I have a fairly large dataset stored as a dataframe. How can I split it into two random samples (80% and 20%) for training and testing?

Thanks!

26 Answers

scikit-learn's train_test_split works well for this. It accepts both NumPy arrays and pandas DataFrames, and test_size=0.2 holds out a random 20% of the rows for testing.

from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2)
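If you need the same split on every run, or would rather not depend on scikit-learn, something like the following sketch may help. The toy DataFrame and the seed value 42 are assumptions made up for illustration; swap in your own df. It shows train_test_split with random_state set, plus a pandas-only variant built on DataFrame.sample and DataFrame.drop (which assumes the index values are unique).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy DataFrame standing in for the real dataset.
df = pd.DataFrame({"x": range(100), "y": [i % 2 for i in range(100)]})

# random_state fixes the shuffle, so the same rows land in each split on every run.
train, test = train_test_split(df, test_size=0.2, random_state=42)

# pandas-only alternative: randomly sample 80% of the rows for training,
# then drop those rows by index to get the remaining 20% for testing.
# This relies on df having a unique index.
train_pd = df.sample(frac=0.8, random_state=42)
test_pd = df.drop(train_pd.index)

print(len(train), len(test))
print(len(train_pd), len(test_pd))
```

Both approaches produce non-overlapping train and test sets; the two methods will generally not pick the same rows even with the same seed.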
