I have a fairly large dataset in the form of a dataframe and I was wondering how I would be able to split the dataframe into two random samples (80% and 20%) for training and testing.
Thanks!
Scikit-learn's train_test_split is a good one. It will split both NumPy arrays and dataframes.
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2)
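If you'd rather not pull in scikit-learn just for this, here is a rough pandas-only sketch of the same 80/20 split. The toy dataframe, the column names, and the seed value 42 are made up for illustration, and it assumes the dataframe's index labels are unique (otherwise drop would remove more rows than intended).

```python
import pandas as pd

# Hypothetical toy dataframe standing in for the real dataset.
df = pd.DataFrame({"x": range(100), "y": [i % 2 for i in range(100)]})

# Draw 80% of the rows at random; random_state is an arbitrary seed
# that makes the split repeatable between runs.
train = df.sample(frac=0.8, random_state=42)

# The remaining 20% are the rows whose index labels were not sampled.
# This relies on the index being unique.
test = df.drop(train.index)

print(len(train), len(test))
```

train_test_split also accepts a random_state argument, so you can get a repeatable split that way too.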