It’s been cited by many users as the reason for switching to Pytorch, but I’ve yet to find a justification/explanation for sacrificing the most important practical quality, speed, for eager execution.
Below is code benchmarking performance, TF1 vs. TF2 – with TF1 running anywhere from 47% to 276% faster.
My question is: what is it, at the graph or hardware level, that yields such a significant slowdown?
Looking for a detailed answer – am already familiar with broad concepts. Relevant Git
Specs: CUDA 10.0.130, cuDNN 7.4.2, Python 3.7.4, Windows 10, GTX 1070
Benchmark results:
UPDATE: Disabling Eager Execution per below code does not help. The behavior, however, is inconsistent: sometimes running in graph mode helps considerably, other times it runs slower relative to Eager.
Benchmark code:
# use tensorflow.keras... to benchmark tf.keras; used GPU for all above benchmarks
from keras.layers import Input, Dense, LSTM, Bidirectional, Conv1D
from keras.layers import Flatten, Dropout
from keras.models import Model
from keras.optimizers import Adam
import keras.backend as K
import numpy as np
from time import time
batch_shape = (32, 400, 16)
X, y = make_data(batch_shape)
model_small = make_small_model(batch_shape)
model_small.train_on_batch(X, y) # skip first iteration which builds graph
timeit(model_small.train_on_batch, 200, X, y)
K.clear_session() # in my testing, kernel was restarted instead
model_medium = make_medium_model(batch_shape)
model_medium.train_on_batch(X, y) # skip first iteration which builds graph
timeit(model_medium.train_on_batch, 10, X, y)
Functions used:
def timeit(func, iterations, *args):
t0 = time()
for _ in range(iterations):
func(*args)
print("Time/iter: %.4f sec" % ((time() - t0) / iterations))
def make_small_model(batch_shape):
ipt = Input(batch_shape=batch_shape)
x = Conv1D(128, 400, strides=4, padding='same')(ipt)
x = Flatten()(x)
x = Dropout(0.5)(x)
x = Dense(64, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)
model = Model(ipt, out)
model.compile(Adam(lr=1e-4), 'binary_crossentropy')
return model
def make_medium_model(batch_shape):
ipt = Input(batch_shape=batch_shape)
x = Bidirectional(LSTM(512, activation='relu', return_sequences=True))(ipt)
x = LSTM(512, activation='relu', return_sequences=True)(x)
x = Conv1D(128, 400, strides=4, padding='same')(x)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(128, activation='relu')(x)
x = Dense(64, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)
model = Model(ipt, out)
model.compile(Adam(lr=1e-4), 'binary_crossentropy')
return model
def make_data(batch_shape):
return np.random.randn(*batch_shape), np.random.randint(0, 2, (batch_shape[0], 1))