Regularizing and Optimizing LSTM
LSTM (Long Short-Term Memory)
Each LSTM cell acts as a small unit that processes the current input word together with the state carried over from previous words; applied step by step, and stacked in layers, the LSTM builds up an understanding of the full sentence. Beyond that, you mainly need a large amount of data to train the model.
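For concreteness, a minimal sketch of a stacked LSTM reading a batch of word indices, assuming PyTorch (all sizes here are illustrative, not from the paper):

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hidden_dim = 10000, 400, 1150   # illustrative sizes

    embedding = nn.Embedding(vocab_size, emb_dim)
    lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=3)    # 3 stacked LSTM layers

    tokens = torch.randint(0, vocab_size, (35, 20))      # (seq_len, batch) of word ids
    output, (h_n, c_n) = lstm(embedding(tokens))         # output: (35, 20, hidden_dim)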
weight-dropped LSTM
Applies recurrent regularization through a DropConnect mask on the hidden-to-hidden recurrent weights. Other modifications, such as dropout on the hidden state vector h_{t-1} or on the update to the memory cell c_t, require changing the internals of the LSTM; this prevents treating the LSTM as a black box (e.g. using fast fused implementations) and can make training slow. To address this, DropConnect is applied to the recurrent hidden-to-hidden weight matrices instead, leaving the cell computation itself untouched. A sketch follows below.
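A minimal sketch of the idea, assuming PyTorch (this hand-written cell is illustrative, not the authors' implementation, which masks the weights of the fused nn.LSTM directly):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightDropLSTMCell(nn.Module):
        # LSTM layer with DropConnect on the hidden-to-hidden weight matrix U.
        # One mask is sampled per sequence, so a dropped recurrent connection
        # stays dropped at every timestep.
        def __init__(self, input_size, hidden_size, weight_drop=0.5):
            super().__init__()
            self.weight_drop = weight_drop
            self.W = nn.Linear(input_size, 4 * hidden_size)               # input-to-hidden
            self.U = nn.Linear(hidden_size, 4 * hidden_size, bias=False)  # hidden-to-hidden

        def forward(self, xs, state):
            h, c = state                    # each: (batch, hidden_size)
            # a single DropConnect mask over the recurrent weights
            U_w = F.dropout(self.U.weight, p=self.weight_drop, training=self.training)
            outputs = []
            for x in xs:                    # xs: (seq_len, batch, input_size)
                gates = self.W(x) + h @ U_w.t()
                i, f, g, o = gates.chunk(4, dim=-1)
                c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
                h = torch.sigmoid(o) * torch.tanh(c)
                outputs.append(h)
            return torch.stack(outputs), (h, c)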
randomized-length backpropagation through time (BPTT): the BPTT window length is sampled randomly for each batch, so the same words are not always assigned to the same positions within a window.
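A sketch of the sampling scheme in Python (the constants base=70, p=0.95, std=5 and the base learning rate are illustrative defaults, not prescribed values):

    import random

    def sample_bptt_len(base=70, p=0.95, std=5):
        # use the base window length with probability p, half of it otherwise,
        # then add Gaussian jitter around that mean
        mean = base if random.random() < p else base // 2
        return max(5, int(random.gauss(mean, std)))

    seq_len = sample_bptt_len()
    # rescale the learning rate by seq_len / base so that windows of different
    # lengths contribute gradients in proportion to their size
    lr = 30.0 * seq_len / 70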
embedding dropout: dropout applied to entire word embedding vectors, so a dropped word vanishes from every position in the batch for that forward and backward pass.
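A sketch, assuming PyTorch (the function name and dropout rate are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def embedded_dropout(embed, words, p=0.1):
        # drop entire rows of the embedding matrix: one mask entry is sampled
        # per vocabulary word, so every occurrence of a dropped word is
        # removed for the whole forward/backward pass
        if not embed.training or p == 0:
            return embed(words)
        mask = embed.weight.new_empty((embed.num_embeddings, 1)).bernoulli_(1 - p) / (1 - p)
        return F.embedding(words, mask * embed.weight)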
activation regularization (AR): an L2 penalty on the magnitude of the LSTM's output activations.
temporal activation regularization (TAR): an L2 penalty on the difference between hidden states at consecutive timesteps, discouraging the hidden state from changing too abruptly. Both are sketched below.
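Both penalties are simple extra loss terms; a sketch assuming PyTorch (alpha and beta follow the paper's notation, the values here are illustrative):

    import torch

    def ar_tar_penalty(h, alpha=2.0, beta=1.0):
        # h: LSTM output activations, shape (seq_len, batch, hidden_size)
        ar = alpha * h.pow(2).mean()                 # AR: penalize large activations
        tar = beta * (h[1:] - h[:-1]).pow(2).mean()  # TAR: penalize abrupt changes
        return ar + tar

    # usage: loss = cross_entropy_loss + ar_tar_penalty(output)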
averaged SGD (ASGD): returns the average of the weight iterates after a trigger point T, instead of the final iterate.
NT-ASGD (non-monotonically triggered variant of ASGD): switches from plain SGD to averaging only once validation performance stops improving, removing the need to tune T by hand; this is the optimizer, better suited for LSTM training.
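A sketch of the trigger logic, assuming PyTorch (the training loop is elided; evaluate is a caller-supplied validation function, and lr, n, epochs are illustrative):

    import torch

    def train(model, evaluate, lr=30.0, n=5, epochs=100):
        # run plain SGD, then switch to averaged SGD (torch.optim.ASGD) the
        # first time the validation loss is worse than the best loss seen
        # more than n checks ago (the non-monotonic trigger)
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        val_history, triggered = [], False
        for epoch in range(epochs):
            # ... one epoch of optimizer.step() updates on the training data ...
            val_loss = evaluate(model)
            val_history.append(val_loss)
            if (not triggered and len(val_history) > n
                    and val_loss > min(val_history[:-n])):
                # t0=0: start averaging immediately after the trigger fires
                optimizer = torch.optim.ASGD(model.parameters(), lr=lr, t0=0)
                triggered = True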
All the other techniques above are for regularization.