Here is the implementation of LSTM in Theano and tested on IMDB dataset

Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves is a great book. It describes LSTM’s and RNN’s

LSTM’s were introduced by Sepp Hochreiter and Jurgen Schmidhuber in their paper. They describe six experiments where LSTM’s are able to perform better than RNN’s compared on various basis. LSTM’s do not have the vanishing gradient problem. They work even when there are long delays, and can handle signals that have a mix of low and high frequency components

Word2Vec uses (LSTMs) Long Short Term Memory networks [1]. They come within the category of Recurrent Neural Networks(RNN), and are capable of learning long-term dependencies. They were introduced by Hochreiter (1991)[1] and Bengio, et al. (1994)[2]. LSTMs are explicitly designed to avoid the long-term dependency problem, so their major advantage is that they are able to remember information for long periods of time. There is a lot of variation on the type of LSTM that can be used. Some types include, Depth Gated RNNs by Yao, et al. (2015)[3], and Koutnik, et al (2014) [4].

Greff, et al. (2015)[5] reports work on comparison on some variants and reach the conclusion that the basic methodology followed is similar. Work by Jozefowicz, et al. (2015)[6] includes testing more than ten thousand RNN architectures, and report that some RNNs are better suited to some tasks, because of which, they give better results than LSTMs on certain tasks. Hence, the choice of which LSTM to use is largely based on the task at hand. Mikolov[7] recommends using the skip-gram model with negative sampling (SGNS), as it outperformed the other variants on analogy tasks. In [8] the authors discuss the usage of unsupervised vector approaches to model rich semantic meanings. They use Word2Vec as to model the semantic meanings and this paper serves as the main motivation for this research.


[1] Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735 - 1780, November 1997.

[2] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. 1994.

[3] Kaisheng Yao, Trevor Cohn, Katerina Vylomova, Kevin Duh, and Chris Dyer. Depth-gated LSTM. CoRR, abs/1508.03790, 2015.

[4] Jan Koutnk, Klaus Greff, Faustino J. Gomez, and Jurgen Schmidhuber. A clockwork RNN. CoRR, abs/1402.3511, 2014.

[5] Klaus Greff, Rupesh Kumar Srivastava, Jan Koutnk, Bas R. Steunebrink, and Jurgen Schmidhuber. LSTM: A search space odyssey. CoRR, abs/1503.04069, 2015.

[6] Rafal Jozefowicz, Wojciech Zaremba, and Ilya Sutskever. An empirical exploration of recurrent network architectures. pages 2342{2350, 2015.

[7] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. pages 3111 - 3119, 2013.

blog comments powered by Disqus


03 October 2015