
I am very new to Deep learning and I am particularly interested in knowing what are LSTM and BiLSTM and when to use them (major application areas). Why are LSTM and BILSTM more popular than RNN?

Can we use these deep learning architectures in unsupervised problems?

Stephen Rauch
Volka
    BiLSTM means bidirectional LSTM, which means the signal propagates backward as well as forward in time. You can also apply this architecture to other RNNs. For details please read https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks and http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Welcome to the site! – Emre Dec 14 '17 at 02:27
  • Here is a [post](https://stats.stackexchange.com/a/420172/209206) on the difference between RNN and LSTM, and here is a [blog](https://medium.com/@raghavaggarwal0089/bi-lstm-bc3d68da8bd0) demonstrating the difference between LSTM and Bidirectional-LSTM – Benyamin Jafari Dec 27 '19 at 06:24

3 Answers


RNN architectures like LSTM and BiLSTM are used when the learning problem is sequential, e.g. you have a video and you want to know what it is about, or you want an agent to read a line of a document for you where the document is an image of text rather than machine-readable text. I highly encourage you to take a look here.

LSTMs and their bidirectional variants are popular because they learn how and when to forget, and when not to, using gates in their architecture. In earlier RNN architectures, vanishing gradients were a big problem and prevented those nets from learning much.
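To make the gating concrete, here is a rough sketch of a single LSTM time step in plain NumPy (random weights, no training; real libraries use optimized fused implementations, and names like `lstm_step` are just illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the
    forget (f), input (i), output (o) and candidate (g) gates."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates, shape (4n,)
    f = sigmoid(z[0:n])             # forget gate: how much old memory to keep
    i = sigmoid(z[n:2*n])           # input gate: how much new info to write
    o = sigmoid(z[2*n:3*n])         # output gate: how much memory to expose
    g = np.tanh(z[3*n:4*n])         # candidate cell values
    c = f * c_prev + i * g          # cell state: gated keep + gated write
    h = o * np.tanh(c)              # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):    # run the cell over a length-5 sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

The additive update `c = f * c_prev + i * g` is the key: gradients can flow through the cell state without being repeatedly squashed, which is what mitigates the vanishing-gradient problem of plain RNNs.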

Using bidirectional LSTMs, you feed the learning algorithm the original data once from beginning to end and once from end to beginning. There is some debate here, but it usually learns faster than the one-directional approach, although it depends on the task.
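The "once in each direction" idea can be sketched with a simple tanh RNN in NumPy (untrained random weights, purely illustrative): one set of weights reads the sequence forward, another reads it reversed, and the two hidden states are concatenated per time step.

```python
import numpy as np

def rnn_states(xs, W, U, b):
    """Simple tanh RNN: return the hidden state after each time step."""
    h = np.zeros(U.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(1)
n_in, n_hid, T = 3, 4, 6
xs = rng.normal(size=(T, n_in))
Wf, Uf = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))
Wb, Ub = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))
b = np.zeros(n_hid)

fwd = rnn_states(xs, Wf, Uf, b)              # pass over the sequence as-is
bwd = rnn_states(xs[::-1], Wb, Ub, b)[::-1]  # pass over the reversed sequence
h_bi = np.concatenate([fwd, bwd], axis=1)    # each step now sees past AND future
```

Because `h_bi[t]` combines a forward state (summarizing steps up to `t`) with a backward state (summarizing steps after `t`), each position has context from both directions, which is why BiLSTMs help on tasks like tagging where the whole sequence is available at once.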

Yes, you can use them in unsupervised learning too, depending on your task. Take a look here and here.

Green Falcon
  • Thanks a lot for the wonderful answer. Can we use LSTM for keyword extraction in NLP? – Volka Dec 14 '17 at 12:46
  • actually there are lots of papers about them, e.g. you can see [here](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjk-a6fyYnYAhVSIlAKHRdzBc4QFggwMAE&url=https%3A%2F%2Fdatascience.stackexchange.com%2Fquestions%2F10077%2Fkeyword-phrase-extraction-from-text-using-deep-learning-libraries&usg=AOvVaw257yIfbjiOkwQHUbmVUsi8) and [here](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjk-a6fyYnYAhVSIlAKHRdzBc4QFggnMAA&url=https%3A%2F%2Farxiv.org%2Fpdf%2F1502.06922&usg=AOvVaw1mphl-gOd76uH0iCNqqtL3). – Green Falcon Dec 14 '17 at 13:09
  • Thanks a lot. I am just wondering if there are off-the-shelf keyword extraction deep learning approach that we can use? – Volka Dec 15 '17 at 04:22
  • Actually I've not seen one; maybe it's better to ask it :) – Green Falcon Dec 15 '17 at 04:59

Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence.

Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.
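The "loop" is just the same weights applied at every step, with a hidden state carrying information forward. A minimal sketch (random untrained weights, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))   # input-to-hidden weights, reused at every step
U = rng.normal(size=(4, 4))   # hidden-to-hidden weights: this is the "loop"

h = np.zeros(4)               # information persists in h between steps
for x in rng.normal(size=(5, 3)):
    h = np.tanh(W @ x + U @ h)   # h_t depends on x_t and on h_{t-1}
```

After the loop, `h` summarizes the whole sequence seen so far, which is exactly the "persistence" the paragraph above describes.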

For further reading, go to Colah's blog.

Abhishek Sharma

In comparison to LSTM, BLSTM or BiLSTM has two networks: one accesses past information in the forward direction and the other accesses future information in the reverse direction. WIKI

A Bidirectional wrapper class is available, as per the official doc here:

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM

model = Sequential()
# num_channels, input_length and input_dim come from your data/model setup
model.add(Bidirectional(LSTM(num_channels,
                             implementation=2, recurrent_activation='sigmoid'),
                        input_shape=(input_length, input_dim)))

A complete example using IMDB data will be like this.
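Along those lines, here is a minimal end-to-end sketch. To keep it self-contained and avoid downloading the dataset, it substitutes random token IDs for the real arrays that `keras.datasets.imdb.load_data()` would return; the model part (Embedding, Bidirectional LSTM, sigmoid output) is the standard IMDB sentiment setup:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

vocab_size, maxlen = 5000, 100

model = Sequential([
    Embedding(vocab_size, 32),          # map token IDs to 32-dim vectors
    Bidirectional(LSTM(32)),            # forward + backward LSTM, outputs concatenated
    Dense(1, activation='sigmoid'),     # binary sentiment score
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Stand-in for keras.datasets.imdb.load_data() + pad_sequences(maxlen=maxlen)
x = np.random.randint(1, vocab_size, size=(64, maxlen))
y = np.random.randint(0, 2, size=(64,))

model.fit(x, y, epochs=1, batch_size=32, verbose=0)
preds = model.predict(x, verbose=0)     # one probability per review
```

With the real data you would simply replace the random `x`/`y` with the padded IMDB train/test splits.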

ebrahimi
ParthaSen