Sequence to Sequence Model (Seq2Seq)

While CNNs have led neural network models in image processing for years, the transformer model has become one of the main highlights of recent advances in natural language processing (NLP).


Before we dive deeper into transformer models, we first need to look at what "Attention" and Sequence to Sequence learning are in NLP.


What is Sequence to Sequence (Seq2Seq)?

Seq2Seq is a neural network that transforms a given sequence of elements, such as the sequence of words in a sentence, into another sequence. 

It is a special class of Recurrent Neural Network (RNN) architectures.

This is commonly used in NLP and it is particularly good at translation, where the sequence of words from one language is transformed into a sequence of different words in another language. 


A good example of a model that uses the Seq2Seq approach is the Long Short-Term Memory (LSTM)-based model.

The LSTM modules give meaning to the sequence by remembering the parts they find important and forgetting the unimportant parts of the sentence or sequence.

Models like the LSTM perform well on sequence-dependent data, where the order of the words is crucial for understanding the sentence.
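To make this concrete, here is a minimal sketch in PyTorch (with hypothetical layer sizes and token ids) of an LSTM reading a sequence of word embeddings one step at a time while carrying a hidden state from each step to the next:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
embedding_dim, hidden_dim, vocab_size = 32, 64, 1000

embed = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

# A toy "sentence": a batch of one sequence of 5 token ids.
tokens = torch.tensor([[4, 21, 7, 99, 3]])

outputs, (h_n, c_n) = lstm(embed(tokens))
print(outputs.shape)  # (1, 5, 64): one hidden state per word position
print(h_n.shape)      # (1, 1, 64): final state summarizing the whole sequence
```

Because the hidden state is updated word by word, shuffling the token order changes every subsequent state, which is exactly the sequence-dependence described above.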





Seq2Seq models consist of two parts: an Encoder and a Decoder.

First, the input sequence goes into the encoder, where it is mapped into a higher-dimensional space that captures abstract information about the input sequence.

The encoder reads the input sequence and summarizes the information in what are called the internal state vectors, or context vectors.



This abstract vector is fed into the decoder, which produces the output sequence; the output can be numbers, a sentence in another language, or even a single word.

The decoder generates the output sequence one element at a time, and each output is also taken into consideration when producing future outputs.
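Below is a minimal sketch of this encoder-decoder setup (again in PyTorch, with hypothetical vocabulary sizes, dimensions, and a made-up start-of-sequence token id): the encoder compresses the source sentence into its final hidden and cell states, and the decoder generates tokens one at a time, feeding each prediction back in as the input for the next step.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Reads the source sequence and returns the final (hidden, cell)
    # states -- the "context vectors" that summarize the input.
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        _, (hidden, cell) = self.lstm(self.embed(src))
        return hidden, cell

class Decoder(nn.Module):
    # Starts from the encoder's context vectors and predicts the target
    # sequence one token at a time.
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token, hidden, cell):
        output, (hidden, cell) = self.lstm(self.embed(token), (hidden, cell))
        return self.out(output), hidden, cell

# Toy usage: "translate" a 6-token sentence into at most 10 output tokens.
SRC_VOCAB, TGT_VOCAB, SOS = 1000, 1200, 1   # hypothetical sizes and token id
encoder = Encoder(SRC_VOCAB, 32, 64)
decoder = Decoder(TGT_VOCAB, 32, 64)

src = torch.randint(0, SRC_VOCAB, (1, 6))
hidden, cell = encoder(src)                 # context vectors

token = torch.tensor([[SOS]])               # start-of-sequence token
for _ in range(10):
    logits, hidden, cell = decoder(token, hidden, cell)
    token = logits.argmax(dim=-1)           # previous output feeds the next step
```

In a trained model the decoder's predictions would come from learned weights rather than random ones, but the flow of information is the same: everything the decoder knows about the input arrives through those context vectors.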




You may think of Seq2Seq models as a factory with two processes.

The first process (the encoder) takes in a car and disassembles it into different parts and pieces using the factory's special manual.

The second process (the decoder) then uses those parts and materials to create something new, such as a bus, a ship, or even another car.



Drawbacks

1. The Seq2Seq architecture has very limited memory.

 - The final hidden state of the LSTM, into which you try to cram the entirety of the sentence you have to translate, is usually only a few hundred units (see the sketch after this list).


2. The deeper a neural network is, the harder it is to train.

 - This means that it is difficult to train on long sequences, which results in the vanishing gradient problem.
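As a quick illustration of the first drawback, here is a small sketch (hypothetical sizes) showing that the encoder's final hidden state has the same fixed size no matter how long the input sequence is:

```python
import torch
import torch.nn as nn

# The decoder only ever sees the final hidden state, so a 500-step input
# must be squeezed into the same few hundred units as a 5-step input.
lstm = nn.LSTM(input_size=32, hidden_size=256, batch_first=True)

short_sentence = torch.randn(1, 5, 32)    # 5 time steps
long_sentence  = torch.randn(1, 500, 32)  # 500 time steps

_, (h_short, _) = lstm(short_sentence)
_, (h_long, _)  = lstm(long_sentence)

print(h_short.shape)  # torch.Size([1, 1, 256])
print(h_long.shape)   # torch.Size([1, 1, 256]) -- same fixed-size summary
```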




