Chapter 16: Natural Language Processing with RNNs and Attention

Will Toth
4 min read · Jun 24, 2021


A Review of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron

This chapter is about Transformers

Summary

This chapter took what we learned about RNNs in the last chapter and built on it to cover Natural Language Processing specifically. In it Géron talks about basic text-handling methods, predictive text, sentiment analysis, and Neural Machine Translation. Unlike many of the previous chapters, this one focused on a single topic and dove into the technologies at the forefront of Machine Learning rather than just providing some of the basics. This just goes to show how far we have come since starting the book 15 chapters ago.

Sentiment Analysis

The IMDb sentiment analysis problem really is the "Hello World" of Machine Learning; so far in my learning I have already encountered it three separate times. Essentially, the task is to classify a set of IMDb movie reviews as positive or negative using the power of NLP. To do this we must first preprocess our data, since it is raw text and noisier than we would like, so we strip out noise like HTML line breaks, punctuation, and capitalization. Then we build a vocabulary and encode the reviews, giving each unique word its own integer ID. From there we can create our neural net (Géron chose to use GRU layers here), train it, and classify our movie reviews into their correct categories.
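Here is a minimal sketch of what that pipeline might look like in Keras. This is my own illustration rather than the book's exact code: the toy reviews, vocabulary size, and layer sizes are all made-up assumptions, and real IMDb reviews would also need their HTML break tags stripped out.

```python
import tensorflow as tf
from tensorflow import keras

# Toy reviews standing in for the real IMDb dataset (hypothetical examples).
reviews = tf.constant(["What a wonderful, heartfelt film!",
                       "Terrible plot. Awful acting."])
labels = tf.constant([1, 0])  # 1 = positive, 0 = negative

# TextVectorization lowercases, strips punctuation, and maps each unique
# word to an integer ID -- the cleanup/encoding step described above.
text_vec = keras.layers.TextVectorization(max_tokens=10_000,
                                          output_sequence_length=100)
text_vec.adapt(reviews)
encoded_reviews = text_vec(reviews)  # shape: (num_reviews, 100) integer IDs

model = keras.Sequential([
    keras.layers.Embedding(input_dim=10_000, output_dim=128, mask_zero=True),
    keras.layers.GRU(128),                        # recurrent layer, as in the chapter
    keras.layers.Dense(1, activation="sigmoid"),  # positive vs. negative
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(encoded_reviews, labels, epochs=2)
```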

Encoder-Decoder Networks

Encoder-Decoder networks work exactly how they sound. In the book Géron walks through a language translation model that translates sentences from one language to another. In this Google Translate-esque example, the encoder maps the input sentence into a high-dimensional vector space, and the decoder then turns that representation into a sentence in the target language. You can think of that vector space as an imaginary intermediate language that the decoder translates directly into the intended language.
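To make the idea concrete, here is a minimal sequence-to-sequence sketch in Keras, assuming toy vocabulary sizes and already-tokenized integer inputs. It is a simplified illustration of the encoder-decoder shape, not the book's full NMT model.

```python
import tensorflow as tf
from tensorflow import keras

vocab_size = 1000  # assumed toy vocabulary size for both languages
embed_size = 128

# Encoder: reads the source sentence and compresses it into its final state
# (the "imaginary language" representation).
encoder_inputs = keras.layers.Input(shape=[None], dtype="int64")
enc_embed = keras.layers.Embedding(vocab_size, embed_size)(encoder_inputs)
_, state_h, state_c = keras.layers.LSTM(256, return_state=True)(enc_embed)

# Decoder: starts from the encoder's state and predicts the target sentence
# one word at a time.
decoder_inputs = keras.layers.Input(shape=[None], dtype="int64")
dec_embed = keras.layers.Embedding(vocab_size, embed_size)(decoder_inputs)
dec_outputs = keras.layers.LSTM(256, return_sequences=True)(
    dec_embed, initial_state=[state_h, state_c])
outputs = keras.layers.Dense(vocab_size, activation="softmax")(dec_outputs)

model = keras.Model(inputs=[encoder_inputs, decoder_inputs], outputs=[outputs])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# Training would need pairs of (source sentence, shifted target sentence).
```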

Transformers

Transformers are the current state of the art when it comes to Natural Language Processing, but unlike everything else we have looked at so far they don't contain any recurrent or convolutional layers. Developed in 2017 at Google, Transformers (I agree, it would be cooler if they were Autobots) are a type of NMT architecture built around "attention." What this means is that as the model processes a sentence it weighs which other words matter for the word it is currently working on: if I wrote "Samantha was walking her dog," then for "walking" the important words are Samantha and her dog. Transformer-based models like BERT or GPT-3 rely on these attention weights when producing their translations and predictions. I am not going to get into the full architecture of Transformers, as I honestly need to work on my own understanding, but another resource that helped me learn more about this was "What is a Transformer" from IBM.
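For a small taste of what attention looks like in code, here is a sketch using Keras's built-in multi-head attention layer on random embeddings standing in for the five tokens of that example sentence. The shapes and hyperparameters are just illustrative assumptions, not anything from the book.

```python
import tensorflow as tf
from tensorflow import keras

# One sentence of 5 tokens ("Samantha was walking her dog"), each token
# represented by a 64-dimensional embedding (random values for illustration).
embeddings = tf.random.normal([1, 5, 64])

# Self-attention: every token attends to every other token, so the scores
# capture how much weight e.g. "walking" puts on "Samantha" versus "dog".
mha = keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
outputs, scores = mha(query=embeddings, value=embeddings,
                      return_attention_scores=True)

print(outputs.shape)  # (1, 5, 64) -- contextualized token representations
print(scores.shape)   # (1, 4, 5, 5) -- per-head token-to-token attention weights
```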

My Thoughts

This was a really great chapter for me, as I have always been interested in how computers interact with language. Up until this point I had only really seen prediction models, so one of the things I really enjoyed in this chapter was the encoder/decoder section, which explained a lot about how language translation actually works. On a more fun note, I always find it cool to learn about the "machines are taking over" stuff, like how in this case the machine uses its own imaginary language to translate. I usually only hear about this stuff in conversation and the media, so it is nice to learn a bit more about the technologies and mathematics that underlie all of these cries of a machine uprising. Additionally, I have always heard about Transformers, whether in podcasts, tweets, or just at work, so it was cool to learn about a topic that I know is absolutely at the cutting edge of Machine Learning at the moment. I found this even more interesting because at my job we use Transformers in some of the products I work on. And while I have always heard good things about BERT, GPT, and the like, it was great to learn more about them, as it is something I can apply in my daily life, even if that just means better understanding what my coworkers are doing.

Thanks for reading!

If you have any questions or feedback, please reach out to me on Twitter @wtothdev or leave a comment!

Additionally, I wanted to give a huge thanks to Aurélien Géron for writing such an excellent book. You can purchase said book here (non-affiliate).

Disclaimer: I don’t make any money from any of the services referenced and chose to read and review this book of my own free will.
