Chapter 18: Reinforcement Learning

3 min read · Jun 27, 2021

A Review of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron

This hilarious video shows an example of Reinforcement Learning: “Google’s DeepMind teaches a human to walk” (Tech Insider)

Summary

Before I began my learning journey, Reinforcement Learning was essentially what I pictured when I thought of Machine Learning and Artificial Intelligence. It is the process by which a computer, or a computer-powered robot, learns how to perform a task without a human telling it how. Most people have seen the video linked above, in which a human-like robot learns to walk. It is honestly pretty funny to watch as it stumbles and attempts to walk in a way that is anything but humanlike. However, this robot is simply discovering what does and does not work through a system of trial and error. Essentially, it runs itself through a feedback loop where it attempts to walk and is rewarded based on how far it makes it. Thus, good behavior is reinforced and bad behavior is punished.

Basic Reinforcement Learning

Reinforcement Learning is all about building a model that can interpret and learn from rewards and punishments. For example, if I design a car simulator and want to teach a car to drive, I set up an environment where the car can try out driving. The car is the agent, and the strategy it uses to decide how to drive is called its policy. This policy is then “reinforced” by the rewards and punishments I set up, such as a punishment when the car runs into a barrier or a reward when it makes it 3/4 of the way through the course.
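To make the agent, environment, policy, and reward pieces a bit more concrete, here is a minimal sketch of the car example in plain Python. Everything in it is made up for illustration (the `ToyTrack` class, the two policies, the reward numbers), and no learning happens yet: it only shows how a policy picks actions and how the environment answers with rewards and punishments.

```python
import random


class ToyTrack:
    """Hypothetical 1-D driving course: start at cell 0, finish at cell `length`."""

    def __init__(self, length=10, crash_prob=0.3, seed=0):
        self.length = length
        self.crash_prob = crash_prob
        self.rng = random.Random(seed)
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        """action 0 = drive slowly (1 cell, never crashes); 1 = drive fast (2 cells, may crash)."""
        if action == 1 and self.rng.random() < self.crash_prob:
            return self.position, -10.0, True  # hit a barrier: punishment, episode over
        advance = 2 if action == 1 else 1
        self.position += advance
        done = self.position >= self.length
        return self.position, float(advance), done  # reward = progress made


def careful_policy(position):
    return 0  # always drive slowly


def reckless_policy(position):
    return 1  # always drive fast


def run_episode(env, policy):
    """Let the policy drive until the episode ends and return the total reward."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total_reward += reward
    return total_reward


print("careful :", run_episode(ToyTrack(), careful_policy))
print("reckless:", run_episode(ToyTrack(), reckless_policy))
```

A learning algorithm would sit on top of this loop and adjust the policy so that the total reward goes up over many episodes.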

OpenAI Gym

So, in the last paragraph I mentioned an environment. There are many ways to create an environment, but rather than building your own you can use a library like OpenAI Gym. You can install and import Gym with:

pip install gym  # in the command line

import gym
env = gym.make("CartPole-v1")  # or the ID of whichever environment you want

This gives you access to all sorts of environments, like Atari games or the CartPole environment that we worked on in this chapter. Within Gym you can experiment with and optimize your Reinforcement Learning models without spending a great deal of time building your own realistic environment.
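As a rough sketch of what interacting with a Gym environment looks like, the code below runs one CartPole episode with a simple hard-coded rule: push the cart toward the side the pole is leaning. The function names are my own, and the loop is written to tolerate both the older Gym API (`reset()` returns the observation, `step()` returns four values) and the newer one (`reset()` returns an `(observation, info)` pair, `step()` returns five values), since which one you get depends on your installed version.

```python
def basic_policy(obs):
    """CartPole observation = [cart position, cart velocity, pole angle, pole angular velocity]."""
    angle = obs[2]
    return 0 if angle < 0 else 1  # 0 = push left, 1 = push right


def run_episode(env, policy, max_steps=500):
    """Run one episode and return the total reward collected."""
    reset_out = env.reset()
    # Newer Gym versions return (obs, info); older ones return just obs.
    obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
    total_reward = 0.0
    for _ in range(max_steps):
        step_out = env.step(policy(obs))
        obs, reward = step_out[0], step_out[1]
        total_reward += reward
        # Old API: (obs, reward, done, info); new API: (obs, reward, terminated, truncated, info).
        if any(step_out[2:-1]):
            break
    return total_reward


def demo():
    import gym  # assumes `pip install gym` has been run

    env = gym.make("CartPole-v1")
    print("Total reward:", run_episode(env, basic_policy))
    env.close()
```

Calling `demo()` runs a single episode. A hard-coded rule like this keeps the pole up only briefly, which is exactly the gap a learned policy is meant to close.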

Q-Learning

Q-Learning is a way of implementing Reinforcement Learning by way of Q-Values. A Q-Value is an estimate of the total future reward the agent can expect if it takes a given action in a given state and acts well afterwards. I’m not going to go into the math of this method, but essentially the agent keeps updating its Q-Value estimates from the rewards it observes, and its policy is to pick the action with the highest Q-Value. Building on top of normal Q-Learning, you can use a deep neural network to approximate the Q-Values, which gives you a Deep Q-Learning model. While such a model can be difficult to train effectively, it can produce strong Reinforcement Learning agents.
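For readers who do want a peek at the mechanics, here is a minimal sketch of tabular Q-Learning on a made-up five-cell corridor where the agent starts on the left and gets a reward of 1 for reaching the rightmost cell. The environment, the function names, and the hyperparameter values are all my own assumptions for illustration, not code from the book. The update rule in the comment is the standard Q-Learning update.

```python
import random

N_STATES = 5      # cells 0..4; cell 4 is the goal and ends the episode
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor: moving left from cell 0 keeps the agent in place."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """For each non-terminal cell, pick the action with the highest Q-Value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train_q_table()
print(greedy_policy(q_table))  # one action per non-terminal cell
```

A Deep Q-Learning model replaces the `q` table with a neural network that predicts the Q-Values from the state, which is what makes it workable when there are far too many states to list in a table.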

My Thoughts

This was a cool chapter for me because, as I stated above, Reinforcement Learning is essentially what I thought of as Machine Learning before I entered the field. It is how a machine can enter a completely unknown environment and learn from its surroundings, much like we do when we are children. When you watch children learn anything, they learn both by observation and by just testing things out. When learning to ride a bike, a child typically learns through trial and error. If they fail they are punished by falling down, and when they get further and further they are rewarded with praise from their parents (or their own intrinsic satisfaction). This is particularly cool to me because it seems a lot more like Artificial General Intelligence than Deep Learning with labelled datasets does. In particular, as mentioned in the chapter, AlphaZero is a really cool example of this. It is a Reinforcement Learning model that has a somewhat generalized intelligence, as it has been shown to learn and master chess, shogi, and Go through self-play, given little more than the rules of each game. As a form of Generalized Intelligence I find this to be really cool, and I am excited to see where the field goes in the future.

Thanks for reading!

If you have any questions or feedback, please reach out to me on Twitter @wtothdev or leave a comment!

Additionally, I wanted to give a huge thanks to Aurélien Géron for writing such an excellent book. You can purchase said book here (non-affiliate).

Disclaimer: I don’t make any money from any of the services referenced and chose to read and review this book of my own free will.