A Review of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron
Summary
Dimensionality Reduction is the act of taking high-dimensional data (data with more features than we can easily work with or visualize, typically more than 3 dimensions) and reducing the number of dimensions, either to make the data easier to work with or to make it reasonably visible in data visualizations. This reduction is mathematics-heavy and involves some loss of information, but in the long run it can make data workable when it wasn't before.
Projection
Projection is exactly what it sounds like: projecting a multidimensional object onto a lower-dimensional space. Think of looking at an object in real life, like a water bottle, and then looking at the water bottle's shadow. The shadow (though a bit elongated) is a projection of the 3D object onto a 2D space. Projection works as a dimensionality reduction technique because training instances are often not uniformly distributed across all dimensions, but instead lie within (or close to) a lower-dimensional subspace. However, projection is not always a viable option due to the complexities of high-dimensional data. To see this more clearly, look at the 2D projection of the Swiss roll in the image above and notice that there are areas of serious overlap, even though this is a relatively simple object with few data points.
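To make this concrete, here is a rough sketch of that kind of naive projection using scikit-learn's make_swiss_roll generator (my own stand-in for the book's figure, not code from the book). Simply dropping one of the three axes projects the roll onto a 2D plane, and the overlap between different turns of the roll becomes visible in the plot:

```python
# A minimal sketch of naive projection on the Swiss roll dataset.
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll

X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Project onto the x/z plane by simply discarding the y axis (column 1).
# Points from different turns of the roll land on top of each other.
X_2d = X[:, [0, 2]]

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=t, cmap="viridis", s=5)
plt.xlabel("x")
plt.ylabel("z")
plt.title("Swiss roll naively projected onto 2D")
plt.show()
```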
PCA
Principal Component Analysis is the most popular dimensionality reduction algorithm and is based on projecting data onto the hyperplane that lies closest to the data. It does this by selecting the principal components: first, the axis that preserves the largest amount of variance in the training set; then, the orthogonal axis that preserves the most remaining variance as the second principal component; repeating this process until we have found D axes. Once we have selected the D best axes, we can project onto them to reduce the number of dimensions and still get a highly descriptive dataset that is now much simpler than before.
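Here is a minimal sketch of PCA in practice with scikit-learn (again my own example, not taken from the book): we ask for two components, let fit_transform find the variance-preserving axes for us, and then check how much of the original variance each axis kept:

```python
# A short PCA sketch: reduce the 3D Swiss roll to its two best axes.
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

X, _ = make_swiss_roll(n_samples=1000, random_state=42)

pca = PCA(n_components=2)          # keep the D=2 best axes
X_reduced = pca.fit_transform(X)   # now shape (1000, 2)

# Fraction of the dataset's variance preserved by each principal component.
print(pca.explained_variance_ratio_)
```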
My Thoughts
I am new to Dimensionality Reduction and honestly had a bit of trouble with this chapter; however, now that I have a foundation, I plan on learning more about it in the future! Additionally, I hope that the later chapters of this book include more examples of Dimensionality Reduction in practice so I can continue to learn about the subject and see it in action. Dimensionality Reduction is the first form of unsupervised learning I have ever really encountered, and with it being so complex, I'm interested to see what the other forms of unsupervised learning will be like.
Thanks for reading!
If you have any questions or feedback, please reach out to me on Twitter @wtothdev or leave a comment!
Additionally, I wanted to give a huge thanks to Aurélien Géron for writing such an excellent book. You can purchase said book here (non-affiliate).
Disclaimer: I don’t make money from any of the services referenced and chose to both read and review this book under my own volition.