What is Deep Learning?

Machine learning is a discipline in which we define a program not by writing it entirely ourselves, but by learning from data. Deep learning is a specialty within machine learning that uses neural networks with multiple layers as the model. A representative example is image classification (also known as image recognition). We start with labeled data: a set of images for which we have assigned a label to each image, indicating what it represents. Our goal is to produce a program, called a model, that, given a new image, will make an accurate prediction about what that new image represents.

Every model starts with a choice of architecture, a general template for how that kind of model works internally. The process of training (or fitting) the model is the process of finding a set of parameter values (or weights) that specialize that general architecture into a model that works well for our particular kind of data. To define how well a model does on a single prediction, we need a loss function, which determines how we score a prediction as good or bad. To make the training process go faster, we might start with a pretrained model: a model that has already been trained on someone else's data. We can then adapt it to our data by training it a bit more on our data, a process called fine-tuning.

When we train a model, a key concern is to ensure that it generalizes: that it learns general lessons from our data that also apply to new items it will encounter, so it can make good predictions on those items. The risk is that if we train our model badly, instead of learning general lessons it effectively memorizes what it has already seen, and then it will make poor predictions about new images. Such a failure is called overfitting. To avoid this, we always divide our data into two parts, the training set and the validation set. We train the model by showing it only the training set, and then we evaluate how well it is doing by seeing how well it performs on items from the validation set. In this way, we check whether the lessons the model learns from the training set generalize to the validation set. For a person to assess how well the model is doing on the validation set overall, we define a metric. Each complete pass through the training set during training is called an epoch.

All these concepts apply to machine learning in general: they apply to all sorts of schemes for defining a model by training it with data. What makes deep learning distinctive is a particular class of architectures: those based on neural networks. In particular, tasks like image classification rely heavily on convolutional neural networks, which we will discuss shortly.
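To make these terms concrete, here is a minimal sketch in fastai-style Python that trains an image classifier along the lines of the steps listed in the next paragraph. The choice of resnet34, the 20% validation split, the 224-pixel resize, the single epoch, and the is_cat labelling rule (cat filenames in this dataset start with an uppercase letter) are illustrative assumptions rather than required settings.

```python
from fastai.vision.all import *

# Download and extract a small labeled dataset of cat and dog photos.
path = untar_data(URLs.PETS)/'images'

# Labelling rule for this dataset: cat image filenames start with an uppercase letter.
def is_cat(filename): return filename[0].isupper()

# Hold out 20% of the images as a validation set; resize every item to 224x224.
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# Start from a pretrained ResNet-34 architecture and track the error rate
# (a metric) on the validation set.
learn = vision_learner(dls, resnet34, metrics=error_rate)

# Fine-tune the pretrained weights for one epoch on our training set.
learn.fine_tune(1)
```

Running this trains for one epoch and reports the error-rate metric on the held-out validation set, which is how we check that the model is generalizing rather than overfitting.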
To train such a model:

1. The Oxford-IIIT Pet Dataset, which contains 7,349 images of cats and dogs from 37 breeds, will be downloaded from the fast.ai datasets collection to the GPU server you are using, and will then be extracted.
2. A pretrained model that has already been trained on 1.3 million images, using a competition-winning model, will be downloaded from the internet.
3. The pretrained model will be fine-tuned using the latest advances in transfer learning, to create a model that is specially customized for recognizing dogs and cats.

Another key piece of context is that deep learning is just a modern area in the more general discipline of machine learning. Machine learning is, like regular programming, a way to get computers to complete a specific task. But how would we use regular programming to do what we just did in the preceding section: recognize dogs versus cats in photos? We would have to write down for the computer the exact steps necessary to complete the task. Right back at the dawn of computing, in 1949, an IBM researcher named Arthur Samuel started working on a different way to get computers to complete tasks, which he called machine learning. In his classic 1962 essay "Artificial Intelligence: A Frontier of Automation," he wrote:
    Programming a computer for such computations is, at best, a difficult task, not primarily because of any inherent complexity in the computer itself but, rather, because of the need to spell out every minute step of the process in the most exasperating detail. Computers, as any programmer will tell you, are giant morons, not giant brains.
Machine Learning "recommendation system" that can predict what products a user might purchase. This is often used in ecommerce, such as to customize products shown on a home page by showing the highest-ranked items. But such a model is generally created by looking at a user and their buying history (inputs) and what they went on to buy or look at (labels), which means that the model is likely to tell you about products the user already has, or already knows about, rather than new products that they are most likely to be interested in hearing about. That’s very different from what, say, an expert at your local bookseller might do, where they ask questions to figure out your taste, and then tell you about authors or series that you’ve never heard of before. Another critical insight comes from considering how a model interacts with its environment. This can create feedback loops, as described here: 1. A predictive policing model is created based on where arrests have been made in the past. In practice, this is not actually predicting crime, but rather predicting arrests, and is therefore partially simply reflecting biases in existing policing processes. 2. Law enforcement officers then might use that model to decide where to focus their policing activity, resulting in increased arrests in those areas. 3. Data on these additional arrests would then be fed back in to retrain future versions of the model. This is a positive feedback loop: the more the model is used, the more biased the data becomes, making the model even more biased, and so forth.