To be a bit more technical, our image classification model is based on the popular ResNet architecture. ResNet is a deep convolutional neural network that has found widespread use in image classification tasks. The simplest such model is ResNet-18, which consists of 18 weighted layers (17 convolutional layers plus a final fully connected layer), together with residual mappings that mitigate the vanishing gradient problem by allowing gradients to flow directly through the network's layers. This is achieved by introducing "shortcut connections" that skip one or more layers, allowing the network to learn identity mappings more efficiently. These residual connections help maintain high performance even as the network depth increases, making ResNet models especially effective for tasks involving complex, hierarchical patterns in image data.
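As a rough sketch of the idea (illustrative only, not the exact torchvision implementation), a basic residual block adds the block's input back onto its output before the final activation:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                          # the "shortcut connection"
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                  # gradients can flow straight through this addition
        return self.relu(out)
```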
Pre-trained ResNet models available through PyTorch are one of the quickest and easiest ways to get a classification model running. These networks have been pre-trained on ImageNet, a collection of over 14 million images and one of the most ubiquitous datasets in computer vision. Although the pre-trained ResNet model performs very well on ImageNet, the images that we want to classify look very different to standard ImageNet images, so the model requires some modification before we can use it for our task.
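For reference, loading a pre-trained ResNet-18 through torchvision looks roughly like this (the weights enum is the current torchvision API; older versions take `pretrained=True` instead):

```python
import torch
from torchvision import models

# Load ResNet-18 with weights pre-trained on ImageNet
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# The matching preprocessing transforms (resize, crop, normalise) ship with the weights
preprocess = weights.transforms()

# A dummy input: batch of 1, 3 channels, 224x224
dummy = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    logits = model(dummy)   # shape (1, 1000): one score per ImageNet class
```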
The first change is to the output of the ResNet model - by default the ResNet-18 model predicts one of a thousand different classes (cat, dog, plane, etc.), but for our task we are interested in a relatively small number of different types of defect (around 10 classes). To adapt the ResNet model we simply remove the final layer of the network and replace it with a linear layer with the desired number of outputs (there are more complicated options, but this is the most straightforward). The weights of this new layer are initially randomised, so the model requires some additional training before it can make sensible predictions, which brings us to the second change - fine-tuning.
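In torchvision the final layer is exposed as `model.fc`, so swapping it out is a one-liner (`num_defect_classes` here is an illustrative value, not something fixed by the library):

```python
import torch.nn as nn
from torchvision import models

num_defect_classes = 10  # illustrative: the number of defect types in our dataset

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the 1000-way ImageNet classifier with a layer sized for our classes.
# The new layer's weights are randomly initialised and must be trained.
model.fc = nn.Linear(model.fc.in_features, num_defect_classes)
```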
Fine-tuning a machine-learning model involves retraining an existing model on a custom dataset so that its predictions reflect the specifics of the custom data. It is possible to freeze certain layers of a neural network during retraining so that only particular layers are updated, which lets us fine-tune just the final linear classifier layer of the modified ResNet. This approach allows the model to retain the general features it has already learned from large datasets like ImageNet while adapting to the specific patterns and nuances of the custom dataset. Fine-tuning the final classifier layer is often more efficient and requires less data than training a model from scratch, leading to improved performance on specialised tasks.
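A minimal sketch of that freezing approach, continuing from the modified model above and assuming a `train_loader` that yields batches of images and defect labels:

```python
import torch
import torch.nn as nn

# Freeze every parameter, then unfreeze only the new classifier layer
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

criterion = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

model.train()
for epoch in range(5):                       # illustrative number of epochs
    for images, labels in train_loader:      # assumed DataLoader of (image, label) batches
        optimiser.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                      # gradients only reach the unfrozen fc layer
        optimiser.step()
```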
Once the modified model is fine-tuned it can be tested by making predictions on unseen data. If the performance metrics of the model are acceptable, the model can be used in a production environment for a variety of purposes - the predictions can be collected and used for manufacturing reports, or the model can be linked to autonomous or semi-autonomous systems to trigger actions when certain predictions occur.
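As a rough illustration, evaluating accuracy on a held-out test set (assuming a `test_loader` built the same way as the training loader) might look like:

```python
import torch

model.eval()
correct, total = 0, 0
with torch.no_grad():                        # no gradients needed for evaluation
    for images, labels in test_loader:       # assumed DataLoader of unseen (image, label) batches
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

print(f"Test accuracy: {correct / total:.2%}")
```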