Good user experience in online retail and e-commerce depends strongly on the availability of different offers and the ease of finding the desired products. Finding products can be especially challenging, since many potential buyers first encounter products in images on social media platforms, without any information about the product name or brand. The product search then depends solely on the user's own description of the product, which can lead to frustrating results.
Image-based product search requires extracting the relevant features of the products from the images, e.g. type, color and material of fashion articles. A machine learning model can automate this image tagging task by detecting objects in the images provided by the user, e.g. pullover, shirt or lamp, and determining their attributes and characteristics, e.g. color and material. The challenge is to obtain a sufficiently large training data set that covers the many different and brand-specific product types and names, so that the model can provide helpful information.
The output of the model would then be an image in which the individual products are separated and annotated with keywords and labels.
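To make this output more concrete, the following minimal sketch shows one way such a tagged image could be represented in code. The class names, field names, bounding-box convention and example values are illustrative assumptions, not part of any specific model or library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaggedProduct:
    """One detected product (hypothetical structure, for illustration only)."""
    category: str                        # detected object type, e.g. "pullover"
    box: Tuple[int, int, int, int]       # assumed (x_min, y_min, x_max, y_max) in pixels
    attributes: Dict[str, str] = field(default_factory=dict)

    def keywords(self) -> List[str]:
        # Attribute values first, category last, e.g. ["red", "wool", "pullover"]
        return [*self.attributes.values(), self.category]


@dataclass
class TaggedImage:
    """All products found in a single user-provided image."""
    image_id: str
    products: List[TaggedProduct] = field(default_factory=list)

    def search_query(self, index: int) -> str:
        # Join the keywords of one product into a text query for a shop search
        return " ".join(self.products[index].keywords())


# Made-up example values, not real model output
result = TaggedImage(
    image_id="example-001",
    products=[
        TaggedProduct("pullover", (40, 60, 310, 420),
                      {"color": "red", "material": "wool"}),
        TaggedProduct("lamp", (350, 20, 470, 260),
                      {"color": "black", "material": "metal"}),
    ],
)

print(result.search_query(0))  # red wool pullover
```

Keeping the attributes as key-value pairs per product would allow the same structure to feed either a keyword search or a filtered catalog query.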
Image tagging is accomplished by combining a machine learning model for image segmentation and object detection with a multi-label classification model. The combination of convolutional neural networks (CNN) and recurrent neural networks (RNN) is the most common method for image tagging.
However, recent research also shows good performance using graph convolutional networks (GCN). While CNNs are the state of the art for binary image classification tasks, RNNs and GCNs are able to capture the correlations and dependencies between labels.
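The difference between independent per-label decisions and label correlations can be illustrated with a small sketch in plain Python. It assumes a classifier that emits one raw score (logit) per label; `predict_tags` and `conditional_cooccurrence` are hypothetical helper names, and all scores and annotations are made up. The first function treats every label independently, while the second estimates how often one label appears given another, which is the kind of dependency information that correlation-aware models such as RNNs and GCNs are meant to exploit. It is not an implementation of either network type.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_tags(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label decision: keep every label whose own sigmoid score
    reaches the threshold, independently of all other labels."""
    return sorted(label for label, z in logits.items() if sigmoid(z) >= threshold)


def conditional_cooccurrence(
    annotations: List[Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Estimate P(b present | a present) from training label sets."""
    label_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for labels in annotations:
        label_counts.update(labels)
        # Ordered pairs (a, b) and (b, a) for every two labels on one image
        pair_counts.update(permutations(sorted(labels), 2))
    return {(a, b): n / label_counts[a] for (a, b), n in pair_counts.items()}


# Made-up logits for one image region
logits = {"pullover": 2.1, "wool": 0.3, "lamp": -3.0, "red": -0.2}
print(predict_tags(logits))  # ['pullover', 'wool']

# Made-up training annotations (one label set per image)
annotations = [
    {"pullover", "wool"},
    {"pullover", "wool", "red"},
    {"pullover", "cotton"},
    {"lamp", "metal"},
]
cooc = conditional_cooccurrence(annotations)
print(cooc[("wool", "pullover")])  # 1.0
```

In this toy data, "wool" never appears without "pullover", whereas "lamp" and "wool" never occur together. An independent per-label head cannot use such regularities, which is the motivation for adding a component that models label dependencies.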