© unsplash/@socialcut

Retail, E-commerce & Marketplaces

Identification of fake products

Context

Selling fake products can seriously damage the relationship between seller and buyer and can lead to lengthy legal proceedings with high costs and losses. The problem is especially pronounced on large marketplaces such as Amazon Marketplace. Identifying fake products manually could prevent these problems, but it is often infeasible or requires many well-trained staff, since many fake products are difficult to recognize as such.

Challenges

Training a machine learning model requires a sufficiently large training data set in which products are labeled as "fake" and, ideally, the characteristics that mark a product as fake are identified and annotated. This can be a difficult task, since labeling fake products requires highly competent personnel and may not be possible for every product.
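A labeled training set of this kind could be represented roughly as follows. This is a minimal sketch, not a prescribed data format: the record type `LabeledImage`, its fields, the file paths, and the `min_per_class` threshold are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LabeledImage:
    """One expert-labeled product image (hypothetical record layout)."""
    path: str                            # assumed image location
    is_fake: bool                        # label assigned by trained personnel
    fake_traits: Tuple[str, ...] = ()    # optional annotated characteristics


def label_counts(items: List[LabeledImage]) -> Counter:
    """Count how many examples exist per label."""
    return Counter("fake" if item.is_fake else "genuine" for item in items)


def has_enough_examples(items: List[LabeledImage], min_per_class: int) -> bool:
    """Check that both classes reach an assumed minimum number of examples."""
    counts = label_counts(items)
    return all(counts[label] >= min_per_class for label in ("fake", "genuine"))


# Illustrative records only; paths and traits are made up.
dataset = [
    LabeledImage("img/0001.jpg", False),
    LabeledImage("img/0002.jpg", True, ("uneven stitching", "wrong brand label")),
    LabeledImage("img/0003.jpg", False),
]
```

Checking the label counts before training makes a shortage of labeled fakes visible early. How many examples per class are actually sufficient depends on the product category and the model, so the threshold is left as a parameter.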

Moreover, retailers might be reluctant to share the data with external firms, since the number of fake products would reveal the extent of the problem on their platform. In addition, producers of fake products adapt quickly once they are discovered and change their tactics and marketing.

Potential solution approaches

Identifying fake products can be automated with a machine learning model that recognizes patterns in product images, such as production features or wrong brand labels. The model classifies each image on the basis of specific inherent features of the product.

Image segmentation and classification tasks of this kind are typically handled with convolutional neural networks (CNNs), for example U-Net or Mask R-CNN architectures. These can recognize different objects in an image, such as brand labels, and extract their specific attributes, e.g. the stitching, which may indicate a fake.
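To make the idea of a convolutional feature more concrete, the sketch below implements only the basic building block of a CNN in plain Python: a small filter slid over a grayscale image patch, followed by ReLU and max pooling. It is not U-Net or Mask R-CNN. The filter `VERTICAL_EDGE` and the tiny patches are hand-written assumptions for illustration, whereas a real network learns its filter weights from labeled images.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid-mode 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Keep positive filter responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map: Matrix) -> float:
    """Reduce a feature map to its strongest response."""
    return max(max(row) for row in feature_map)


# Hand-written filter (an assumption, not learned): responds to a
# dark-to-bright vertical edge, e.g. the border of a printed label.
VERTICAL_EDGE: Matrix = [[-1.0, 0.0, 1.0] for _ in range(3)]


def edge_score(patch: Matrix) -> float:
    """Strength of the strongest vertical-edge response in the patch."""
    return global_max_pool(relu(convolve2d(patch, VERTICAL_EDGE)))


# Toy patches: one with a dark/bright boundary, one uniformly bright.
patch_with_edge = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
flat_patch = [[1.0] * 5 for _ in range(4)]
```

The patch containing an edge yields a high score, while the uniform patch yields zero. A trained CNN stacks many such filters in several layers, so that later layers can respond to more complex attributes such as stitching patterns or logo shapes.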

In addition to the CNN, a generative adversarial network (GAN) can be used to generate synthetic data based on real data and thereby compensate for the lack of training data for fake products. Using a GAN can also reduce the required training effort.
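The adversarial training principle behind a GAN can be illustrated with a deliberately tiny example. The sketch below is a toy under strong assumptions: instead of images it works on a single number per sample, the "real" data are drawn from a normal distribution with mean 4.0, and both generator and discriminator have only two parameters each. The function name `train_toy_gan` and all constants are hypothetical.

```python
import math
import random
from typing import Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(
    steps: int = 200, lr: float = 0.01, seed: int = 0
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Train a 1-D GAN with plain SGD and hand-derived gradients.

    Generator:      G(z) = a * z + b          (z is standard normal noise)
    Discriminator:  D(x) = sigmoid(w * x + c)
    Returns ((a, b), (w, c)).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0    # generator parameters
    w, c = 0.1, 0.0    # discriminator parameters
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)   # assumed stand-in for a real sample
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        s_real = sigmoid(w * x_real + c)
        s_fake = sigmoid(w * x_fake + c)
        grad_w = (s_real - 1.0) * x_real + s_fake * x_fake
        grad_c = (s_real - 1.0) + s_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimize -log D(G(z)) (non-saturating loss).
        s_fake = sigmoid(w * x_fake + c)
        grad_x = (s_fake - 1.0) * w    # derivative of the loss w.r.t. x_fake
        a -= lr * grad_x * z
        b -= lr * grad_x
    return (a, b), (w, c)


generator_params, discriminator_params = train_toy_gan()
```

The discriminator learns to separate real from generated samples, and the generator is updated so that its samples become harder to separate, which in this toy setting pushes the generator offset `b` away from its start at zero and toward the real values. A GAN for product images follows the same alternating scheme, but with deep networks for both parts, and its training is known to be considerably less stable than this example suggests.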