Product feature extraction and standardization
Customers on e-commerce websites search for products by specific features, e.g. blue jeans or the camera resolution of a smartphone. The ability to filter and quickly find the right product can be a major differentiator in the market and drives purchases and conversions for e-tailers and marketplaces.
The feature information is usually available, but it may be spread across different input sources and stored in different formats. For example, a fashion retailer wants customers to be able to filter by material, dominant colour and garment type. However, fashion brands and e-tailers marketing their products on platforms such as Amazon Marketplace provide different information on products, sometimes even for the same product. This information needs to be unified so that it is searchable and users can filter by a fixed terminology within a website.
Potential solution approaches
Approximate string/pattern matching algorithms have delivered good results on similar tasks in the past. Features or attributes can be hand-crafted or generated automatically and then used in rule-based approaches that match similar items and categorize them accordingly. This approach can work where descriptions are relatively similar and their meaning does not depend on context. If the context (e.g. product category) changes the semantics of a description, deep neural networks such as CNNs or RNNs are alternatives, as are contextual embeddings from language models such as BERT.
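To make the approximate string matching idea concrete, here is a minimal sketch in Python that maps free-text colour values onto a fixed vocabulary using the standard library's difflib. The vocabulary, the function name standardize and the cutoff of 0.7 are illustrative assumptions, not part of any particular retailer's setup; a production system would likely need a larger vocabulary, synonym lists and a tuned threshold.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary for the "dominant colour" filter.
CANONICAL_COLOURS = ["blue", "navy", "black", "white", "red", "green", "grey"]


def standardize(raw_value, vocabulary, cutoff=0.7):
    """Return the closest canonical term, or None if nothing is similar enough.

    The cutoff (0..1) is an assumed threshold on difflib's similarity ratio.
    """
    # get_close_matches is case-sensitive, so normalize the input first.
    cleaned = raw_value.strip().lower()
    matches = get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Example supplier values with typos, spelling variants and an unknown term.
for raw in ["Blck ", "GRAY", "whte", "turquoise"]:
    print(repr(raw), "->", standardize(raw, CANONICAL_COLOURS))
```

Values that fall below the cutoff return None and can be routed to manual review or to a context-aware model, which fits the case where string similarity alone is not enough.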