Background – For online retailers, higher sales conversion can depend on something as simple as the order in which results appear when a customer searches. More relevant search results can drive a purchase in the moment and also increase repeat purchases through more pertinent recommendations. These search results and recommendations are determined in part by how the underlying products are classified. The more detailed and accurate the product descriptions, the more accurate the keywords associated with the products, and the better the results of the algorithms that control the customer purchase process. Improving this process can therefore increase product sales from both initial and repeat customers.
The current classification process can be described as follows: a seller submits a product to be sold on an online retail site (such as Amazon.com). The product must be classified into various product categories before it can be sold to customers. With clothing, for example, one algorithm may classify the article into a male or female category. Next, within each gender class, another algorithm classifies the product into a clothing type, such as t-shirt, slacks, or dress. This automatic classification allows the retailer to display appropriate results when a customer searches for a product. There is, however, room to improve this classification process further.
Mission – A more granular level of product classification will lead to more targeted customer experiences and potentially higher sales. To address this opportunity, several techniques can be used to refine the existing classification process. The technique demonstrated here involves (1) extracting a style from the image of an article of clothing, (2) representing each style as an embedding, and (3) using these embeddings to further classify articles of clothing into their respective styles. This process is added to the end of the current classification pipeline and could allow for better search results, recommendations, and product clustering. Although clothing is used in this example and in the subsequent demonstration, the technique could be applied in other areas in which the appearance of an object is a strong contributor to a customer’s decision to purchase.
Data – The dataset consists of 500 shirt images for each of three style classes (Sports, Casual, and Formal), for a total of 1,500 images. The dataset was further divided into 1,200 training images and 300 test images (100 of each class).
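A split with these sizes could be produced along the lines of the sketch below. This is an illustration only: the file names are hypothetical placeholders, and holding out a fixed 100 images per class at random is an assumption about how the test set was drawn. Under that assumption each class contributes 400 training images; the Results section describes the training data as imbalanced, so the actual procedure may have differed.

```python
import random

CLASSES = ["Sports", "Casual", "Formal"]


def stratified_split(items_by_class, test_per_class, seed=0):
    """Hold out a fixed number of items per class for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        test += [(item, label) for item in shuffled[:test_per_class]]
        train += [(item, label) for item in shuffled[test_per_class:]]
    return train, test


# Hypothetical file names standing in for the real shirt images.
images = {c: [f"{c.lower()}_{i:03d}.jpg" for i in range(500)] for c in CLASSES}

train_set, test_set = stratified_split(images, test_per_class=100)
print(len(train_set), len(test_set))
```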
Methodology – The methodology for the creation of the style classification network consists of the following steps:
1) A Style Transfer Deep Neural Net is used to extract the “styles” from the images in the dataset. These types of neural nets are typically used to transfer the style of one image onto the content of another image to create a hybrid of the two. For our purposes, only the first part of this technique is used to extract the style of each of the images.
2) The extracted styles are embedded into a different subspace to create style embeddings that can then be used as inputs to different tasks such as classification, recommendation, and clustering. Embedding is commonly used to transform categorical data into numerical vectors for use in neural networks.
3) Style embeddings are used as inputs into a classification algorithm that differentiates between the three shirt styles.
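The three steps above can be illustrated with a small, dependency-free sketch. It rests on several assumptions that the text does not confirm: style is summarized as a Gram matrix of feature-map channels (a common choice in style transfer), the embedding is simply the flattened upper triangle of that matrix, and the classifier is a nearest-centroid rule standing in for whatever classification algorithm was actually used. The tiny feature maps and their labels are invented for illustration.

```python
import math


def gram_matrix(features):
    """features: C channels, each a flat list of N activations.

    Returns the C x C matrix of channel inner products, divided by N.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_embedding(features):
    """Flatten the upper triangle of the symmetric Gram matrix into a vector."""
    g = gram_matrix(features)
    return [g[i][j] for i in range(len(g)) for j in range(i, len(g))]


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid(embedding, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(embedding, centroids[label]))


# Toy "feature maps": 2 channels x 4 spatial positions (invented values).
striped = [[1, 0, 1, 0], [0, 1, 0, 1]]   # channels fire at alternating positions
plain = [[1, 1, 1, 1], [1, 1, 1, 1]]     # channels fire everywhere

# Arbitrary labels attached to the toy examples.
centroids = {
    "Sports": centroid([style_embedding(striped)]),
    "Formal": centroid([style_embedding(plain)]),
}

query = [[1, 0, 1, 0.1], [0, 1, 0, 1]]   # nearly identical to the striped pattern
predicted = nearest_centroid(style_embedding(query), centroids)
print(predicted)
```

In a real system the feature maps would come from intermediate layers of a pretrained convolutional network, and a learned projection would likely replace the plain flattening used here.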
Results – The model achieves an overall accuracy of 59.7% and an AUC of 0.7805. Misclassifications occur most often between the casual and formal categories. The sports category, on the other hand, has far fewer errors. It should be noted that the dataset used in training was purposely imbalanced. This was done to determine whether poor generalization would stem from a flawed architecture or from an insufficient number of training samples. From the confusion matrix, it is clear that the sample sizes were influential in determining the classification accuracy of each class. For example, there is minimal misclassification in sports, which had more training data. This indicates that the model architecture itself is not the problem; rather, more training data is required to improve model accuracy.
The above figure shows, from left to right, the confusion matrix of the validation data, followed by model accuracy and loss across epochs. It is apparent that increasing the number of epochs leads to higher accuracy as well as lower loss. The proximity of the training and validation values in the later epochs indicates that the model has converged.
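For readers who want to reproduce this kind of evaluation, a confusion matrix, overall accuracy, and per-class recall can be computed from true and predicted labels as in the minimal sketch below. The labels are made up for illustration and are not the results reported here.

```python
from collections import Counter

LABELS = ["Sports", "Casual", "Formal"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """Share of each true class that was predicted correctly."""
    return {label: row[i] / sum(row)
            for i, (label, row) in enumerate(zip(labels, matrix))
            if sum(row) > 0}


# Invented labels, not the study's data.
y_true = ["Sports"] * 3 + ["Casual"] * 3 + ["Formal"] * 3
y_pred = ["Sports", "Sports", "Sports",
          "Casual", "Formal", "Casual",
          "Formal", "Casual", "Formal"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
acc = accuracy(matrix)
recall = per_class_recall(matrix, LABELS)
print(matrix, round(acc, 3), recall)
```

Per-class recall makes the pattern described in the Results easy to see: a class with few off-diagonal entries in its row has high recall.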
Implementation – The proposed technique may offer many benefits, but to be truly useful it must also be feasible to implement within existing architectures. Herein lies an important benefit of this solution: it can exist as an added step on top of current systems that classify products. The classification system does not need to be rebuilt; rather, it is enhanced by the addition of the style classification network. The below figure shows one example of where the solution would sit within an existing data science pipeline.
Future Scope – An additional benefit of this solution is its potential use beyond classification. In the above figure, the “Decision Engine” block of the pipeline is so labeled because the style embeddings that feed into this block can be useful for a number of tasks beyond product classification. As mentioned in the introduction, one such task would be recommending products based not only on the gender, type, or brand of a product but also on the style of the product that was purchased. For example, if a particular customer had frequently purchased clothing articles with striped patterns, they could be recommended similar patterns when purchasing other items such as furniture, curtains, or bathroom accessories.
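A style-based recommendation of this kind could be sketched as a nearest-neighbour lookup over style embeddings. The example below ranks catalog items by cosine similarity to the embedding of a purchased item. The item names and three-dimensional vectors are invented, and cosine similarity is an assumed choice of similarity measure rather than something specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query, catalog, k=2):
    """Return the k catalog item names most similar in style to the query."""
    ranked = sorted(catalog,
                    key=lambda name: cosine_similarity(query, catalog[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical style embeddings for items from other product areas.
catalog = {
    "striped_curtain": [0.9, 0.1, 0.0],
    "plain_sofa": [0.0, 0.2, 0.9],
    "striped_towel": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of a striped shirt the customer bought.
purchased_shirt = [1.0, 0.1, 0.0]

recommendations = recommend(purchased_shirt, catalog)
print(recommendations)
```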
Because the style classification network is added onto an existing data pipeline, the incremental cost of implementation is minimal. Once the model is built and deployed, the cost would lie in retraining the algorithm once additional data made the existing model obsolete. This retraining would not be required frequently and should not be highly involved. Additionally, retraining this particular model is much more efficient because the model is relatively simple and is only one part of the pipeline. In contrast, if a single complex model (such as VGG) were used for the entire classification process, it would incur a much higher retraining cost.
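The modularity argument can be made concrete with a short sketch in which the pipeline is a list of independent stages and the style classifier is appended without touching the existing ones. All stage functions are hypothetical placeholders that return fixed values; in practice each would wrap a trained model.

```python
def classify_gender(product):
    # Placeholder for the existing gender classifier (hypothetical).
    return {**product, "gender": "male"}


def classify_type(product):
    # Placeholder for the existing clothing-type classifier (hypothetical).
    return {**product, "type": "shirt"}


def classify_style(product):
    # New stage: would call the style classification network (hypothetical).
    return {**product, "style": "Casual"}


def run_pipeline(product, stages):
    """Pass the product record through each stage in order."""
    for stage in stages:
        product = stage(product)
    return product


existing_stages = [classify_gender, classify_type]
extended_stages = existing_stages + [classify_style]  # earlier stages unchanged

before = run_pipeline({"image": "shirt_001.jpg"}, existing_stages)
after = run_pipeline({"image": "shirt_001.jpg"}, extended_stages)
print(before)
print(after)
```

Because each stage only adds a field to the product record, the style stage can be retrained or swapped out on its own, which is the source of the lower retraining cost described above.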