The goal of Meta Shoe Review is to extract useful, product-specific information from the text of online shoe reviews. The data used for this project were shoe reviews scraped from Amazon.com (~250k) and DSW.com (~50k). Each meta-review consists of three parts: a unique wordcloud, a comfort prediction, and a size prediction.

To create the unique wordcloud, I constructed an algorithm that compares how often each word occurs within the reviews for a specific product with how often that same word occurs across reviews for many different products. The result? The words that are particularly important for that shoe are emphasized, and we get immediate access to important product information!
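A minimal sketch of this kind of relative-frequency scoring follows. It is not the project's actual code. The tokenizer, the `min_count` cutoff, the add-one smoothing of the corpus frequency, and the function name `relative_word_scores` are all assumptions made for illustration. It also assumes the product's reviews are included in the full corpus.

```python
import re
from collections import Counter


def tokenize(text):
    # Simple lowercase word tokenizer (an assumption; any tokenizer would do).
    return re.findall(r"[a-z']+", text.lower())


def relative_word_scores(product_reviews, all_reviews, min_count=2, smoothing=1.0):
    """Hypothetical helper: score each word by how much more often it appears
    in one product's reviews than across the whole review corpus."""
    product_counts = Counter(w for r in product_reviews for w in tokenize(r))
    corpus_counts = Counter(w for r in all_reviews for w in tokenize(r))
    product_total = sum(product_counts.values())
    corpus_total = sum(corpus_counts.values())
    vocab_size = len(corpus_counts)

    scores = {}
    for word, count in product_counts.items():
        if count < min_count:
            continue  # skip rare words, which give noisy ratios
        p_product = count / product_total
        # Additive smoothing keeps the ratio finite for rare corpus words.
        p_corpus = (corpus_counts[word] + smoothing) / (corpus_total + smoothing * vocab_size)
        scores[word] = p_product / p_corpus
    return scores
```

Words with a high ratio are the ones that set this shoe apart from the rest of the corpus. The resulting dictionary could be passed to `WordCloud().generate_from_frequencies(scores)` from the `wordcloud` package to size each word by its score.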

To calculate the comfort and size predictions, I used scikit-learn to build a predictive model from labeled review data from DSW.com. When customers submit a review to DSW.com, they are asked to rate how comfortable the product is and whether it was smaller or larger than expected. I paired the text of each review (features) with the customer's ratings of comfort and size (labels) to train a model that predicts these metrics directly from the text. The model combined tf-idf weighted term frequencies, dimensionality reduction based on SVD, and linear regression. I then applied the model to the Amazon.com reviews to create a more complete picture of the different products.
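The pipeline described above could be assembled in scikit-learn roughly as follows. This is a sketch under assumptions: the helper name `build_rating_model` and the hyperparameters (`n_components`, English stop-word removal, sublinear tf) are illustrative and are not taken from the project.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


def build_rating_model(n_components=100):
    """Hypothetical helper: tf-idf features -> SVD reduction -> linear regression."""
    return make_pipeline(
        # Turn raw review text into tf-idf weighted term frequencies (sparse).
        TfidfVectorizer(stop_words="english", sublinear_tf=True),
        # Reduce the sparse tf-idf matrix to a small dense representation.
        TruncatedSVD(n_components=n_components, random_state=0),
        # Regress the numeric rating (e.g. comfort) on the reduced features.
        LinearRegression(),
    )


# Usage sketch (the variable names are placeholders):
# comfort_model = build_rating_model().fit(dsw_texts, dsw_comfort_ratings)
# amazon_comfort = comfort_model.predict(amazon_texts)
```

`TruncatedSVD` is used here because it accepts the sparse matrix produced by `TfidfVectorizer` directly. A separate model of the same form could be fit for size, assuming the "smaller/larger than expected" answers are encoded numerically (for example -1, 0, +1; this encoding is an assumption).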

Meta Shoe Review is a work in progress, and this is merely a sample of some interesting results. I hope you enjoy it nonetheless!