Chapter 1 Introduction

As a leading crowd-sourced review platform, Yelp is a major source of voice-of-the-customer (VOC) material on local businesses. Recognizing the rich value embedded in this text, Yelp has made a large sample of its reviews available to the academic community and regularly updates the publicly available dataset. The dataset consists not only of reviews but also of detailed information about the businesses and users associated with them, as well as check-ins and photos. In this project, we focus on restaurant reviews; the 10th version of the dataset provides 3 million such reviews of 51,625 restaurants.

As on the platform, each review in the dataset is associated with a numerical score from 1 to 5 for the reviewed restaurant. Many existing explorations of the Yelp dataset have focused on the correspondence between the review and the score, for example, predicting the numerical score from a given review text, on the assumption that the score is a reasonable proxy for the opinions expressed in the text. This overall score, however, can be too coarse when a review mentions different aspects of the dining experience and expresses different or even conflicting feelings about them. In response to this observation, this project focuses on extracting common topics (categories of aspects) discussed in restaurant reviews using topic modeling, and on mining a given restaurant’s most-liked and most-disliked highlights from its reviews using aspect-based sentiment analysis.