Report of "Restaurant Advisor"
CAP 5768: Fall 2019
Group 7
Team members
Peng Hou,
Hongjing Wang,
Zheya Wu,
Yan Xiao,
Linlin Zhou
Motivation and Purpose
- Initially, we wanted to advise business owners on where to open a restaurant. As we dug deeper into the data, we found it more suitable to provide information about an area chosen by the user. This information covers all the factors we could find that bear on whether the area is a good place to open a business and, if so, what kind of food to serve and which key factors matter most in that area.
Dataset Description
- Dataset: The Yelp dataset, a subset of Yelp's businesses, reviews, and user data.
- Size: raw data, 8.69 gigabytes uncompressed
- Link: https://www.yelp.com/dataset
- Other datasets related to the City of Phoenix, collected from the City of Phoenix Mapping Open Data, including the locations of parks, hospitals, places of pride, etc.
Methods Used for the Data Analysis
Graph 1:
- Calculate the TF-IDF of the reviews and take the top 10 words in the reviews of each restaurant.
- Divide the restaurants into three groups: star 4 to 5, star 1 to 3, and closed.
- Collect all keywords within each group and sum the weights of identical words.
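The Graph 1 steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: it assumes that all reviews of a restaurant are concatenated into one document, that TF-IDF is the plain term frequency times log(N / document frequency), and that a rating of 3.5 falls into the lower group. The function names (tokenize, top_keywords, star_group, group_keyword_weights) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and keep alphabetic tokens only; a real pipeline would
    # likely also remove stop words (assumption, not stated in the report).
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()

def top_keywords(docs, k=10):
    """docs: {restaurant_id: review text}. Returns the k highest TF-IDF words per restaurant."""
    tokens = {rid: tokenize(text) for rid, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: number of restaurants whose reviews contain the word.
    df = Counter(w for words in tokens.values() for w in set(words))
    result = {}
    for rid, words in tokens.items():
        counts = Counter(words)
        total = len(words) or 1
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()}
        # Sort by descending weight; ties are broken alphabetically.
        result[rid] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return result

def star_group(stars, is_open):
    # Groups from the report; the cutoff at 4 stars is an assumption.
    if not is_open:
        return "closed"
    return "star 4 to 5" if stars >= 4 else "star 1 to 3"

def group_keyword_weights(docs, meta, k=10):
    """meta: {restaurant_id: (stars, is_open)}. Sums keyword weights within each group."""
    totals = defaultdict(Counter)
    for rid, keywords in top_keywords(docs, k).items():
        group = star_group(*meta[rid])
        for word, weight in keywords:
            totals[group][word] += weight
    return totals
```

Calling totals[group].most_common(5) on the result would give a ranked keyword list of the kind plotted in Graph 1.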
Graph 2:
- The purpose of Graph 2 is to identify the important attributes that affect the stars of restaurants and their corresponding most common values for each group of restaurants using the business data.
- The x axis of Graph 2 shows the most common value of each of the 10 important features, and the y axis shows the percentage of neighboring restaurants in each group that have that value.
- A number of preprocessing steps were first applied to the business data: removing attributes with a lot of missing data, checking the unique values of each attribute, and removing attributes with only one value.
- The Random Forest method was applied to identify the 10 attributes with the highest importance.
- The most common values of the top 10 attributes were found for each restaurant group and are presented in Graph 2.
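The Graph 2 pipeline above might look something like the sketch below, using pandas and scikit-learn. Several details are assumptions rather than facts from this report: the 50% missing-data cutoff, the integer encoding of categorical attributes, the use of a classifier on the restaurant group label (a regressor on the raw star rating would be another option), and the column name "group". The function names top_attributes and most_common_values are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def top_attributes(df, target="group", max_missing=0.5, n_top=10, seed=0):
    """Rank attribute columns by Random Forest importance for predicting `target`."""
    features = df.drop(columns=[target])
    # Drop attributes with too much missing data (the 0.5 cutoff is assumed).
    features = features.loc[:, features.isna().mean() <= max_missing]
    # Drop attributes that take only one distinct value.
    features = features.loc[:, features.nunique(dropna=True) > 1]
    # Encode each attribute as integer codes; missing values become -1.
    encoded = pd.DataFrame(
        {col: pd.factorize(features[col])[0] for col in features.columns},
        index=features.index,
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(encoded, df[target])
    importance = pd.Series(model.feature_importances_, index=encoded.columns)
    return importance.sort_values(ascending=False).head(n_top).index.tolist()

def most_common_values(df, attributes, target="group"):
    """For each group and attribute, return the most common value and its percentage."""
    rows = []
    for group, part in df.groupby(target):
        for attr in attributes:
            # Shares are computed among restaurants with a recorded value.
            shares = part[attr].value_counts(normalize=True)
            if shares.empty:
                continue
            rows.append({"group": group, "attribute": attr,
                         "value": shares.index[0],
                         "percentage": 100 * shares.iloc[0]})
    return pd.DataFrame(rows)
```

The "value" and "percentage" columns of the second function's output correspond to the x and y axes described for Graph 2.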
Part of Results
- Our project can analyze any location you want: set the radius and click on the map, and a report page pops up showing all the data. Here is an example from Graph 1:
- These are the top five keywords for "star 4 to 5":
1-"pizza" - 1.8639695500223836
2-"service" - 1.409654633418855
3-"breakfast" - 0.9796997063099608
4-"sandwich" - 0.9246861033680522
5-"drinks" - 0.9246623995486091
- These are the top five keywords for "closed":
1-"sandwich" - 1.381101349198041
2-"pizza" - 1.2829068308285805
3-"lunch" - 1.1741327853038381
4-"burger" - 0.9568852583433751
5-"coffee" - 0.8808648734810306
- "Breakfast" and "service" rank among the top keywords of the "star 4 to 5" group but not of the "closed" group. So in this area, it may be better to serve breakfast and focus on service in order to run a prosperous restaurant.
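The radius selection mentioned at the start of this section (click a point on the map, set a radius) could be implemented in many ways. One possible sketch, assuming great-circle (haversine) distance in kilometers and the latitude and longitude fields of the Yelp business records, is shown here. The report does not say how distance was actually computed, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def restaurants_within(restaurants, center, radius_km):
    """Keep the restaurants whose coordinates lie within radius_km of the clicked point."""
    lat0, lon0 = center
    return [r for r in restaurants
            if haversine_km(lat0, lon0, r["latitude"], r["longitude"]) <= radius_km]
```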