All questions concern the Yelp dataset published on Yelp.com. For each question, explain how you would answer it using Spark or SQL in Databricks, together with Tableau. Describe how you would answer the question (what data you are using, what you are including, what you are eliminating, what you are joining, etc.). If it is easier to describe with Spark or SQL terminology, that’s OK, but do not write code. Say how you would show your result. Describe or draw a graph to illustrate a possible answer. Identify any issues, possible solutions to those issues (that you could implement), and the limitations of those solutions.

Question One

Yelp uses a hierarchical controlled vocabulary of categories to describe businesses. If we consider the 21 top‐level categories, some businesses are in categories that could be described as more “fun”, such as “Nightlife”; some are more serious, such as “Health & Medical”; and some are more business-like, such as “Financial Services”. Other Yelpers can vote on whether reviews are funny, useful, or cool. Are there some categories whose reviews get a higher percentage of funny votes, while other categories get a higher percentage of useful votes?

Question Two

Our dataset contains every review written in each of 10 metro areas. We have user data on the Yelpers who wrote those reviews, but we only have the reviews they wrote in those 10 metro areas (e.g., San Jose is not in our data, so any reviews written about businesses in San Jose are not in our data). Do Yelpers who are local rate businesses higher than those from out‐of‐town, or is the opposite true? Using the portion of a user’s reviews that are in the dataset as a proxy for whether they are local or not, determine whether locals or visitors generally rate businesses higher.

Question Three

Each business in Yelp is described by one or more categories. Restaurants on Yelp can be described by the multiple cuisines that they serve.
Some businesses serve one cuisine (e.g., “Thai”), while others may serve multiple cuisines (e.g., “Hotdogs” and “Wings”). If a restaurant serves multiple cuisines, it could match a search for either cuisine. Possibly some cuisines are generally rated higher than other cuisines (on average across all restaurants serving just that cuisine). For example, maybe users will generally rate restaurants serving just “Wings” higher than restaurants serving just “Burgers”, even if objectively a food expert would say they are of the same quality. The question you are looking into for Yelp is whether restaurants serving two cuisines have an average rating closer to that of the cuisine that is rated higher (“Wings” in the prior example) or that of the cuisine that is rated lower (“Burgers” in the prior example). Your answer should NOT be specific to these two cuisines (Wings and Burgers), but instead should handle all restaurants serving two cuisines.