Question: Assume you are given the below table on house prices of listed properties from various zip codes. Write a query to get the top 5 zip codes by market share of house prices for all zip codes. Also return the corresponding market share (in %, 2 decimal) as well and filter zip codes with less than 100 properties listed.
Explanation & Definition: market share of house price of a zip code = total value of properties in that zip code/ total value of all properties
Difficulty Level: Easy, Source: CrackingProduct, Tag: Opendoor SQL Interview Question
Assumptions: No NULLS present; DBMS: Spark SQL
Output Columns: zip_code, total_price, market_share
Approach:
Step-1: for each zip code, find the total property price, divide this price value by the total property price across all zip codes to get the corresponding market share.
Solution:
--- Method-1: select zip_code, sum(price) as total_price, round(100*sum(price)/(sum(sum(price)) over()),2) as market_share from housing group by zip_code having count(house_id) >=100 order by market_share DESC limit 5 --- Method-2: with zip_level_prices as ( select zip_code, sum(price) as total_price from housing group by zip_code having count(house_id) >=100) select zip_code, total_price, round(100*total_price/(sum(total_price) over()),2) as market_share from zip_level_prices order by 3 desc limit 5
Note: the solution shared above in method-1 will most likely work only on Spark SQL since other dbms don’t allow group by aggregation and window function aggregation in a single query.
to tackle that issue, use the results of zip_code, total price in a Common table expression (CTE) or in a subquery then using that result calculate the market share, as shown in the 2nd Method