Question: Assume you are given the below table on watch times (in minutes) for all users, where each user is based in a given city. Write a query to return all pairs of cities with total watch times within 10000 minutes of one another.
Difficulty Level: Easy, Source: Nicksingh, Tag: Disney SQL Interview Question
Assumptions: No NULLS present; DBMS: Spark SQL
Output Columns: city_A, city_B
Approach:
Steps: Find the total watch time for each city, and save this as a CTE. Then do a self join on this table returning the city columns from both tables where the join clause matches any city with watch time in the given range.
Note: since the question asks about two watch times being X mins of one another, either of the cities (A or B) can have a higher watch time. Also, avoid joining the city with itself.
with city_watch_time as ( select city_name, sum(watch_time) as total_watch_time from watch_activity group by city_name) select t1.city_name, t2.city_name from city_watch_time t1 join city_watch_time t2 on abs(t1.total_watch_time-t2.total_watch_time) <=10000 and t1.city_name != t2.city_name --- to avoid getting the same city as pair