Question: Assume you are given a table for spending activity by product type. Write a query to calculate the cumulative spend for each product over time (days) in chronological order.
Difficulty Level: Medium, Source: NickSingh.com, Tag: UBER SQL Interview Question
Note: Cumulative sum is the same as running totals, which is used to display the data as it grows with time.
Assumptions: No NULLS present; DBMS: Spark SQL
Output Columns: product_id, date, cum_spend
Approach:
Basic Explanation: A given product (product_id) could be associated with multiple users (user_id) over different orders (order_id)
Step-1: Understand that the question asks for cumulative spend on each product over time (days), this requires for aggregation of spend over each day for every product.
Step-2: Once you have the daily spend of each product, we just need an appropriate function to sum the daily values to get a running total.
Here is a simple block diagram using one product (prod_A)
Solution:
Hint: Find the daily spend of each product first then think about getting to cumulative spend
with daily_spends as( select product_id, date, sum(spend) as daily_spend from total_trans group by product_id, date) select product_id, date, sum(daily_spend) over(partition by product_id order by date) as cum_spend from daily_spends order by product_id, date