Question: You have been given the below table on transactions from users. Write a query to get the number of users and total products bought per latest transaction date where each user is bucketed into their latest(last) transaction date.
Explanations & Definition: every user will have a last(lastest) transaction date, also multiple users can have the same latest transaction date. Also, it’s possible that a user had multiple transactions on his/her latest transaction date. Total Products should reflect the count of all products purchased, not necessarily distinct products.
Difficulty Level: Medium, Source: CrackingProduct, Tag: Stitch Fix SQL Interview Question
Assumptions: No NULLS present; DBMS: Spark SQL
Output Columns: latest_trans_date, num_users, total_products_bought
Approach:
Steps: First calculate the last (latest) transaction date for each user and the total products bought, save that as a CTE, now using the CTE find the final answer.
select latest_trans_date, count(distinct user_id) as num_users, count(product_id) as total_products_bought --- here for the total_products_bought column value any column from the subquery can be used inside the count aggregation function from( select user_id, transaction_id, product_id, transaction_date, max(transaction_date) over(PARTITION by user_id) as latest_trans_date from user_transactions) where latest_trans_date = transaction_date --- filter only those transactions happening on the last purchase date group by latest_trans_date order by latest_trans_date