META SQL Interview Question

Question: Assume you are given the below tables on users and user posts. Write a query to get the distribution of the number of posts per user for each given day.

Explanation & Definition: Each user will have a certain post count for each day, by daily distribution we mean for each day we need the post count category and the respective users count falling in that category.

Assign a category to each user based on the number of posts done on that DAY
3 categories: ‘single_post’, ‘2-posts’ , ‘>2posts’ (these should be post_count_cat column in your output)

Difficulty Level: Easy, SourceCrackingProductTag: META SQL Interview Question

Assumptions: No NULLS present; DBMS: Spark SQL

Output Columnsday, post_count_cat, num_users

Approach:

Step-1: Undertand that for this question, the users table is not required. Now since we are asked to calculate the daily post count distribution, first we need to calculate the number of posts by each user(obviously only those users who have posted at least once) on each day then use that result to get to the post count distribution.

Solution:

select date, user_post_cat, count(user_id) as num_users
from(
    select date, user_id, count(post_id) as num_posts,
    case when count(post_id) = 1 then 'single_post'
         when count(post_id) = 2 then '2-posts'
         else '>2posts' end as user_post_cat
    from posts
    group by date, user_id)
group by date, user_post_cat

Leave a Comment