Posts

Calculating Top N items per Group (Without Window Functions)

Image
These kinds of questions are more common than you think: Out of a list of Twitter users, find their N recent tweets. Out of all the cities, find the top N cities in each country by population. Out of all orders on the e-commerce website, find the top N recent orders for each customer. Solving these questions becomes pretty easy if we use a window function like row_number(), but solving this without any window function makes the task challenging. Here we will see both approaches. Question: Display the top 3 recent values from every group Table : groups                                            Required Output: Method 1 (Using Window Function): WITH ranked AS ( SELECT group_id,value, ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY value DESC ) AS rn FROM groups ) SELECT group_id,value FROM ranked WHERE rn <= 3 ; Method 2 (Without Window Function): SELECT t1.group_id,t1.value FROM groups t1 INNER JOIN groups t2 ON t1.group_id = t2.group_id AND t1.value

MapReduce : Sinking Ship Or Rescue Boat ?

MapReduce: The Ultimate Programming Paradigm for Parallel Processing MapReduce was one of the major initiatives taken in the field of parallel processing or distributed computing in 2004 by researchers of none other than the tech giant google which is dealing with a tremendous amount of data at that time. Since then, it is the survival of MapReduce theory for such a long time showcases the hard work and brainstorming of the inventors. Despite new frameworks getting developed, MapReduce has sustained itself throughout the course of time. In fact, many of the parallel computing engines followed the path carved by MapReduce theory. After 18 years down the line, enthusiasts generally get confused about where to start their journey in the era where so many low-code & no-code frameworks are evolving. But IMHO, any beginner enthusiasts should look to first get their hands dirty on the MapReduce concepts and then move to the trending distributed computation architectures. Once you get the

Leverage the mind smartly, How?

Add new learnings to your mind effectively. Mistakes : While studying/learning any new topic, we all tend to make the same mistake time and again. Whenever we come across an unknown topic, we go through the internet, read any article, or watch any random youtube videos. Yes, this way, we understand the topic and just stop there and after some time, when we need the same thing, we repeat the same process of surfing the internet. As you are reading the same topic from different sources every time you need it, it takes a long time for your mind to grasp the thing. So, What's the solution? What we can do is, after getting sufficient knowledge on the topic, make a note on it which you can revise later point in time and don't need to go through the web. And if the topic is too large and complex to prepare notes on, bookmark the webpage/video you learn it on and visit the same exact thing next time. what happens is when you read something from the same thing again and again, it builds

Weird Syntax - Combine Recursive CTE with Normal CTE (PostgreSQL)

While writing recursive SQL queries    in PostgreSQL   , there is a very weird syntax of Recursive Common Table Expression(CTE) . Single CTE clause : WITH cte AS ( -- query ) -- main-query; Multiple CTE clause : WITH cte1 AS ( -- query ), cte2 AS ( -- query ) -- main-query; Single Recursive CTE clause : WITH RECURSIVE recur_cte AS ( -- base-query UNION [ ALL ] -- recur-query ) -- main-query; Things look pretty normal until here, but now comes the syntax twist. Combined Recursive CTE clause : WITH RECURSIVE cte AS ( -- query ), recur_cte AS ( -- base-query UNION [ ALL ] -- recur-query ) -- main-query; Conclusion :  Even though we have our second CTE as recursive nature, PostgresSQL wants us to put RECURSIVE keyword right after the WITH keyword in the first CTE itself and it will identify the cte_recur itself, where you have written the base-query & recur-query.

Parquet is best suitable for Nested Data , But Why ?

Yes, all the folks who are working with big data must have heard this statement, but very few of them are aware of the real reason why parquet has the upper hand when it comes to storing nested data. It is the underlying architecture and hierarchical data model of parqeut which is making things easy for the nested data. This architecture of parqeut follows the Dremel paper published by Google. Here is the blog post which justifies the famous statement: https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet

Single vs Double Quotation marks (In PostgreSQL)

While writing sql queries there can be a lot of confusion when it comes to using single vs double quotation marks, especially in a database like PostgreSQL where both of them can not be used interchangeably (unlike MySQL) Even though it is very basic topic, it is very important and can having clarity on it can save a lot of time while writing and debugging the sql queries in PostgreSQL. Here's the link that i found that can reduce the dilemma upto an extent. https://streamofcoding.com/single-quote-vs-double-quotes-in-postgres/ Please feel free to add your thoughts or resources related to the topic in the comments.