We earn commission when you buy through affiliate links.
This does not influence our reviews or recommendations.Learn more.
Have you been using SQL Window functions?
To address this concern, weve created this tutorial to help you enhance your SQL operations.
If you havent used SQL Window functions, this piece is a good start.
Window functions are calculation functions that reduce the complexity of SQL queries, increasing their efficiency.
SQL window functions work in three simple steps.
First, you define a window.
This is because Windows functions operate on a set of rows defined using theOVER()clause.
Next, thePARTITION BYclause divides the result set into partitions to which the function is applied.
And lastly,ORDERby clause to determine the order of rows in each partition.
Lets get into a hands-on approach to working with SQL window functions.
For every category, youll have a table summary for quick reference.
Heres a breakdown of each.
#1.Function
As for the function, it indicates the computation operation youre targeting at the row.
It could be standard aggregate functions (e.g.,SUM,AVG, andCOUNT).
Or an analytic function (e.g.,ROW_NUMBER,RANK,LEAD,andLAG).
Heres an example using theSUMfunction.
#2.OVER Clause
Using theOVERclause, you define the rows over which the function will operate.
There are two constituents:PARTITION BYandORDER BY.
The first is optional and divides results into sections to which the window function is applied.
For the latter, it specifies the order of rows in each partition.
When not specified, the function treats the window as an unordered set.
Its superpower can be realized when performing calculations independently with each partition.
For the above sample, theAVGfunction calculates the average ofcolumn3independently for each distinct value incolumn1.
Simply put, for every group of items incolumn1, youll havecolumn3with the average weight for that specific item.
This means they perform calculations on a window related to the current row within the result set.
you’re free to use them to obtain aggregate values based on a specified window/frame.
Heres a brief description of each.
#1.MIN()
This function returns the minimum value of a specified expression over a frame.
Lets look at an example query, particularly the moving, sliding, or rolling minimum.
This, in turn, dictates that each row in the result setmoving_mincontains the minimum value ofcolumn2.
Note that I did not include thePARTITION BYclause.
#3.AVG()
This function returns the average value of a specified expression over a frame.
Below is the syntax.
For a sample query, consider the cumulative sum below.
#5.COUNT()
A function used to return the number of rows in a window.
Heres a quick cheat sheet for aggregate window functions.
Heres a brief overview of each of the value window functions.
Simplified, it shifts values one row up.
The syntax for calling it is similar toLAG().
Heres how to write it.
#2.LAG()
TheLAG()function is the most popular.
It allocates to each row a value of the preceding one.
In other words, it shifts any column by a row, letting you perform queries using shift values.
The syntax is as follows:
An example query can be written as shown below.
The query extracts the previous value of a column ordered by thedate_column.
When theres no previous value, a default value of 0 occurs.
Youll also specify the expression you want to retrieve the nth value.
For our examples, weve used a column.
Its been the same forLAST_VALUE()andFIRST_VALUE().
They are useful when analyzing and ordering data and identifying relative positions for rows.
If window functions are unavailable, youd have to write multiple nested queries, which are inefficient.
Keynote in ranking window functions: theORDER BYclause must always be present.
Below are brief explanations about ranking functions in the family.
Its syntax…
To put that into perspective, look at the query below.
It does not skip any numbers, assigning succeeding values to the consequent row.
Write it as…
For your test case, consider the query below.
#4.PERCENT_RANK()
This function utilizes theRANKfunction to define the final ranking.
The 0 value is allocated to the first row, and 1 takes the last.
#5.QCUT (NTILE)
While rarely used, it operates likeROW_NUMBER.
However, give numbers to a collection of rows (buckets) instead of numbering rows.
The number of buckets is passed as an argument to theN-TILEfunction.
For example,N-TILE(10)divide the dataset into 10 buckets.
The example below divides employees into quartiles, as described in the function.
It assigns a value between 0 and 1 to represent the relative positions of rows.
0 At the start and 1 at the end.
A quick reference sheet for ranking window functions is available below.
Of course, the syntax may vary differently based on your database selection.
Be sure to seek the correct syntax for your choice.
For further exploration, always look into the documentation.
This means working with the right one, whether onSQLorPostgreSQL.
you might now check ourfull SQL cheat sheetthat my colleague and I use throughout our daily developer endeavors.