Analytics Window Function

Revolutionary Analytics Window Function: Unlocking Advanced Data Insights for Maximum Impact

Analytics Window Functions in SQL

Analytics window functions in SQL are a subset of window functions that perform complex analytical calculations across a set of rows related to the current row. These functions allow for advanced analysis, such as ranking, calculating running totals, moving averages, and more, based on an ordered window of rows.

The key difference between standard aggregate functions and analytics window functions is that while aggregate functions collapse rows into a single value per group, analytics functions maintain individual rows but provide calculated values across those rows. These functions are typically used for trend analysis, time-series analysis, and calculating values relative to a “window” of data.

Common Analytics Window Functions

  1. LEAD(): Provides access to a row’s subsequent row’s value within the result set (useful for comparing rows).
  2. LAG(): Provides access to a row’s preceding row’s value (useful for comparing previous and current rows).
  3. NTILE(): Divides the result set into a specified number of buckets or “tiles” and assigns each row a bucket number.
  4. FIRST_VALUE(): Returns the first value in the window or partition.
  5. LAST_VALUE(): Returns the last value in the window or partition.
  6. NTH_VALUE(): Returns the nth value in the window, where n is a specified position.

Basic Syntax for Analytics Functions

SELECT 
    column1, 
    column2, 
    LEAD(column2) OVER (PARTITION BY column1 ORDER BY column2) AS LeadValue,
    LAG(column2) OVER (PARTITION BY column1 ORDER BY column2) AS LagValue,
    NTILE(4) OVER (PARTITION BY column1 ORDER BY column2) AS Quartile
FROM table_name;
  • PARTITION BY divides the data into groups.
  • ORDER BY specifies how rows are ordered within each group.

Examples of Analytics Window Functions

Let’s consider a table of sales data:

SaleDateSalesAmountSalesRep
2024-01-01100Alice
2024-01-02200Bob
2024-01-03150Alice
2024-01-04300Bob
2024-01-05400Alice
2024-01-06500Bob

1. LEAD() Example (Next Row’s Value)

The LEAD() function allows you to access data from the next row in a result set.

SELECT 
    SaleDate, 
    SalesAmount, 
    SalesRep,
    LEAD(SalesAmount) OVER (PARTITION BY SalesRep ORDER BY SaleDate) AS NextSaleAmount
FROM Sales;
SaleDateSalesAmountSalesRepNextSaleAmount
2024-01-01100Alice150
2024-01-03150Alice400
2024-01-05400AliceNULL
2024-01-02200Bob300
2024-01-04300Bob500
2024-01-06500BobNULL

In this case, LEAD(SalesAmount) gives the sales amount of the next sale for each salesperson. For the last row of each partition (each salesperson), the next value is NULL.

2. LAG() Example (Previous Row’s Value)

The LAG() function allows you to access data from the previous row in a result set.

SELECT 
    SaleDate, 
    SalesAmount, 
    SalesRep,
    LAG(SalesAmount) OVER (PARTITION BY SalesRep ORDER BY SaleDate) AS PreviousSaleAmount
FROM Sales;
SaleDateSalesAmountSalesRepPreviousSaleAmount
2024-01-01100AliceNULL
2024-01-03150Alice100
2024-01-05400Alice150
2024-01-02200BobNULL
2024-01-04300Bob200
2024-01-06500Bob300

In this example, LAG(SalesAmount) gives the sales amount from the previous row for each salesperson. For the first row of each partition, the previous value is NULL.

3. NTILE() Example (Dividing into Quantiles)

The NTILE() function divides the result set into a specified number of approximately equal groups and assigns a bucket number to each row.

SELECT 
    SaleDate, 
    SalesAmount, 
    SalesRep,
    NTILE(3) OVER (PARTITION BY SalesRep ORDER BY SalesAmount DESC) AS SalesQuartile
FROM Sales;
SaleDateSalesAmountSalesRepSalesQuartile
2024-01-05400Alice1
2024-01-03150Alice2
2024-01-01100Alice3
2024-01-06500Bob1
2024-01-04300Bob2
2024-01-02200Bob3

Here, NTILE(3) divides the sales amount into 3 quartiles for each salesperson, ranking them in descending order of sales.

4. FIRST_VALUE() Example (First Value in Partition)

The FIRST_VALUE() function returns the first value in the partition or window.

SELECT 
    SaleDate, 
    SalesAmount, 
    SalesRep,
    FIRST_VALUE(SalesAmount) OVER (PARTITION BY SalesRep ORDER BY SaleDate) AS FirstSaleAmount
FROM Sales;
SaleDateSalesAmountSalesRepFirstSaleAmount
2024-01-01100Alice100
2024-01-03150Alice100
2024-01-05400Alice100
2024-01-02200Bob200
2024-01-04300Bob200
2024-01-06500Bob200

FIRST_VALUE(SalesAmount) gives the first sales amount for each salesperson, regardless of the row’s position in the result set.

5. LAST_VALUE() Example (Last Value in Partition)

The LAST_VALUE() function returns the last value in the partition or window.

SELECT 
    SaleDate, 
    SalesAmount, 
    SalesRep,
    LAST_VALUE(SalesAmount) OVER (PARTITION BY SalesRep ORDER BY SaleDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastSaleAmount
FROM Sales;
SaleDateSalesAmountSalesRepLastSaleAmount
2024-01-01100Alice400
2024-01-03150Alice400
2024-01-05400Alice400
2024-01-02200Bob500
2024-01-04300Bob500
2024-01-06500Bob500

LAST_VALUE(SalesAmount) returns the last sale amount for each salesperson in the dataset.


Common Interview Questions on Analytics Window Functions

  1. What is the difference between LEAD() and LAG()?
    • Answer: LEAD() gives the value of the next row in the dataset, whereas LAG() gives the value of the previous row. Both functions allow you to compare rows relative to the current row.
  2. How would you calculate the running average of sales for each sales representative?
    • Answer: Use the AVG() function with a window to calculate the cumulative average.
    sqlCopy codeSELECT SaleDate, SalesAmount, SalesRep, AVG(SalesAmount) OVER (PARTITION BY SalesRep ORDER BY SaleDate) AS RunningAvgSales FROM Sales;

SQL