interview preparation
Percentages in SQL: PERCENT_RANK()
and NTILE(n)
In SQL, functions like PERCENT_RANK()
and NTILE(n)
are often used to calculate ranks or to divide data into specific percentages or buckets. These window functions are useful when you need to analyze data distribution or understand where a specific data point lies in relation to others.
1. PERCENT_RANK()
: Rank as a Percentage
PERCENT_RANK()
is a window function that calculates the relative rank of a row within its partition, expressed as a percentage. It returns a value between 0 and 1, where the value represents the percentage rank of the row based on its position in the ordered set of rows.
Formula:
sqlCopy codePERCENT_RANK() = (Rank - 1) / (Total Rows in Partition - 1)
This means that:
- A value of
0
means that the row has the lowest rank in the partition. - A value of
1
means that the row has the highest rank in the partition.
Example: Given a dataset of sales figures, let’s calculate the percentage rank for each salesperson:
SalesRep | SalesAmount |
---|---|
Alice | 100 |
Bob | 200 |
Charlie | 150 |
Dave | 300 |
Eve | 250 |
You can use the following query to calculate the PERCENT_RANK()
:
sqlCopy codeSELECT
SalesRep,
SalesAmount,
PERCENT_RANK() OVER (ORDER BY SalesAmount) AS PercentRank
FROM Sales;
Output:
SalesRep | SalesAmount | PercentRank |
---|---|---|
Alice | 100 | 0.00 |
Charlie | 150 | 0.25 |
Bob | 200 | 0.50 |
Eve | 250 | 0.75 |
Dave | 300 | 1.00 |
- Alice has the lowest sales, so her rank is
0%
. - Dave has the highest sales, so his rank is
100%
. - Others fall in between, with ranks distributed based on their relative positions.
2. NTILE(n)
: Divides Rows into n
Buckets
NTILE(n)
is a window function that divides the dataset into n
equal parts (or buckets). It assigns a bucket number to each row. This is useful for dividing data into quantiles or other similar categories, like percentiles, quartiles, deciles, etc.
n
specifies the number of buckets into which the data should be divided.- Rows are assigned a bucket number from 1 to
n
based on their ranking within the ordered set.
Example: If you want to divide the data into 4 buckets (quartiles), you would use NTILE(4)
.
Using the same sales data example, we can calculate the quartiles using NTILE(4)
:
sqlCopy codeSELECT
SalesRep,
SalesAmount,
NTILE(4) OVER (ORDER BY SalesAmount) AS Quartile
FROM Sales;
Output:
SalesRep | SalesAmount | Quartile |
---|---|---|
Alice | 100 | 1 |
Charlie | 150 | 1 |
Bob | 200 | 2 |
Eve | 250 | 3 |
Dave | 300 | 4 |
- The data is divided into 4 quartiles. Alice and Charlie are in the first quartile (the lowest 25%), while Dave is in the fourth quartile (the highest 25%).
Key Differences Between PERCENT_RANK()
and NTILE(n)
:
Function | PERCENT_RANK() | NTILE(n) |
---|---|---|
Purpose | Provides a percentage rank of each row in the partition. | Divides the result set into n equal-sized groups (or buckets). |
Output | A percentage between 0 and 1 representing the relative rank. | A bucket number (integer) between 1 and n . |
Use Case | Ranking data relative to the other rows (e.g., percentile ranking). | Dividing data into groups (e.g., quartiles, deciles). |
Range | 0 to 1 (percentage). | Integer values from 1 to n , where n is the number of buckets. |
Common Use Cases:
PERCENT_RANK()
is commonly used when you need to:- Rank data points within a dataset or partition.
- Understand how a value compares to others in terms of its relative percentage.
NTILE(n)
is used when you want to:- Divide a dataset into buckets for further analysis.
- Create groupings such as quartiles, deciles, etc., and compare the distribution of data.
Example Use Cases
1. Calculating Percentiles with PERCENT_RANK()
:
Let’s assume you want to calculate the percentiles for the SalesAmount
: This can help you understand the position of each salesperson’s sales as a percentile of the entire dataset.
SELECT
SalesRep,
SalesAmount,
PERCENT_RANK() OVER (ORDER BY SalesAmount) AS Percentile
FROM Sales;
2. Dividing Data into Quartiles using NTILE(4)
:
If you wanted to split the sales data into quartiles (4 equal groups), you could use: This would allow you to segment sales representatives into quartiles, giving you a clearer understanding of how sales are distributed.
SELECT
SalesRep,
SalesAmount,
NTILE(4) OVER (ORDER BY SalesAmount) AS Quartile
FROM Sales;
Common Interview Questions on PERCENT_RANK()
and NTILE(n)
- What is the difference between
PERCENT_RANK()
andNTILE(n)
?- Answer:
PERCENT_RANK()
calculates the relative rank of a value as a percentage of the total number of rows, whereasNTILE(n)
divides the result set inton
equal-sized groups (buckets) and assigns a group number to each row.
- Answer:
- How would you calculate the percentile rank for each student based on their scores in a class?
- Answer: You would use
PERCENT_RANK()
to calculate how each student’s score compares to others in terms of percentile:
- Answer: You would use
SELECT
StudentName,
Score,
PERCENT_RANK() OVER (ORDER BY Score) AS PercentileRank
FROM Students;
3. Can PERCENT_RANK()
produce a result of 1?
Answer: No, PERCENT_RANK()
can return values between 0
and 1
, but never exactly 1. The highest value will approach 1, but it will be less than 1, because the formula divides by (total rows - 1)
.