SQL窗口函数用法

窗口函数是一种在关系型数据库中执行聚合、排序和分析操作的强大工具。它们在处理查询结果时提供了更多灵活性和控制力,使得可以在特定数据集的子集上执行计算,而不会改变查询结果的行数。

窗口函数语法

窗口函数应用于SELECT子句,语法结构如下:

1
2
3
4
window_function (expression) OVER (
[ PARTITION BY part_list ]
[ ORDER BY order_list ]
[ { ROWS | RANGE } BETWEEN frame_start AND frame_end ] )

其中各项含义如下:

  • partition by:将数据按给出的字段进行数据分区。不指定这一部分时,将所有数据作为同一分区。
  • order by:将各分区内的数据按给出的条件进行排序。
  • { rows | range } between frame start and frame_end:基于分区内的数据为每一行生成一个frame,大部分窗口函数将在frame上执行。不指定这一部分时,根据是否有order by部分,采用不同的默认值。没有order by部分时,相当于rows between unbounded preceding and unbounded following,取得分区内的所有值。有order by部分时,相当于rows between unbounded preceding and current row,取得分区内第一行数据到当前行的范围。rows和range两种范围分别表示基于行来筛选frame、基于值的范围来筛选frame。
  • window_function:对frame内的数据进行处理,执行聚合、取值、排序等功能,生成数据附加到当前行上。但row_number()、rank()等部分窗口函数只会应用在分区上。

在SQL的逻辑执行顺序上,其位于SELECT之后,ORDER BY之前。

SQL语句执行顺序

可用的窗口函数

支持的聚合函数:SUM();COUNT();AVG();MIN();MAX()。

支持的取值函数:FIRST_VALUE (expression)取第一行的值;LAST_VALUE (expression)取最后一行的值;LAG (expression [, offset [, default_value]])取当前行前offset行的值并在无法取得值时使用default_value作为默认值;LEAD (expression [, offset [, default_value]])取当前行后offset行的值并在无法取得值时使用default_value作为默认值。

支持的排序函数:CUME_DIST()计算当前行的累计分布比例,值的最终范围为(0,1];PERCENT_RANK()计算当前行的百分比排名,值的最终范围为[0,1];DENSE_RANK()计算当前行的密集排名(相同值也使用不同的排名,不支持并列);RANK()计算当前行的排名(支持并列,比如2人并列第1,下一个就是第3);NTILE(num_buckets)将数据平均分为num_buckets个桶(最后一个桶可能不够)并给出桶号;ROW_NUMBER()计算当前行在分区中的位置,从1开始。

需要注意的是,lag()、lead()、row_number()、ntile()、rank()、dense_rank()、percent_rank()、cume_dist()都直接在分区范围上工作,而不是在frame上。

窗口函数示例

在MySQL、PostgreSQL、SQL Server中都支持这些窗口函数,下面的示例在3种数据库中都能正常工作。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
drop table if exists students;

CREATE TABLE students (
student_id INT,
score INT
);

INSERT INTO students (student_id, score) VALUES (1, 85);
INSERT INTO students (student_id, score) VALUES (2, 70);
INSERT INTO students (student_id, score) VALUES (3, 90);
INSERT INTO students (student_id, score) VALUES (4, 75);
INSERT INTO students (student_id, score) VALUES (5, 95);
INSERT INTO students (student_id, score) VALUES (6, 82);
INSERT INTO students (student_id, score) VALUES (7, 88);
INSERT INTO students (student_id, score) VALUES (8, 78);

SELECT
student_id,
score,
sum(score) OVER (ORDER BY score DESC rows between 1 preceding and 1 following ) AS r_sum,
count(*) over (order by score desc rows between 1 preceding and 1 following ) AS r_count,
avg(score) over (order by score desc rows between 1 preceding and 1 following ) AS r_avg,
min(score) over (order by score desc rows between 1 preceding and 1 following ) AS r_min,
max(score) over (order by score desc rows between 1 preceding and 1 following ) AS r_max,
first_value(score) over (order by score desc rows between 1 preceding and 1 following ) AS r_first_value,
last_value(score) over (order by score desc rows between 1 preceding and 1 following ) AS r_last_value,
lag(score, 1, -1) over (order by score desc ) AS r_lag,
lead(score, 1, -1) over (order by score desc ) AS r_lead,
row_number() over (order by score desc ) AS r_row_number,
ntile(8) over (order by score desc ) AS r_ntile,
rank() over (order by score desc ) AS r_rank,
dense_rank() over (order by score desc ) AS r_dense_rank,
percent_rank() over (order by score desc ) AS r_percent_rank,
cume_dist() over (order by score desc ) AS r_cume_dist
FROM
students
ORDER BY
student_id;

执行结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
+----------+-----+-----+-------+-------+-----+-----+-------------+------------+-----+------+------------+-------+------+------------+-------------------+-----------+
|student_id|score|r_sum|r_count|r_avg |r_min|r_max|r_first_value|r_last_value|r_lag|r_lead|r_row_number|r_ntile|r_rank|r_dense_rank|r_percent_rank |r_cume_dist|
+----------+-----+-----+-------+-------+-----+-----+-------------+------------+-----+------+------------+-------+------+------------+-------------------+-----------+
|1 |85 |255 |3 |85.0000|82 |88 |88 |82 |88 |82 |4 |4 |4 |4 |0.42857142857142855|0.5 |
|2 |70 |145 |2 |72.5000|70 |75 |75 |70 |75 |-1 |8 |8 |8 |8 |1 |1 |
|3 |90 |273 |3 |91.0000|88 |95 |95 |88 |95 |88 |2 |2 |2 |2 |0.14285714285714285|0.25 |
|4 |75 |223 |3 |74.3333|70 |78 |78 |70 |78 |70 |7 |7 |7 |7 |0.8571428571428571 |0.875 |
|5 |95 |185 |2 |92.5000|90 |95 |95 |90 |-1 |90 |1 |1 |1 |1 |0 |0.125 |
|6 |82 |245 |3 |81.6667|78 |85 |85 |78 |85 |78 |5 |5 |5 |5 |0.5714285714285714 |0.625 |
|7 |88 |263 |3 |87.6667|85 |90 |90 |85 |90 |85 |3 |3 |3 |3 |0.2857142857142857 |0.375 |
|8 |78 |235 |3 |78.3333|75 |82 |82 |75 |82 |75 |6 |6 |6 |6 |0.7142857142857143 |0.75 |
+----------+-----+-----+-------+-------+-----+-----+-------------+------------+-----+------+------------+-------+------+------------+-------------------+-----------+

参考资料