sql优化

写在前面

iospeed

磁盘IO主要的延时是由(以15000rpm硬盘为例):
机械转动延时(机械磁盘的主要性能瓶颈,平均为2ms) + 寻址延时(2~3ms) + 块传输延时(一般4k每块,40m/s的传输速度,延时一般为0.1ms)
决定。(平均为5ms)

网络IO服务器响应延时 + 带宽限制 + 网络延时 + 跳转路由延时 + 本地接收延时
决定。(一般为几十到几千毫秒,受环境干扰极大)

BI的架构

  1. 网页编辑器(编写sql语句、python代码)
  2. bi服务器,用于运行bi网站,处理bi请求
  3. bi将请求发到对应的数据库服务上处理,得到结果后再发往用户浏览器,最后在网页上展示出结果数据

SQL

SQL 是用于访问和处理数据库的标准的计算机语言。

一般的sql语句,增删改查

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| nextcloud |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)

mysql> CREATE DATABASE wang;
Query OK, 1 row affected (0.00 sec)

mysql> drop DATABASE wang;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into oc_accounts values("wang","hello world.");
Query OK, 1 row affected (0.00 sec)

mysql> select * from oc_accounts;
+------+--------------+
| uid | data |
+------+--------------+
| wang | hello world. |
+------+--------------+
1 row in set (0.00 sec)

mysql> update oc_accounts set data="fareware" where uid="wang";
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> delete from oc_accounts where uid="wang";
Query OK, 1 row affected (0.00 sec)

进阶

笛卡尔积

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
两张表

mysql> select * from runoob_tbl ;
+-----------+---------------+---------------+-----------------+
| runoob_id | runoob_title | runoob_author | submission_date |
+-----------+---------------+---------------+-----------------+
| 1 | 学习 PHP | 菜鸟教程 | 2017-04-12 |
| 2 | 学习 MySQL | 菜鸟教程 | 2017-04-12 |
| 3 | 学习 Java | RUNOOB.COM | 2015-05-01 |
| 4 | 学习 Python | RUNOOB.COM | 2016-03-06 |
| 5 | 学习 C | FK | 2017-04-05 |
+-----------+---------------+---------------+-----------------+
5 rows in set (0.00 sec)

mysql> select * from tcount_tbl;
+---------------+--------------+
| runoob_author | runoob_count |
+---------------+--------------+
| 菜鸟教程 | 10 |
| RUNOOB.COM | 20 |
| Google | 22 |
+---------------+--------------+
3 rows in set (0.00 sec)

联表查询 INNER JOIN(内连接,或等值连接):获取两个表中字段匹配关系的记录。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
mysql> SELECT a.*, b.* FROM runoob_tbl a INNER JOIN tcount_tbl b;
+-----------+---------------+---------------+-----------------+---------------+--------------+
| runoob_id | runoob_title | runoob_author | submission_date | runoob_author | runoob_count |
+-----------+---------------+---------------+-----------------+---------------+--------------+
| 1 | 学习 PHP | 菜鸟教程 | 2017-04-12 | 菜鸟教程 | 10 |
| 1 | 学习 PHP | 菜鸟教程 | 2017-04-12 | RUNOOB.COM | 20 |
| 1 | 学习 PHP | 菜鸟教程 | 2017-04-12 | Google | 22 |
| 2 | 学习 MySQL | 菜鸟教程 | 2017-04-12 | 菜鸟教程 | 10 |
| 2 | 学习 MySQL | 菜鸟教程 | 2017-04-12 | RUNOOB.COM | 20 |
| 2 | 学习 MySQL | 菜鸟教程 | 2017-04-12 | Google | 22 |
| 3 | 学习 Java | RUNOOB.COM | 2015-05-01 | 菜鸟教程 | 10 |
| 3 | 学习 Java | RUNOOB.COM | 2015-05-01 | RUNOOB.COM | 20 |
| 3 | 学习 Java | RUNOOB.COM | 2015-05-01 | Google | 22 |
| 4 | 学习 Python | RUNOOB.COM | 2016-03-06 | 菜鸟教程 | 10 |
| 4 | 学习 Python | RUNOOB.COM | 2016-03-06 | RUNOOB.COM | 20 |
| 4 | 学习 Python | RUNOOB.COM | 2016-03-06 | Google | 22 |
| 5 | 学习 C | FK | 2017-04-05 | 菜鸟教程 | 10 |
| 5 | 学习 C | FK | 2017-04-05 | RUNOOB.COM | 20 |
| 5 | 学习 C | FK | 2017-04-05 | Google | 22 |
+-----------+---------------+---------------+-----------------+---------------+--------------+
15 rows in set (0.00 sec)

mysql> SELECT a.runoob_id, a.runoob_author, b.runoob_count FROM runoob_tbl a INNER JOIN tcount_tbl b ON a.runoob_author = b.runoob_author;
+-----------+---------------+--------------+
| runoob_id | runoob_author | runoob_count |
+-----------+---------------+--------------+
| 1 | 菜鸟教程 | 10 |
| 2 | 菜鸟教程 | 10 |
| 3 | RUNOOB.COM | 20 |
| 4 | RUNOOB.COM | 20 |
+-----------+---------------+--------------+
4 rows in set (0.00 sec)

联表查询 LEFT JOIN(左连接):获取左表所有记录,即使右表没有对应匹配的记录。

1
2
3
4
5
6
7
8
9
10
11
mysql> SELECT a.runoob_id, a.runoob_author, b.runoob_count FROM runoob_tbl a LEFT JOIN tcount_tbl b ON a.runoob_author = b.runoob_author;
+-----------+---------------+--------------+
| runoob_id | runoob_author | runoob_count |
+-----------+---------------+--------------+
| 1 | 菜鸟教程 | 10 |
| 2 | 菜鸟教程 | 10 |
| 3 | RUNOOB.COM | 20 |
| 4 | RUNOOB.COM | 20 |
| 5 | FK | NULL |
+-----------+---------------+--------------+
5 rows in set (0.00 sec)

联表查询 RIGHT JOIN(右连接):用于获取右表所有记录,即使左表没有对应匹配的记录。

1
2
3
4
5
6
7
8
9
10
11
mysql> SELECT a.runoob_id, a.runoob_author, b.runoob_count FROM runoob_tbl a RIGHT JOIN tcount_tbl b ON a.runoob_author = b.runoob_author;
+-----------+---------------+--------------+
| runoob_id | runoob_author | runoob_count |
+-----------+---------------+--------------+
| 1 | 菜鸟教程 | 10 |
| 2 | 菜鸟教程 | 10 |
| 3 | RUNOOB.COM | 20 |
| 4 | RUNOOB.COM | 20 |
| NULL | NULL | 22 |
+-----------+---------------+--------------+
5 rows in set (0.00 sec)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
需求:跑得快每日对局中各个渠道的用户对局数
SELECT
b.channel,
count(a.ID) AS games
FROM
table_2021_10_12 a
LEFT JOIN (
SELECT
user_id,
channel
FROM
login_2021_10_12
GROUP BY
user_id,
channel) b ON a.user_id = b.user_id
GROUP BY
b.channel

联表总结

  • inner join:两个表都有的数据
  • left join:右表有的数据
  • right join:右表有的数据

with语句

在Sql中,with语句指定临时命名结果集,称为公用表表达式(CTE)。它可以用于递归查询,在这种情况下,我们称之为子集或子查询。

WITH
  cte1 AS (SELECT a, b FROM table1),
  cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c;

一些有用的mysql、presto内置函数

  1. CASE
1
2
3
4
5
6
SELECT a,
CASE a
WHEN 1 THEN 'one'
WHEN 2 THEN 'two'
ELSE 'many'
END
  1. IF
1
if(condition, true_value, false_value)
  1. count
1
Returns the number of non-null input values.
  1. count_if
1
2
count_if(expression) → bigint
Returns the number of TRUE input values. This function is equivalent to count(CASE WHEN x THEN 1 END).
  1. sum
1
2
sum(x) → [same as input]
Returns the sum of all input values.
  1. cast
1
2
3
Explicitly cast a value as a type. This can be used to cast a varchar to a numeric value type and vice versa.

CAST('[1,2,3]' AS ARRAY(INTEGER))
  1. split、split_part、substr
1
2
3
4
5
6
7
8
split(string, delimiter) -> array(varchar)

split_part(string, delimiter, index) → varchar

substr(string, start) → varchar

substr(string, start, length) → varchar

  1. json_extract
1
2
3
4
5
json_extract(json, json_path) → json

Evaluates the JSONPath-like expression json_path on json (a string containing JSON) and returns the result as a JSON string

SELECT json_extract(json, '$.store.book');
  1. array_position
1
2
3
array_position(x, element) → bigint

Returns the position of the first occurrence of the element in array x (or 0 if not found).
  1. regexp_replace
1
2
3
4
5
regexp_replace(string, pattern) → varchar

Removes every instance of the substring matched by the regular expression pattern from string:

SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
  1. array_sort
1
2
3
array_sort(array(T), function(T, T, int)) -> array(T)

SELECT array_sort(ARRAY [3, 2, 5, 1, 2], (x, y) -> IF(x < y, 1, IF(x = y, 0, -1))); -- [5, 3, 2, 2, 1]
  1. transform
1
2
3
4
5
transform(array(T), function(T, U)) -> array(U)

Returns an array that is the result of applying function to each element of array

SELECT transform(ARRAY [5, 6], x -> x + 1); -- [6, 7]

in和exist

exist

1
2
3
4
5
6
7
8
9
10
11
12
13
EXISTS用于检查子查询是否至少会返回一行数据,该子查询实际上并不返回任何数据,而是返回值True或False
EXISTS 指定一个子查询,检测 行 的存在。

先执行外部查询语句,然后在执行子查询,子查询中它每次都会去执行数据库的查询,执行次数等于外查询的数据数量

两种情况
1.表a中100000条数据,表b中100条数据,查询数据库次数=1(表a查一次)+100000(子查询:查询表b的次数),一共100001次
2.表a中100条数据,表b100000条,查询数据库次数=1(表a查一次)+100(子查询次数),一共 101次

结论
exist适合子查询中表数据大于外查询表中数据的业务场景

select * from TableIn where exists(select BID from TableEx where BNAME=TableIn.ANAME)

in

1
2
3
先查询 in()子查询的数据(1次),并且将数据放进内存里(不需要多次查询),然后外部查询的表再根据查询的结果进行查询过滤,最后返回结果

select * from TableIn where ANAME in(select BNAME from TableEx)

示例展示

跑得快中每天抢关、明牌的人数、次数

数据源

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"81803533":{
"show":0,
"grab":0,
"double":1,
"newshow":1,
"win":0
},
"110504472":{
"show":0,
"grab":0,
"double":1,
"newshow":1,
"win":0
},
"127409011":{
"show":0,
"grab":0,
"double":1,
"newshow":1,
"win":1
},
"first":110504472,
"amends":0,
"bomb":0,
"close":1,
"interest":[

],
"multiple":1
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
with

c as (select array_position(CAST(json_extract(results, '$.show') AS ARRAY(INTEGER)), 3) AS showposi, array_position(CAST(json_extract(results, '$.grab') AS ARRAY(INTEGER)), 1) AS grabposi,transform(array_sort(split(regexp_replace(norder, '["{}]'), ','), (x, y) -> IF(split_part(x,':',2) < split_part(y,':',2), 1, IF(split_part(x,':',2) = split_part(y, ':', 2), 0, -1))), x-> split_part(x,':',1)) as sortonly from paodekuai."game" where date='2021-09-01' ),

d as (select showposi, grabposi, sortonly from c where showposi>0 or grabposi>0),

e as (select
case
when showposi=1 then sortonly[1]
when showposi=2 then sortonly[2]
when showposi=3 then sortonly[3]
else '0'
end
as showuser,
case
when grabposi=1 then sortonly[1]
when grabposi=2 then sortonly[2]
when grabposi=3 then sortonly[3]
else '0'
end
as grabuser
from d ),
f as ( select count(distinct showuser) as shownum, count_if(showuser > '0') as showcnt from e),
g as ( select count(distinct grabuser) as grabnum, count_if(grabuser > '0') as grabcnt from e)

select f.*, g.* from f, g