MYSQL 分区那些事儿

Innodb和NDB存储引擎支持分区,MYSQL分区是一种水平分区方式,指的是将同一表中的不同行的记录分配到不同的物理文件中,几个分区就有几个.idb文件(针对innodb),各自存放了索引(INDEX DIRECTORY)和数据(DATA DIRECTORY),为局部分区索引。分区操作会记录在binlog中,会以区为单位进行记录,所以一次操作可能触及3个区,有3行记录。

它主要存在两个好处:

  1. 更快地删除分区数据。一个典型的用法是,只关心最近的数据,不关心很久以前的数据,可以用DATE做RANGE分区,不时地rotate partition(新增分区 & 删除旧分区)。
  2. 查询数据时,要么通过optimizer自己做的partition pruning,要么通过where语句带分区(select ... from table partition (p0)),mysql只会搜寻对应的区而不是全局。

与分库分表相比,它是mysql自己支持的功能,不需要改动业务代码。当使用不当时它也可能造成性能瓶颈,由于局部索引而非全局,当select中不带分区键时,需要在每个区中各自通过索引拿到结果,比不使用分区更慢,同时更容易出现表锁问题,其他局限性请看manual

要找到一个表的分区情况,可以用以下命令:

SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA = 'X' AND TABLE_NAME LIKE 'XX';

使用分区以后,如果插入的值不值分区的规定范围内,会出现insert failure,这时候可以用MAXVALUE来定义一个最大分区,如:

CREATE TABLE t2 (
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    region_code TINYINT UNSIGNED NOT NULL,
    dob DATE NOT NULL PRIMARY KEY
)
PARTITION BY RANGE( YEAR(dob) ) (
    PARTITION d0 VALUES LESS THAN (1970),
    PARTITION d1 VALUES LESS THAN (1975),
    PARTITION d2 VALUES LESS THAN (1980),
    PARTITION d7 VALUES LESS THAN MAXVALUE
);

具体的分区类型介绍可看这篇文章

分区限制

分区对于唯一键和主键有所要求:如果表含有唯一键(包括主键),分区所用的key包含的字段必须是每一个唯一键的子集。比如:

# 无法创建,因为分区的key(col1, col3)不是col1以及col3的子集
CREATE TABLE t2 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1),
    UNIQUE KEY (col3)
)
PARTITION BY HASH(col1 + col3)
PARTITIONS 4;

# 无法创建,因为分区的key(col1, col3)不是col3的子集
CREATE TABLE t2 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col3),
    UNIQUE KEY (col3)
)
PARTITION BY HASH(col1 + col3)
PARTITIONS 4;

# 这个表不能创建分区
CREATE TABLE t4 (
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col3),
    UNIQUE KEY (col2, col4)
);

# 可以创建
CREATE TABLE t2 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col3)
)
PARTITION BY HASH(col1 + col3)
PARTITIONS 4;

以上规则也适用于当要用ALTER TABLE来建立/改变分区。

Partition Management

ALTER TABLE语句来改变已有分区表的分区结构。比如:

CREATE TABLE trb3 (id INT, name VARCHAR(50), purchased DATE)
    PARTITION BY RANGE( YEAR(purchased) ) (
        PARTITION p0 VALUES LESS THAN (1990),
        PARTITION p1 VALUES LESS THAN (1995),
        PARTITION p2 VALUES LESS THAN (2000),
        PARTITION p3 VALUES LESS THAN (2005)
    );
    
# repartition. essentailly drop and recreate a partitioning table
ALTER TABLE trb3 PARTITION BY KEY(id) PARTITIONS 2;

# remove all partitions and change the table definition without otherwise affecting the table or its data
ALTER TABLE trb3 REMOVE PARTITIONING;

# remove a partition(data + partition)
ALTER TABLE trb3 DROP PARTITION p0;

# add partition
ALTER TABLE t1 ADD PARTITION (PARTITION p3 VALUES LESS THAN (2002));

# delete all rows(actual data) from one or more selected partitions, remain actual partition in place but empty
ALTER TABLE ... TRUNCATE PARTITION 

# drop all data from all partitions while preserving the table definition and its partitioning scheme
TRUNCATE TABLE

# split p0, involves copying data
ALTER TABLE members REORGANIZE PARTITION trb3 INTO (
    PARTITION s0 VALUES LESS THAN (1980),
    PARTITION s1 VALUES LESS THAN (1990)
);

# merge adjacent(and only adjacent) partitions, involves copying data
ALTER TABLE members REORGANIZE PARTITION s0,s1 INTO (
    PARTITION p0 VALUES LESS THAN (1970)
);

# dropping all records stored in the partition, then reinserting them, defragmentation
ALTER TABLE trb3 REBUILD PARTITION p0, p1;

# reads and stores the key distributions for partitions.
ALTER TABLE trb3 ANALYZE PARTITION p0, p1;

# whether the data or indexes in partition p1 of table t1 are corrupted. If this is the case, use ALTER TABLE ... REPAIR PARTITION to repair the partition. Normally, CHECK PARTITION fails when the partition contains duplicate key errors
ALTER TABLE trb3 CHECK PARTITION p1;

# repairs corrupted partitions
ALTER TABLE t1 REPAIR PARTITION p0,p1;

add partition的时候,只能增加比当前分区数值高的分区,也就是只能append分区,可以通过reorganize的方式来“插入”分区。

对于分区操作,有以下要注意的几个地方

  1. 在单个ALTER TABLE命令中,PARTITION BY或者REMOVE PARTITIONING可以与其他的DDL一起出现,但是它们必须在语句的最后。
  2. 在单个ALTER TABLE命令中,ADD PARTITION, DROP PARTITION, COALESCE PARTITION, REORGANIZE PARTITION, ANALYZE PARTITION, CHECK PARTITIONREPAIR PARTITION不能与其他的DDL一起出现,因为这些命令只会对单个、几个的分区进行操作。
  3. 在单个ALTER TABLE命令中,以下命令只能出现一个PARTITION BY, ADD PARTITION, DROP PARTITION, TRUNCATE PARTITION, EXCHANGE PARTITION, REORGANIZE PARTITION, or COALESCE PARTITION, ANALYZE PARTITION, CHECK PARTITION, OPTIMIZE PARTITION, REBUILD PARTITION, REMOVE PARTITIONING.

NULL存在哪个分区

对于RANGE分区来说,NULL会被看做是最小的值,存在最小的分区里。

对于LIST分区来说,NULL只能存在有NULL定义list的分区里。

对于HASH和KEY分区来说,NULL的哈希值是0。

References