
PostgreSQL 9.5+ efficient partitioned table implementation - pg_pathman

Author

digoal

Date

2016-10-24

Tags

PostgreSQL , partitioned table , pg_pathman , custom scan api

Background

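At present, partitioning in community PostgreSQL is fairly weak: a partitioned table has to be built from inheritance plus triggers or RULEs. Queries and updates rely on constraint checks, and inserts go through triggers or rule rewriting, so partitioning performs poorly.

The commercial EDB release and the Greenplum data warehouse both have much better partitioning support.

After Greenplum was open-sourced last year, colleagues on the Alibaba Cloud RDS PostgreSQL team ported Greenplum's partitioning to PostgreSQL 9.4, making it nearly 100 times faster than inheritance plus triggers (see my earlier article: besides the overhead of the trigger itself, the traditional method also pays for searching the partitions, which gets slower as partitions are added because no binary search is used). Because the port requires CATALOG changes, the feature has never been released on 9.4.

Partitioning has become one of the features PostgreSQL users are most eagerly waiting for.

postgrespro, the company of Oleg, one of the core community members, has developed an extension that provides partitioning. It needs no catalog changes, so partitioning can be added very easily.

This article explains how pg_pathman works and how to use it.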

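I. How pg_pathman works

The traditional PostgreSQL approach uses constraints to describe which data each partition holds (with constraint_exclusion=partition). For SELECT/DELETE/UPDATE, the planner compares those constraints with the query conditions and excludes the partitions that do not need to be scanned.

For COPY or INSERT, a trigger or rule routes each row into the right partition.

Either way, the traditional approach has a large performance cost for both queries and inserts.

pg_pathman works differently from traditional inheritance-based partitioning. Partition definitions are stored in a metadata table and cached in memory, and hooks are used to substitute relations, which makes it very efficient.

Two partitioning schemes are currently supported, range and hash. Range partitioning finds the target partition with a binary search; hash partitioning finds it with a hash lookup.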
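The two lookups can be illustrated with a minimal Python sketch. This is not pg_pathman's C code. The partition bounds and names are hypothetical, and the hash case assumes the index is simply the hash value modulo the number of partitions.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical cached RANGE definitions: partition i holds [BOUNDS[i], BOUNDS[i + 1]).
BOUNDS = [datetime(2016, 10, 25), datetime(2016, 11, 25),
          datetime(2016, 12, 25), datetime(2017, 1, 25)]
NAMES = ["part_test_1", "part_test_2", "part_test_3"]


def find_range_partition(value):
    """Binary search over the sorted bounds: O(log n) comparisons."""
    i = bisect_right(BOUNDS, value) - 1
    if i < 0 or i >= len(NAMES):
        return None  # no partition covers this value
    return NAMES[i]


def find_hash_partition(hash_value, partitions_count):
    """Assumed HASH routing: the index is computed directly, with no search."""
    return hash_value % partitions_count


print(find_range_partition(datetime(2016, 11, 30)))  # part_test_2
print(find_hash_partition(1000, 128))                # 104
```

Constraint exclusion examines every child's constraint, so its cost grows linearly with the number of partitions. A binary search grows only logarithmically.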

pg_pathman relies on the following hook and custom plan nodes:

1. pg_pathman uses ProcessUtility_hook hook to handle COPY queries for partitioned tables.

2. RuntimeAppend (overrides Append plan node)

3. RuntimeMergeAppend (overrides MergeAppend plan node)

4. PartitionFilter (drop-in replacement for INSERT triggers)

II. pg_pathman features

1. Range and hash partitioning are supported.

HASH and RANGE partitioning schemes;

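2. Both automatic partition management (partitions are created through function calls, and data in the parent table is migrated into them automatically) and manual partition management (functions attach an existing table to a partitioned table or detach a partition from it).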

Both automatic and manual partition management;

3. Supported partition key types include int, float, date and other common types, including user-defined domains.

Support for integer, floating point, date and other types, including domains;

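4. Custom scan nodes give efficient plans for partitioned tables in JOINs and subqueries, including partition filtering driven by subqueries.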

Effective query planning for partitioned tables (JOINs, subselects etc);

5. The RuntimeAppend & RuntimeMergeAppend custom plan nodes select partitions dynamically at run time.

RuntimeAppend & RuntimeMergeAppend custom plan nodes to pick partitions at runtime;

6. PartitionFilter performs in-place inserts, replacing the traditional insert trigger or insert rule.

PartitionFilter: an efficient drop-in replacement for INSERT triggers;

7. New partitions can be created automatically (currently only for range partitioned tables).

Automatic partition creation for new INSERTed data (only for RANGE partitioning);

8. COPY FROM/TO reads from or writes into partitions directly, which is more efficient.

Improved COPY FROM\TO statement that is able to insert rows directly into partitions;

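9. Updating the partition key is supported but requires an extra trigger. If you never update the partition key, do not add this trigger, because it has some performance cost.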

UPDATE triggers generation out of the box (will be replaced with custom nodes too);

10. User-defined callbacks can be registered; they are fired automatically when a partition is created.

User-defined callbacks for partition creation event handling;

The callback must have the following signature:

$part_init_callback$(args JSONB) RETURNS VOID

Its argument looks like this:

/* RANGE-partitioned table abc (for exp: child abc_4) */

{

"parent": "abc",

"parttype": "2",

"partition": "abc_4",

"range_max": "401",

"range_min": "301"

}

/* HASH-partitioned table abc (for exp: child abc_0) */

{

"parent": "abc",

"parttype": "1",

"partition": "abc_0"

}

11. Non-blocking partitioning: partitions are created and the parent's data is migrated into them in the background, without blocking.

Non-blocking concurrent table partitioning;

12. FDW support: a parameter (disabled | postgres | any_fdw) allows postgres_fdw or any other FDW to be used for partitions.

FDW support (foreign partitions);

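13. Configurable through GUC parameters. Because pg_pathman relies on hooks, if another extension that uses the same hooks (such as pg_stat_statements) is also loaded, pg_pathman must be listed first so that it is registered first: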

shared_preload_libraries = 'pg_pathman, pg_stat_statements'

Various GUC toggles and configurable settings.

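III. Why pg_pathman is efficient

Inserts: PartitionFilter substitutes the target relation instead of going through a trigger, which gives a very noticeable improvement.

Queries: partition definitions are loaded into memory, range and hash partitions are located with binary search and hash lookup respectively, and the RuntimeAppend & RuntimeMergeAppend custom plan nodes pick partitions at runtime.

This is more efficient than filtering partitions through constraints at planning time. Runtime filtering also works with subqueries, whereas the traditional constraint-based method cannot use a subquery to exclude partitions.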
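A rough Python sketch of the runtime idea, under assumed integer bounds: partitions are picked only after the values produced by a subquery or join are known. The names are illustrative, not pg_pathman internals.

```python
from bisect import bisect_right

# Hypothetical integer RANGE partitions: [0, 100), [100, 200), ..., [900, 1000).
LOWER_BOUNDS = list(range(0, 1000, 100))
UPPER_LIMIT = 1000


def partition_of(value):
    """Index of the partition holding value, or None if no partition does."""
    i = bisect_right(LOWER_BOUNDS, value) - 1
    return i if i >= 0 and value < UPPER_LIMIT else None


def runtime_append(subquery_values):
    """Choose partitions from the values actually seen at execution time.

    Plan-time constraint exclusion cannot do this, because the values
    coming out of a subquery are unknown until the query runs.
    """
    picked = {partition_of(v) for v in subquery_values}
    picked.discard(None)
    return sorted(picked)


print(runtime_append([5, 150, 155, 999]))  # [0, 1, 9]
```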

IV. Using pg_pathman

pg_pathman uses the custom scan provider API, so it only supports PostgreSQL 9.5 and later.

IV.1 Installation and configuration

$ git clone

$ export PATH=/home/digoal:$PATH

$ cd pg_pathman

$ make USE_PGXS=1

$ make USE_PGXS=1 install

$ cd $PGDATA

$ vi

shared_preload_libraries = 'pg_pathman,pg_stat_statements'

$ pg_ctl restart -m fast

$ psql

postgres=# create extension pg_pathman;

CREATE EXTENSION

postgres=# \dx

List of installed extensions

Name | Version | Schema | Description

------------+---------+------------+------------------------------

pg_pathman | 1.1 | public | Partitioning tool ver. 1.1

IV.2 Parameters

--- disable (or enable) pg_pathman completely

Default: on

_runtimeappend --- toggle RuntimeAppend custom node on\off

Default: on

_runtimemergeappend --- toggle RuntimeMergeAppend custom node on\off

Default: on

_partitionfilter --- toggle PartitionFilter custom node on\off

Default: on

_auto_partition --- toggle automatic partition creation on\off (per session)

Default: on

--- allow INSERTs into various FDWs (disabled | postgres | any_fdw)

Default: postgres

--- toggle COPY statement hooking on\off

Default: on

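IV.3 Related views and tables

pg_pathman maintains partitioned tables through functions and creates several views for checking the state of partitioned tables.

Partition definitions are stored in a table and cached in memory.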

1. pathman_config --- main config storage

This table stores a list of partitioned tables.

CREATE TABLE IF NOT EXISTS pathman_config (

partrel REGCLASS NOT NULL PRIMARY KEY, -- OID of the parent table

attname TEXT NOT NULL, -- partition key column name

parttype INTEGER NOT NULL, -- partition type (hash or range)

range_interval TEXT, -- interval used for range partitioning

CHECK (parttype IN (1, 2)) /* check for allowed part types */ );

2. pathman_config_params --- optional parameters

This table stores optional parameters which override standard behavior.

Settings stored in this table override the standard configuration.

CREATE TABLE IF NOT EXISTS pathman_config_params (

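partrel REGCLASS NOT NULL PRIMARY KEY, -- OID of the parent table

enable_parent BOOLEAN NOT NULL DEFAULT TRUE, -- whether the planner includes the parent table in query plans

auto BOOLEAN NOT NULL DEFAULT TRUE, -- whether INSERT automatically creates missing partitions

init_callback REGPROCEDURE NOT NULL DEFAULT 0); -- OID of the callback invoked when a partition is created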

3. pathman_concurrent_part_tasks --- currently running partitioning workers

This view lists all currently running concurrent partitioning tasks.

Lists the data migration tasks (moving data from the parent table into partitions) that are currently running.

-- helper SRF function

CREATE OR REPLACE FUNCTION show_concurrent_part_tasks()

RETURNS TABLE (

userid REGROLE,

pid INT,

dbid OID,

relid REGCLASS,

processed INT,

status TEXT)

AS 'pg_pathman', 'show_concurrent_part_tasks_internal'

LANGUAGE C STRICT;

CREATE OR REPLACE VIEW pathman_concurrent_part_tasks

AS SELECT * FROM show_concurrent_part_tasks();

4. pathman_partition_list --- list of all existing partitions

This view lists all existing partitions, as well as their parents and range boundaries (NULL for HASH partitions).

Lists the existing partitions.

-- helper SRF function

CREATE OR REPLACE FUNCTION show_partition_list()

RETURNS TABLE (

parent REGCLASS,

partition REGCLASS,

parttype INT4,

partattr TEXT,

range_min TEXT,

range_max TEXT)

AS 'pg_pathman', 'show_partition_list_internal'

LANGUAGE C STRICT;

CREATE OR REPLACE VIEW pathman_partition_list

AS SELECT * FROM show_partition_list();

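IV.4 Partition management

To partition a table you must specify the parent table, which must already exist. The parent may be empty or may already contain data.

If the parent already has data, you can choose whether to migrate it into the partitions while they are being created (not recommended for large tables).

If the parent holds a lot of data, use the non-blocking background migration instead (call partition_table_concurrently()).

If a callback was set with set_init_callback(relation regclass, callback regproc DEFAULT 0) before partitioning, it is called automatically as each partition is created.

The callback signature and its argument are as follows: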

$part_init_callback$(args JSONB) RETURNS VOID

Its argument looks like this:

/* RANGE-partitioned table abc (for exp: child abc_4) */

{

"parent": "abc",

"parttype": "2",

"partition": "abc_4",

"range_max": "401",

"range_min": "301"

}

/* HASH-partitioned table abc (for exp: child abc_0) */

{

"parent": "abc",

"parttype": "1",

"partition": "abc_0"

}
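In practice the callback is an SQL-callable function, for example one written in PL/pgSQL. Purely to show how its JSONB argument can be interpreted, here is a Python sketch over the two sample payloads above. The function name is hypothetical. It relies only on the keys shown: parttype "1" for HASH, and "2" for RANGE with range_min/range_max bounds.

```python
import json


def describe_partition(args_json):
    """Summarize a partition-creation callback argument (hypothetical helper)."""
    args = json.loads(args_json)
    if args["parttype"] == "2":  # RANGE: bounds are present, [range_min, range_max)
        return "%s: RANGE child of %s, [%s, %s)" % (
            args["partition"], args["parent"], args["range_min"], args["range_max"])
    if args["parttype"] == "1":  # HASH: no bounds
        return "%s: HASH child of %s" % (args["partition"], args["parent"])
    raise ValueError("unknown parttype %r" % args["parttype"])


print(describe_partition('{"parent": "abc", "parttype": "1", "partition": "abc_0"}'))
# abc_0: HASH child of abc
```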

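1. Range partitioning

Four management function signatures are available for creating range partitions.

Specify a start value, an interval and the number of partitions: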

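create_range_partitions(relation REGCLASS, -- OID of the parent table

attribute TEXT, -- partition key column name

start_value ANYELEMENT, -- start value

p_interval ANYELEMENT, -- interval; any type, works for any partition key type

p_count INTEGER DEFAULT NULL, -- number of partitions

partition_data BOOLEAN DEFAULT TRUE) -- whether to move the parent's data into the partitions immediately; not recommended, use the non-blocking migration (partition_table_concurrently()) instead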

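create_range_partitions(relation REGCLASS, -- OID of the parent table

attribute TEXT, -- partition key column name

start_value ANYELEMENT, -- start value

p_interval INTERVAL, -- interval; of type interval, for time-based partitioning

p_count INTEGER DEFAULT NULL, -- number of partitions

partition_data BOOLEAN DEFAULT TRUE) -- whether to move the parent's data into the partitions immediately; not recommended, use the non-blocking migration (partition_table_concurrently()) instead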

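Specify a start value, an end value and an interval:

create_partitions_from_range(relation REGCLASS, -- OID of the parent table

attribute TEXT, -- partition key column name

start_value ANYELEMENT, -- start value

end_value ANYELEMENT, -- end value

p_interval ANYELEMENT, -- interval; any type, works for any partition key type

partition_data BOOLEAN DEFAULT TRUE) -- whether to move the parent's data into the partitions immediately; not recommended, use the non-blocking migration (partition_table_concurrently()) instead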

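create_partitions_from_range(relation REGCLASS, -- OID of the parent table

attribute TEXT, -- partition key column name

start_value ANYELEMENT, -- start value

end_value ANYELEMENT, -- end value

p_interval INTERVAL, -- interval; of type interval, for time-based partitioning

partition_data BOOLEAN DEFAULT TRUE) -- whether to move the parent's data into the partitions immediately; not recommended, use the non-blocking migration (partition_table_concurrently()) instead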
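The following Python sketch computes the bounds that a call such as create_range_partitions(..., interval '1 month', 24) should produce. It assumes partition i covers [start + i * interval, start + (i + 1) * interval). The helper names are hypothetical. The last pair it computes agrees with the part_test_24 constraint shown later in this article.

```python
from datetime import datetime


def add_months(ts, months):
    """Shift by whole calendar months (assumes the day exists in the target month)."""
    total = ts.month - 1 + months
    return ts.replace(year=ts.year + total // 12, month=total % 12 + 1)


def range_partitions(start, months, count):
    """[lower, upper) bounds for `count` partitions of `months` months each."""
    return [(add_months(start, i * months), add_months(start, (i + 1) * months))
            for i in range(count)]


bounds = range_partitions(datetime(2016, 10, 25), 1, 24)
print(bounds[0])   # first partition, starting 2016-10-25
print(bounds[-1])  # last partition, ending 2018-10-25
```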

Example

Create the parent table to be partitioned

postgres=# create table part_test(id int, info text, crt_time timestamp not null); -- the partition key column must have a NOT NULL constraint

CREATE TABLE

Insert a batch of test data to simulate a parent table that already holds data

postgres=# insert into part_test select id,md5(random()::text),clock_timestamp() + (id||' hour')::interval from generate_series(1,10000) t(id);

INSERT 0 10000

postgres=# select * from part_test limit 10;

id | info | crt_time

----+----------------------------------+----------------------------

1 | 36fe1adedaa5b848caec4941f87d443a | 2016-10-25 10:27:13.206713

2 | c7d7358e196a9180efb4d0a10269c889 | 2016-10-25 11:27:13.206893

3 | 005bdb063550579333264b895df5b75e | 2016-10-25 12:27:13.206904

4 | 6c900a0fc50c6e4da1ae95447c89dd55 | 2016-10-25 13:27:13.20691

5 | 857214d8999348ed3cb0469b520dc8e5 | 2016-10-25 14:27:13.206916

6 | 4495875013e96e625afbf2698124ef5b | 2016-10-25 15:27:13.206921

7 | 82488cf7e44f87d9b879c70a9ed407d4 | 2016-10-25 16:27:13.20693

8 | a0b92547c8f17f79814dfbb12b8694a0 | 2016-10-25 17:27:13.206936

9 | 2ca09e0b85042b476fc235e75326b41b | 2016-10-25 18:27:13.206942

10 | 7eb762e1ef7dca65faf413f236dff93d | 2016-10-25 19:27:13.206947

(10 rows)

Notes:

1. The partition key column must have a NOT NULL constraint.

2. The partitions must cover all existing rows.
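Rule 2 can be checked before partitioning. Here is a minimal Python sketch that assumes rows shaped like the test data above: 10000 timestamps one hour apart, the first at 2016-10-25 10:27:13. It also assumes 24 monthly partitions starting 2016-10-25. The helper is hypothetical.

```python
from datetime import datetime, timedelta


def uncovered(values, lower, upper):
    """Values outside [lower, upper), i.e. rows no partition could hold."""
    return [v for v in values if not (lower <= v < upper)]


# Test-data shape: first row at 10:27:13, then one row per hour (ids 1..10000).
base = datetime(2016, 10, 25, 9, 27, 13)
rows = [base + timedelta(hours=i) for i in range(1, 10001)]

# 24 monthly partitions from 2016-10-25 span [2016-10-25, 2018-10-25).
print(len(uncovered(rows, datetime(2016, 10, 25), datetime(2018, 10, 25))))  # 0
```

The last row falls in mid-December 2017, so 12 monthly partitions would not have been enough.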

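Create the partitions, each covering one month of data

postgres=# select

create_range_partitions('part_test'::regclass, -- OID of the parent table

'crt_time', -- partition key column name

'2016-10-25 00:00:00'::timestamp, -- start value

interval '1 month', -- interval; of type interval, for time-based partitioning

24, -- number of partitions

false) ; -- do not migrate data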

NOTICE: sequence "part_test_seq" does not exist, skipping

create_range_partitions

-------------------------

(1 row)

postgres-# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Child tables: part_test_1,

part_test_10,

part_test_11,

part_test_12,

part_test_13,

part_test_14,

part_test_15,

part_test_16,

part_test_17,

part_test_18,

part_test_19,

part_test_2,

part_test_20,

part_test_21,

part_test_22,

part_test_23,

part_test_24,

part_test_3,

part_test_4,

part_test_5,

part_test_6,

part_test_7,

part_test_8,

part_test_9

Because no data was migrated, the data is still in the parent table

postgres=# select count(*) from only part_test;

count

-------

10000

(1 row)

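Use the non-blocking migration function

partition_table_concurrently(relation REGCLASS, -- OID of the parent table

batch_size INTEGER DEFAULT 1000, -- number of rows migrated per transaction

sleep_time FLOAT8 DEFAULT 1.0) -- how long to sleep before retrying when a row lock cannot be acquired; the task exits after 60 retries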

postgres=# select partition_table_concurrently('part_test'::regclass,

10000,

1.0);

NOTICE: worker started, you can stop it with the following command: select stop_concurrent_part_task('part_test');

partition_table_concurrently

------------------------------

(1 row)

After the migration finishes, the parent table is empty and all the data is in the partitions

postgres=# select count(*) from only part_test;

count

-------

0

(1 row)

Once the migration is complete, it is recommended to disable the parent table so that it no longer appears in execution plans

postgres=# select set_enable_parent('part_test'::regclass, false);

set_enable_parent

-------------------

(1 row)

postgres=# explain select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp;

QUERY PLAN

---------------------------------------------------------------------------------

Append (cost=0.00..16.18 rows=1 width=45)

-> Seq Scan on part_test_1 (cost=0.00..16.18 rows=1 width=45)

Filter: (crt_time = '2016-10-25 00:00:00'::timestamp without time zone)

(3 rows)

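Recommendations

1. The partition key column must have a NOT NULL constraint.

2. The partitions must cover all existing rows.

3. Use the non-blocking migration function.

4. Disable the parent table after the data migration is complete.

2. Hash partitioning

One management function creates hash partitions.

Specify the partition key column and the number of partitions: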

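create_hash_partitions(relation REGCLASS, -- OID of the parent table

attribute TEXT, -- partition key column name

partitions_count INTEGER, -- number of partitions to create

partition_data BOOLEAN DEFAULT TRUE) -- whether to move the parent's data into the partitions immediately; not recommended, use the non-blocking migration (partition_table_concurrently()) instead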

Example

Create the parent table to be partitioned

postgres=# create table part_test(id int, info text, crt_time timestamp not null); -- the partition key column must have a NOT NULL constraint

CREATE TABLE

Insert a batch of test data to simulate a parent table that already holds data

postgres=# insert into part_test select id,md5(random()::text),clock_timestamp() + (id||' hour')::interval from generate_series(1,10000) t(id);

INSERT 0 10000

postgres=# select * from part_test limit 10;

id | info | crt_time

----+----------------------------------+----------------------------

1 | 29ce4edc70dbfbe78912beb7c4cc95c2 | 2016-10-25 10:47:32.873879

2 | e0990a6fb5826409667c9eb150fef386 | 2016-10-25 11:47:32.874048

3 | d25f577a01013925c203910e34470695 | 2016-10-25 12:47:32.874059

4 | 501419c3f7c218e562b324a1bebfe0ad | 2016-10-25 13:47:32.874065

5 | 5e5e22bdf110d66a5224a657955ba158 | 2016-10-25 14:47:32.87407

6 | 55d2d4fd5229a6595e0dd56e13d32be4 | 2016-10-25 15:47:32.874076

7 | 1dfb9a783af55b123c7a888afe1eb950 | 2016-10-25 16:47:32.874081

8 | 41eeb0bf395a4ab1e08691125ae74bff | 2016-10-25 17:47:32.874087

9 | 83783d69cc4f9bb41a3978fe9e13d7fa | 2016-10-25 18:47:32.874092

10 | affc9406d5b3412ae31f7d7283cda0dd | 2016-10-25 19:47:32.874097

(10 rows)

Notes:

1. The partition key column must have a NOT NULL constraint.

Create 128 partitions

postgres=# select

create_hash_partitions('part_test'::regclass, -- OID of the parent table

'crt_time', -- partition key column name

128, -- number of partitions to create

false) ; -- do not migrate data

create_hash_partitions

------------------------

128

(1 row)

postgres=# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Child tables: part_test_0,

part_test_1,

part_test_10,

part_test_100,

part_test_101,

part_test_102,

part_test_103,

part_test_104,

part_test_105,

part_test_106,

part_test_107,

part_test_108,

part_test_109,

part_test_11,

part_test_110,

part_test_111,

part_test_112,

part_test_113,

part_test_114,

part_test_115,

part_test_116,

part_test_117,

part_test_118,

part_test_119,

part_test_12,

part_test_120,

part_test_121,

part_test_122,

part_test_123,

part_test_124,

part_test_125,

part_test_126,

part_test_127,

part_test_13,

part_test_14,

part_test_15,

part_test_16,

part_test_17,

part_test_18,

part_test_19,

part_test_2,

part_test_20,

part_test_21,

part_test_22,

part_test_23,

part_test_24,

part_test_25,

part_test_26,

part_test_27,

part_test_28,

part_test_29,

part_test_3,

part_test_30,

part_test_31,

part_test_32,

part_test_33,

part_test_34,

part_test_35,

part_test_36,

part_test_37,

part_test_38,

part_test_39,

part_test_4,

part_test_40,

part_test_41,

part_test_42,

part_test_43,

part_test_44,

part_test_45,

part_test_46,

part_test_47,

part_test_48,

part_test_49,

part_test_5,

part_test_50,

part_test_51,

part_test_52,

part_test_53,

part_test_54,

part_test_55,

part_test_56,

part_test_57,

part_test_58,

part_test_59,

part_test_6,

part_test_60,

part_test_61,

part_test_62,

part_test_63,

part_test_64,

part_test_65,

part_test_66,

part_test_67,

part_test_68,

part_test_69,

part_test_7,

part_test_70,

part_test_71,

part_test_72,

part_test_73,

part_test_74,

part_test_75,

part_test_76,

part_test_77,

part_test_78,

part_test_79,

part_test_8,

part_test_80,

part_test_81,

part_test_82,

part_test_83,

part_test_84,

part_test_85,

part_test_86,

part_test_87,

part_test_88,

part_test_89,

part_test_9,

part_test_90,

part_test_91,

part_test_92,

part_test_93,

part_test_94,

part_test_95,

part_test_96,

part_test_97,

part_test_98,

part_test_99

Because no data was migrated, the data is still in the parent table

postgres=# select count(*) from only part_test;

count

-------

10000

(1 row)

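Use the non-blocking migration function

partition_table_concurrently(relation REGCLASS, -- OID of the parent table

batch_size INTEGER DEFAULT 1000, -- number of rows migrated per transaction

sleep_time FLOAT8 DEFAULT 1.0) -- how long to sleep before retrying when a row lock cannot be acquired; the task exits after 60 retries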

postgres=# select partition_table_concurrently('part_test'::regclass,

10000,

1.0);

NOTICE: worker started, you can stop it with the following command: select stop_concurrent_part_task('part_test');

partition_table_concurrently

------------------------------

(1 row)

After the migration finishes, the parent table is empty and all the data is in the partitions

postgres=# select count(*) from only part_test;

count

-------

0

(1 row)

Once the migration is complete, it is recommended to disable the parent table so that it no longer appears in execution plans

postgres=# select set_enable_parent('part_test'::regclass, false);

set_enable_parent

-------------------

(1 row)

Only a single partition is scanned

postgres=# explain select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp;

QUERY PLAN

---------------------------------------------------------------------------------

Append (cost=0.00..1.91 rows=1 width=45)

-> Seq Scan on part_test_122 (cost=0.00..1.91 rows=1 width=45)

Filter: (crt_time = '2016-10-25 00:00:00'::timestamp without time zone)

(3 rows)

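The partition's check constraint is shown below.

Clearly pg_pathman performs this conversion automatically. With traditional inheritance, a query written as select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp; cannot exclude any partitions, because the constraint is on a hash of crt_time rather than on crt_time itself.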

postgres=# \d+ part_test_122

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_122_3_check" CHECK (get_hash_part_idx(timestamp_hash(crt_time), 128) = 122)

Inherits: part_test

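Recommendations

1. The partition key column must have a NOT NULL constraint.

2. Use the non-blocking migration function.

3. Disable the parent table after the data migration is complete.

4. pg_pathman does not depend on how the condition is written, so a query such as select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp; is still routed to a single hash partition.

5. The hash partition key is not limited to int columns; values are converted automatically with a hash function.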
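Point 5 can be sketched in Python: any key type is first turned into an integer by a hash function, and that integer picks the partition. Two assumptions apply. zlib.crc32 over the key's text form is only a stand-in for PostgreSQL's type-specific hash functions (such as timestamp_hash). The modulo step is a guess at what get_hash_part_idx does. The indexes therefore will not match pg_pathman's.

```python
import zlib
from datetime import datetime


def hash_partition_index(key, partitions_count):
    """Route a key of any type to a partition index via hash + modulo."""
    raw = key.isoformat() if isinstance(key, datetime) else str(key)
    h = zlib.crc32(raw.encode("utf-8"))  # stand-in hash, not timestamp_hash()
    return h % partitions_count


idx = hash_partition_index(datetime(2016, 10, 25), 128)
print(0 <= idx < 128)  # True
```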

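3. Migrating data into partitions

If the parent's data was not moved into the partitions when they were created, the non-blocking migration function can move it.

Internally it probably works roughly as follows (pseudocode; PostgreSQL's DELETE does not accept LIMIT or NOWAIT as written here):

with tmp as (delete from parent_table limit xx nowait returning *) insert into partition select * from tmp

Alternatively, it may first lock a batch of rows with select array_agg(ctid) from parent_table limit xx for update nowait, and then run the delete and insert.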
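The batching can be simulated in Python to show how batch_size and sleep_time interact. This is a sketch under several assumptions. The lock check is a caller-supplied stand-in for FOR UPDATE NOWAIT. Rows live in a plain list. Treating the 60-retry limit as 60 consecutive failures is a guess.

```python
import time


class LockNotAvailable(Exception):
    """Stand-in for a failed SELECT ... FOR UPDATE NOWAIT."""


def migrate_concurrently(parent_rows, route, batch_size=1000, sleep_time=1.0,
                         max_retries=60, try_lock=lambda batch: True):
    """Move rows out of parent_rows in batches, one simulated transaction each.

    route(row) returns the target partition; try_lock(batch) returning False
    simulates a row-lock conflict. parent_rows is emptied in place.
    Returns (partitions, rows_moved).
    """
    partitions = {}
    moved = 0
    failures = 0
    while parent_rows:
        batch = parent_rows[:batch_size]
        if not try_lock(batch):
            failures += 1
            if failures >= max_retries:
                raise LockNotAvailable("gave up after %d retries" % failures)
            time.sleep(sleep_time)  # back off, then retry the same batch
            continue
        failures = 0
        del parent_rows[:batch_size]  # "DELETE ... RETURNING" from the parent
        for row in batch:             # "INSERT" into the matching partition
            partitions.setdefault(route(row), []).append(row)
        moved += len(batch)
    return partitions, moved


rows = list(range(10000))
parts, moved = migrate_concurrently(rows, lambda r: r // 1000, sleep_time=0)
print(moved, len(rows), len(parts))  # 10000 0 10
```

Small batches keep each transaction short, so concurrent sessions only wait for one batch's row locks rather than for the whole table.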

1. The function interface:

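partition_table_concurrently(relation REGCLASS, -- OID of the parent table

batch_size INTEGER DEFAULT 1000, -- number of rows migrated per transaction

sleep_time FLOAT8 DEFAULT 1.0) -- how long to sleep before retrying when a row lock cannot be acquired; the task exits after 60 retries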

2. Example

postgres=# select partition_table_concurrently('part_test'::regclass,

10000,

1.0);

NOTICE: worker started, you can stop it with the following command: select stop_concurrent_part_task('part_test');

partition_table_concurrently

------------------------------

(1 row)

3. To stop a migration task, call the following function

stop_concurrent_part_task(relation REGCLASS)

4. Check the background migration tasks

postgres=# select * from pathman_concurrent_part_tasks;

userid | pid | dbid | relid | processed | status

--------+-----+------+-------+-----------+--------

(0 rows)

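4. Splitting a range partition

If a partition has grown too large, it can be split into two partitions this way.

Only range partitioned tables are supported.

split_range_partition(partition REGCLASS, -- OID of the partition

split_value ANYELEMENT, -- split value

partition_name TEXT DEFAULT NULL) -- name of the new partition created by the split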
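What a split does to the bounds and the rows can be sketched in Python. The sketch assumes half-open [lower, upper) ranges, matching the CHECK constraints in the example. The helpers are hypothetical.

```python
from datetime import datetime


def split_range(lower, upper, split_value):
    """Split [lower, upper) into [lower, split_value) and [split_value, upper)."""
    if not lower < split_value < upper:
        raise ValueError("split value must lie strictly inside the partition")
    return (lower, split_value), (split_value, upper)


def split_rows(rows, split_value):
    """Rows below split_value stay; rows at or above it move to the new partition."""
    stay = [r for r in rows if r < split_value]
    move = [r for r in rows if r >= split_value]
    return stay, move


old, new = split_range(datetime(2016, 10, 25), datetime(2016, 11, 25),
                       datetime(2016, 11, 10))
print(old[1] == new[0])  # True: the two halves are adjacent
```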

Example

postgres=# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Child tables: part_test_1,

part_test_10,

part_test_11,

part_test_12,

part_test_13,

part_test_14,

part_test_15,

part_test_16,

part_test_17,

part_test_18,

part_test_19,

part_test_2,

part_test_20,

part_test_21,

part_test_22,

part_test_23,

part_test_24,

part_test_3,

part_test_4,

part_test_5,

part_test_6,

part_test_7,

part_test_8,

part_test_9

postgres=# \d+ part_test_1

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)

Inherits: part_test

Split

postgres=# select split_range_partition('part_test_1'::regclass, -- OID of the partition

'2016-11-10 00:00:00'::timestamp, -- split value

'part_test_1_2'); -- name of the new partition

split_range_partition

-----------------------------------------------

{"2016-10-25 00:00:00","2016-11-25 00:00:00"}

(1 row)

The two tables after the split:

postgres=# \d+ part_test_1

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-10 00:00:00'::timestamp without time zone)

Inherits: part_test

postgres=# \d+ part_test_1_2

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_1_2_3_check" CHECK (crt_time >= '2016-11-10 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)

Inherits: part_test

Rows are moved into the new partition automatically

postgres=# select count(*) from part_test_1;

count

-------

373

(1 row)

postgres=# select count(*) from part_test_1_2;

count

-------

360

(1 row)

The inheritance hierarchy is now:

postgres=# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Child tables: part_test_1,

part_test_10,

part_test_11,

part_test_12,

part_test_13,

part_test_14,

part_test_15,

part_test_16,

part_test_17,

part_test_18,

part_test_19,

part_test_1_2, -- the new table

part_test_2,

part_test_20,

part_test_21,

part_test_22,

part_test_23,

part_test_24,

part_test_3,

part_test_4,

part_test_5,

part_test_6,

part_test_7,

part_test_8,

part_test_9

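5. Merging range partitions

Only range partitions are currently supported.

Call the following function.

Specify the two partitions to merge; they must be adjacent: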

merge_range_partitions(partition1 REGCLASS, partition2 REGCLASS)
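The adjacency rule can be expressed as a small Python check on [lower, upper) bounds. The helper name is hypothetical, and the error text mirrors the message in the example. Accepting the two partitions in either order is an assumption of this sketch.

```python
from datetime import datetime


def merge_ranges(first, second):
    """Merge two [lower, upper) ranges into one; they must be adjacent."""
    (a_lo, a_hi), (b_lo, b_hi) = sorted([first, second])
    if a_hi != b_lo:
        raise ValueError("merge failed, partitions must be adjacent")
    return (a_lo, b_hi)


part_test_1 = (datetime(2016, 10, 25), datetime(2016, 11, 10))
part_test_1_2 = (datetime(2016, 11, 10), datetime(2016, 11, 25))
print(merge_ranges(part_test_1, part_test_1_2))  # spans 2016-10-25 to 2016-11-25
```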

Example

postgres=# select merge_range_partitions('part_test_2'::regclass, 'part_test_12'::regclass) ;

ERROR: merge failed, partitions must be adjacent

CONTEXT: PL/pgSQL function merge_range_partitions_internal(regclass,regclass,regclass,anyelement) line 27 at RAISE

SQL statement "SELECT ($1, $2, $3, NULL::timestamp without time zone)"

PL/pgSQL function merge_range_partitions(regclass,regclass) line 44 at EXECUTE

The partitions are not adjacent, so an error is raised

Adjacent partitions can be merged

postgres=# select merge_range_partitions('part_test_1'::regclass, 'part_test_1_2'::regclass) ;

merge_range_partitions

------------------------

(1 row)

After the merge, one of the two partitions is dropped

postgres=# \d part_test_1_2

Did not find any relation named "part_test_1_2".

postgres=# \d part_test_1

Table ";

Column | Type | Modifiers

----------+-----------------------------+-----------

id | integer |

info | text |

crt_time | timestamp without time zone | not null

Check constraints:

"pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)

Inherits: part_test

postgres=# select count(*) from part_test_1;

count

-------

733

(1 row)

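6. Appending a range partition

Once a table has been partitioned, there are several ways to add partitions later. One is to append a partition after the existing ones (at the end).

The new partition uses the interval that was specified when the table was first partitioned.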
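Appending can be sketched as extending the highest upper bound by the stored interval. Prepending with prepend_range_partition is the mirror image, extending the lowest lower bound downward. The Python sketch below assumes a whole-month interval like the '1 mon' stored for part_test, and its helper names are hypothetical. With the part_test bounds it reproduces the part_test_25 and part_test_26 constraints shown in this article.

```python
from datetime import datetime


def add_months(ts, months):
    """Shift by whole calendar months (assumes the day exists in the target month)."""
    total = ts.year * 12 + ts.month - 1 + months
    return ts.replace(year=total // 12, month=total % 12 + 1)


def append_range(bounds, months):
    """Next partition after the current maximum upper bound."""
    upper = max(hi for _, hi in bounds)
    return (upper, add_months(upper, months))


def prepend_range(bounds, months):
    """Next partition before the current minimum lower bound."""
    lower = min(lo for lo, _ in bounds)
    return (add_months(lower, -months), lower)


# First and last of the 24 original part_test partitions (interval '1 mon').
bounds = [(datetime(2016, 10, 25), datetime(2016, 11, 25)),
          (datetime(2018, 9, 25), datetime(2018, 10, 25))]
print(append_range(bounds, 1))   # 2018-10-25 .. 2018-11-25, like part_test_25
print(prepend_range(bounds, 1))  # 2016-09-25 .. 2016-10-25, like part_test_26
```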

The interval specified when each table was first partitioned can be queried from this table

postgres=# select * from pathman_config;

partrel | attname | parttype | range_interval

-----------+----------+----------+----------------

part_test | crt_time | 2 | 1 mon

(1 row)

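The function for appending a partition; a tablespace can be specified

append_range_partition(parent REGCLASS, -- OID of the parent table

partition_name TEXT DEFAULT NULL, -- name of the new partition; optional

tablespace TEXT DEFAULT NULL) -- tablespace for the new partition; optional

Example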

postgres=# select append_range_partition('part_test'::regclass);

append_range_partition

------------------------

(1 row)

postgres=# \d+ part_test_25

Table ""

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_25_3_check" CHECK (crt_time >= '2018-10-25 00:00:00'::timestamp without time zone AND crt_time < '2018-11-25 00:00:00'::timestamp without time zone)

Inherits: part_test

postgres=# \d+ part_test_24

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_24_3_check" CHECK (crt_time >= '2018-09-25 00:00:00'::timestamp without time zone AND crt_time < '2018-10-25 00:00:00'::timestamp without time zone)

Inherits: part_test

7. Prepending a range partition

Adds a partition in front of the first one.

Function

prepend_range_partition(parent REGCLASS,

partition_name TEXT DEFAULT NULL,

tablespace TEXT DEFAULT NULL)

Example

postgres=# select prepend_range_partition('part_test'::regclass);

prepend_range_partition

-------------------------

(1 row)

postgres=# \d+ part_test_26

Table ""

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_26_3_check" CHECK (crt_time >= '2016-09-25 00:00:00'::timestamp without time zone AND crt_time < '2016-10-25 00:00:00'::timestamp without time zone)

Inherits: part_test

postgres=# \d+ part_test_1

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)

Inherits: part_test

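8. Adding a partition

Adds a partition with explicit start and end values. It succeeds as long as the new partition does not overlap any existing partition.

In other words, partitions do not have to be contiguous. If the existing partitions cover 2010-2015, you can create a partition for 2020 directly, without covering 2015 to 2020.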
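The only condition is that the new range must not overlap an existing one. Here is a Python sketch of that check on half-open [start, end) ranges. The helper names are hypothetical. The sample range approximates the span part_test covers at this point in the walkthrough.

```python
from datetime import datetime


def overlaps(a, b):
    """Half-open ranges [a0, a1) and [b0, b1) overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def can_add_range(existing, new):
    """A new partition is accepted if it overlaps none of the existing ones; gaps are fine."""
    return not any(overlaps(new, r) for r in existing)


existing = [(datetime(2016, 9, 25), datetime(2018, 11, 25))]
print(can_add_range(existing, (datetime(2020, 1, 1), datetime(2020, 2, 1))))  # True
```

Because the ranges are half-open, a partition starting exactly at an existing upper bound is adjacent, not overlapping.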

The function:

add_range_partition(relation REGCLASS, -- OID of the parent table

start_value ANYELEMENT, -- start value

end_value ANYELEMENT, -- end value

partition_name TEXT DEFAULT NULL, -- partition name

tablespace TEXT DEFAULT NULL) -- tablespace in which to create the partition

Example

postgres=# select add_range_partition('part_test'::regclass, -- OID of the parent table

'2020-01-01 00:00:00'::timestamp, -- start value

'2020-02-01 00:00:00'::timestamp); -- end value

add_range_partition

---------------------

(1 row)

postgres=# \d+ part_test_27

Table ""

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_27_3_check" CHECK (crt_time >= '2020-01-01 00:00:00'::timestamp without time zone AND crt_time < '2020-02-01 00:00:00'::timestamp without time zone)

Inherits: part_test

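9. Dropping partitions

1. Dropping a single range partition

The function:

drop_range_partition(partition TEXT, -- partition name

delete_data BOOLEAN DEFAULT TRUE) -- whether to delete the partition's data; if false, the data is moved to the parent table

Drop RANGE partition and all of its data if delete_data is true.

Example

Drop partitions, moving their data to the parent table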

postgres=# select drop_range_partition('part_test_1',false);

NOTICE: 733 rows copied from part_test_1

drop_range_partition

----------------------

part_test_1

(1 row)

postgres=# select drop_range_partition('part_test_2',false);

NOTICE: 720 rows copied from part_test_2

drop_range_partition

----------------------

part_test_2

(1 row)

postgres=# select count(*) from part_test;

count

-------

10000

(1 row)

Drop a partition together with its data, without moving the data to the parent table

postgres=# select drop_range_partition('part_test_3',true);

drop_range_partition

----------------------

part_test_3

(1 row)

postgres=# select count(*) from part_test;

count

-------

9256

(1 row)

postgres=# select count(*) from only part_test;

count

-------

1453

(1 row)

2. Dropping all partitions, choosing whether to move their data to the parent table

The function:

drop_partitions(parent REGCLASS,

delete_data BOOLEAN DEFAULT FALSE)

Drop partitions of the parent table (both foreign and local relations).

If delete_data is false, the data is copied to the parent table first.

Default is false.

Example

postgres=# select drop_partitions('part_test'::regclass, false); -- drop all partitions and move their data into the parent table

NOTICE: function () does not exist, skipping

NOTICE: 744 rows copied from part_test_4

NOTICE: 672 rows copied from part_test_5

NOTICE: 744 rows copied from part_test_6

NOTICE: 720 rows copied from part_test_7

NOTICE: 744 rows copied from part_test_8

NOTICE: 720 rows copied from part_test_9

NOTICE: 744 rows copied from part_test_10

NOTICE: 744 rows copied from part_test_11

NOTICE: 720 rows copied from part_test_12

NOTICE: 744 rows copied from part_test_13

NOTICE: 507 rows copied from part_test_14

NOTICE: 0 rows copied from part_test_15

NOTICE: 0 rows copied from part_test_16

NOTICE: 0 rows copied from part_test_17

NOTICE: 0 rows copied from part_test_18

NOTICE: 0 rows copied from part_test_19

NOTICE: 0 rows copied from part_test_20

NOTICE: 0 rows copied from part_test_21

NOTICE: 0 rows copied from part_test_22

NOTICE: 0 rows copied from part_test_23

NOTICE: 0 rows copied from part_test_24

NOTICE: 0 rows copied from part_test_25

NOTICE: 0 rows copied from part_test_26

NOTICE: 0 rows copied from part_test_27

drop_partitions

-----------------

(1 row)

postgres=# select count(*) from part_test;

count

-------

9256

(1 row)

postgres=# \dt part_test_4

No matching relations found.

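10. Attaching a partition (adding an existing table to a partitioned table)

Attaches an existing table to an existing partitioned parent table.

The existing table must have exactly the same structure as the parent, including dropped columns (check that their pg_attribute entries match).

If a callback has been set, it is fired.

The function:

attach_range_partition(relation REGCLASS, -- OID of the parent table

partition REGCLASS, -- OID of the table to attach

start_value ANYELEMENT, -- start value

end_value ANYELEMENT) -- end value

Example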

postgres=# create table part_test_1 (like part_test including all);

CREATE TABLE

postgres=# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

postgres=# \d+ part_test_1

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

postgres=# select attach_range_partition('part_test'::regclass, 'part_test_1'::regclass, '2019-01-01 00:00:00'::timestamp, '2019-02-01 00:00:00'::timestamp);

attach_range_partition

------------------------

part_test_1

(1 row)

When the partition is attached, the inheritance relationship and the check constraint are created automatically

postgres=# \d+ part_test_1

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_1_3_check" CHECK (crt_time >= '2019-01-01 00:00:00'::timestamp without time zone AND crt_time < '2019-02-01 00:00:00'::timestamp without time zone)

Inherits: part_test

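11. Detaching a partition (turning it into a regular table)

Removes the partition from the parent's inheritance hierarchy. The data is kept; the inheritance relationship and the check constraint are removed.

The function:

detach_range_partition(partition REGCLASS) -- name of the partition to turn into a regular table

Example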

postgres=# select count(*) from part_test;

count

-------

9256

(1 row)

postgres=# select count(*) from part_test_2;

count

-------

733

(1 row)

postgres=# select detach_range_partition('part_test_2');

detach_range_partition

------------------------

part_test_2

(1 row)

postgres=# select count(*) from part_test_2;

count

-------

733

(1 row)

postgres=# select count(*) from part_test;

count

-------

8523

(1 row)

Source of the function

postgres=# \sf detach_range_partition

CREATE OR REPLACE FUNCTION (partition regclass)

RETURNS text

LANGUAGE plpgsql

AS $function$

DECLARE

v_attname TEXT;

parent_relid REGCLASS;

BEGIN

parent_relid := (partition);

/* Acquire lock on parent */

PERFORM (parent_relid);

v_attname := attname

FROM

WHERE partrel = parent_relid;

IF v_attname IS NULL THEN

RAISE EXCEPTION 'table "%" is not partitioned', parent_relid::TEXT;

END IF;

/* Remove inheritance */

EXECUTE format('ALTER TABLE %s NO INHERIT %s',

partition::TEXT,

parent_relid::TEXT);

/* Remove check constraint */

EXECUTE format('ALTER TABLE %s DROP CONSTRAINT %s',

partition::TEXT,

(partition, v_attname));

/* Invalidate cache */

PERFORM (parent_relid);

RETURN partition;

END

$function$

12. Update triggers

If the partition key may be updated, an update trigger must be created; otherwise none is needed.
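Conceptually, the update trigger has to move a row whose new key belongs to a different partition: delete it from the old partition and insert it into the right one. Here is a Python sketch of that behavior with in-memory dictionaries. The data layout and helper names are hypothetical; this is not how pg_pathman's trigger is written.

```python
from bisect import bisect_right


def route(value, bounds):
    """Index i such that bounds[i] <= value < bounds[i + 1]."""
    i = bisect_right(bounds, value) - 1
    if i < 0 or i >= len(bounds) - 1:
        raise LookupError("no partition for %r" % (value,))
    return i


def update_key(partitions, bounds, row_id, new_key):
    """Update a row's partition key, moving the row if its partition changes.

    partitions[i] maps row_id -> key for partition i. Returns (old, new) index.
    """
    for old, rows in enumerate(partitions):
        if row_id in rows:
            new = route(new_key, bounds)       # fails before anything is changed
            del rows[row_id]                   # DELETE from the old partition
            partitions[new][row_id] = new_key  # INSERT into the matching one
            return old, new
    raise KeyError(row_id)


bounds = [0, 100, 200, 300]          # three hypothetical partitions
partitions = [{1: 50}, {2: 150}, {}]
print(update_key(partitions, bounds, 1, 250))  # (0, 2)
print(partitions)                              # [{}, {2: 150}, {1: 250}]
```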

The functions:

create_hash_update_trigger(parent REGCLASS)

Creates the trigger on UPDATE for HASH partitions.

The UPDATE trigger isn't created by default because of the overhead.

It's useful in cases when the key attribute might change.

create_range_update_trigger(parent REGCLASS)

Same as above, but for a RANGE-partitioned table.

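Example

Without the update trigger, updating the partition key to a value that belongs to another partition fails with a check constraint error.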

postgres=# select * from part_test_3 limit 10;

id | info | crt_time

-----+----------------------------------+----------------------------

734 | 52288de52fccf3d47efe897e1320a0fd | 2016-11-25 00:11:34.113856

735 | 16f4fffda933356192af8d1991c673cf | 2016-11-25 01:11:34.113862

736 | 08ec10184500ef43a6efde38dc43df33 | 2016-11-25 02:11:34.113867

737 | e658c7fb7f44ae3145401bf348cfa9dd | 2016-11-25 03:11:34.113872

738 | 81ff4c5cb3404230341aa95c28f86931 | 2016-11-25 04:11:34.113877

739 | 931652d6ba49f8155b1486d30fd23bab | 2016-11-25 05:11:34.113883

740 | c616c01d98016ff0022aa5449d53ca8f | 2016-11-25 06:11:34.113888

741 | 358e44b68259587233a0f571e8a86a81 | 2016-11-25 07:11:34.113893

742 | 719bb75e67c23c1f76e4eb81cb22004e | 2016-11-25 08:11:34.113899

743 | 1fc90c401eec2927fe9bb726651e4936 | 2016-11-25 09:11:34.113904

(10 rows)

postgres=# update part_test set crt_time='2016-01-25 00:11:34.113856' where id=734;

ERROR: new row for relation "part_test_3" violates check constraint "pathman_part_test_3_3_check"

DETAIL: Failing row contains (734, 52288de52fccf3d47efe897e1320a0fd, 2016-01-25 00:11:34.113856).

After the update trigger is created, the update succeeds

postgres=# select create_range_update_trigger('part_test'::regclass);

create_range_update_trigger

--------------------------------

(1 row)

postgres=# update part_test set crt_time='2016-01-25 00:11:34.113856' where id=734;

UPDATE 0

postgres=# select * from part_test where id=734;

id | info | crt_time

-----+----------------------------------+----------------------------

734 | 52288de52fccf3d47efe897e1320a0fd | 2016-01-25 00:11:34.113856

(1 row)

In general, application designs should not allow the partition key to change.
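The following toy Python model of a range-partitioned table makes the behaviour above concrete. It is not pg_pathman code. The model assumes that the update trigger moves a row by deleting it from its current partition, inserting it into the partition that matches the new key, and suppressing the original UPDATE. This assumption is consistent with the `UPDATE 0` reported above. Without the trigger, a new key outside the current partition's CHECK range raises an error. The class and method names are hypothetical.

```python
from bisect import bisect_right


class CheckViolation(Exception):
    """Stands in for PostgreSQL's check-constraint violation error."""


class RangeTable:
    """Toy model of a RANGE-partitioned table (hypothetical, not pg_pathman)."""

    def __init__(self, lowers, last_upper):
        self.lowers = list(lowers)              # sorted inclusive lower bounds
        self.last_upper = last_upper            # exclusive upper bound of last partition
        self.parts = [{} for _ in self.lowers]  # one {pk: key} dict per partition

    def locate(self, key):
        idx = bisect_right(self.lowers, key) - 1
        if idx < 0 or key >= self.last_upper:
            raise CheckViolation(f"no partition for {key!r}")
        return idx

    def insert(self, pk, key):
        self.parts[self.locate(key)][pk] = key

    def update_key(self, pk, new_key, update_trigger=False):
        """Return the row count PostgreSQL would report for the UPDATE."""
        cur = next(i for i, p in enumerate(self.parts) if pk in p)
        if self.locate(new_key) == cur:
            self.parts[cur][pk] = new_key       # still inside the CHECK range
            return 1
        if not update_trigger:
            raise CheckViolation("new row violates partition check constraint")
        # Assumed trigger behaviour: move the row, then skip the original
        # UPDATE, which is why psql reports "UPDATE 0".
        del self.parts[cur][pk]
        self.insert(pk, new_key)
        return 0
```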

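13. Permanently disable pg_pathman for a partitioned table

You can disable pg_pathman for an individual partitioned parent table.

The interface function is: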

disable_pathman_for(relation TEXT)

Permanently disable pg_pathman partitioning mechanism for the specified parent table and remove the insert trigger if it exists.

All partitions and data remain unchanged.

postgres=# \sf disable_pathman_for

CREATE OR REPLACE FUNCTION (parent_relid regclass)

RETURNS void

LANGUAGE plpgsql

STRICT

AS $function$

BEGIN

PERFORM (parent_relid);

DELETE FROM WHERE partrel = parent_relid;

PERFORM (parent_relid);

/* Notify backend about changes */

PERFORM (parent_relid);

END

$function$

Example

postgres=# select disable_pathman_for('part_test');

NOTICE: drop cascades to 23 other objects

DETAIL: drop cascades to trigger part_test_upd_trig on table part_test_3

drop cascades to trigger part_test_upd_trig on table part_test_4

drop cascades to trigger part_test_upd_trig on table part_test_5

drop cascades to trigger part_test_upd_trig on table part_test_6

drop cascades to trigger part_test_upd_trig on table part_test_7

drop cascades to trigger part_test_upd_trig on table part_test_8

drop cascades to trigger part_test_upd_trig on table part_test_9

drop cascades to trigger part_test_upd_trig on table part_test_10

drop cascades to trigger part_test_upd_trig on table part_test_11

drop cascades to trigger part_test_upd_trig on table part_test_12

drop cascades to trigger part_test_upd_trig on table part_test_13

drop cascades to trigger part_test_upd_trig on table part_test_14

drop cascades to trigger part_test_upd_trig on table part_test_15

drop cascades to trigger part_test_upd_trig on table part_test_16

drop cascades to trigger part_test_upd_trig on table part_test_17

drop cascades to trigger part_test_upd_trig on table part_test_18

drop cascades to trigger part_test_upd_trig on table part_test_19

drop cascades to trigger part_test_upd_trig on table part_test_20

drop cascades to trigger part_test_upd_trig on table part_test_21

drop cascades to trigger part_test_upd_trig on table part_test_22

drop cascades to trigger part_test_upd_trig on table part_test_23

drop cascades to trigger part_test_upd_trig on table part_test_24

drop cascades to trigger part_test_upd_trig on table part_test_25

disable_pathman_for

---------------------

(1 row)

postgres=# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Child tables: part_test_10,

part_test_11,

part_test_12,

part_test_13,

part_test_14,

part_test_15,

part_test_16,

part_test_17,

part_test_18,

part_test_19,

part_test_20,

part_test_21,

part_test_22,

part_test_23,

part_test_24,

part_test_25,

part_test_26,

part_test_27,

part_test_28,

part_test_29,

part_test_3,

part_test_30,

part_test_31,

part_test_32,

part_test_33,

part_test_34,

part_test_35,

part_test_4,

part_test_5,

part_test_6,

part_test_7,

part_test_8,

part_test_9

postgres=# \d+ part_test_10

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_10_3_check" CHECK (crt_time >= '2017-06-25 00:00:00'::timestamp without time zone AND crt_time < '2017-07-25 00:00:00'::timestamp without time zone)

Inherits: part_test

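After pg_pathman is disabled, the inheritance relationships and check constraints stay the same. pg_pathman simply stops intervening in query plans through its custom scan nodes.

Execution plan after pg_pathman is disabled: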

postgres=# explain select * from part_test where crt_time='2017-06-25 00:00:00'::timestamp;

QUERY PLAN

---------------------------------------------------------------------------------

Append (cost=0.00..16.00 rows=2 width=45)

-> Seq Scan on part_test (cost=0.00..0.00 rows=1 width=45)

Filter: (crt_time = '2017-06-25 00:00:00'::timestamp without time zone)

-> Seq Scan on part_test_10 (cost=0.00..16.00 rows=1 width=45)

Filter: (crt_time = '2017-06-25 00:00:00'::timestamp without time zone)

(5 rows)
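With pg_pathman out of the way, the planner falls back to ordinary constraint exclusion. It tests the equality predicate against every child's CHECK constraint. It also keeps the parent, because the parent has no constraint to rule it out. The sketch below models that behaviour. Only part_test_10's bounds come from the `\d+` output above; the range of part_test_9 is an assumption (one month earlier).

```python
from datetime import datetime


def constraint_exclusion(parent, children, value):
    """Toy model of inheritance pruning for `col = value`.

    children: (name, lower_inclusive, upper_exclusive) taken from each
    child's CHECK constraint. Every child is tested, so the cost grows
    linearly with the number of partitions.
    """
    kept = [parent]  # the parent has no CHECK constraint, so it is never excluded
    for name, lo, hi in children:
        if lo <= value < hi:
            kept.append(name)
    return kept


children = [
    ("part_test_9", datetime(2017, 5, 25), datetime(2017, 6, 25)),   # assumed range
    ("part_test_10", datetime(2017, 6, 25), datetime(2017, 7, 25)),  # from \d+ above
]
kept = constraint_exclusion("part_test", children, datetime(2017, 6, 25))
print(kept)
```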

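disable_pathman_for cannot be undone, so use it with care.

14. Disable pg_pathman globally

Disabling pg_pathman globally is different from disabling it for a single parent table. It only requires changing a configuration parameter, leaves pg_pathman's metadata untouched, and can be reversed.

Example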

$ vi $PGDATA/

= off

$ pg_ctl reload

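IV.5 Advanced partition management

1. Disable the parent table

Once all of the parent table's data has been migrated into the partitions, you can exclude the parent table from query plans.

The interface function is: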

set_enable_parent(relation REGCLASS, value BOOLEAN)

Include/exclude parent table into/from query plan.

The stock PostgreSQL planner always includes the parent table in the query plan, even when it is empty, which adds overhead.

Call set_enable_parent(relation, false) if you never intend to use the parent table as storage.

Default value depends on the partition_data parameter that was specified during initial partitioning in create_range_partitions() or create_partitions_from_range() functions.

If the partition_data parameter was true then all data have already been migrated to partitions and parent table disabled.

Otherwise it is enabled.

Example

select set_enable_parent('part_test', false);
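Here is a rough model of what this setting changes. With pg_pathman, the partition for an equality lookup is found by binary search over the sorted lower bounds. The parent appears in the plan only while it is enabled. In this sketch, the table `t`, its bounds and the `<parent>_<n>` child naming are all hypothetical.

```python
from bisect import bisect_right


def pathman_scan_targets(parent, lowers, last_upper, value, enable_parent):
    """Toy model: relations an equality lookup on the partition key would scan."""
    idx = bisect_right(lowers, value) - 1   # O(log N) instead of testing every child
    targets = [parent] if enable_parent else []
    if idx >= 0 and value < last_upper:
        targets.append(f"{parent}_{idx + 1}")  # hypothetical 1-based child naming
    return targets


lowers, last_upper = [1, 1_000_001, 2_000_001], 3_000_001
print(pathman_scan_targets("t", lowers, last_upper, 1_000_001, enable_parent=True))
print(pathman_scan_targets("t", lowers, last_upper, 1_000_001, enable_parent=False))
```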

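2. Automatic partition extension

Range-partitioned tables can extend their partitions automatically.

If a newly inserted value falls outside all existing partition ranges, the missing partitions are created on the fly.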

set_auto(relation REGCLASS, value BOOLEAN)

Enable/disable auto partition propagation (only for RANGE partitioning).

It is enabled by default.

Example

postgres=# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Child tables: part_test_10,

part_test_11,

part_test_12,

part_test_13,

part_test_14,

part_test_15,

part_test_16,

part_test_17,

part_test_18,

part_test_19,

part_test_20,

part_test_21,

part_test_22,

part_test_23,

part_test_24,

part_test_25,

part_test_26,

part_test_3,

part_test_4,

part_test_5,

part_test_6,

part_test_7,

part_test_8,

part_test_9

postgres=# \d+ part_test_26

Table ""

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_26_3_check" CHECK (crt_time >= '2018-09-25 00:00:00'::timestamp without time zone AND crt_time < '2018-10-25 00:00:00'::timestamp without time zone)

Inherits: part_test

postgres=# \d+ part_test_25

Table ""

----------+-----------------------------+-----------+----------+--------------+-------------

Check constraints:

"pathman_part_test_25_3_check" CHECK (crt_time >= '2018-08-25 00:00:00'::timestamp without time zone AND crt_time < '2018-09-25 00:00:00'::timestamp without time zone)

Inherits: part_test

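Inserting a value outside the existing partition ranges makes pg_pathman create new partitions until the value is covered. Each new partition uses the interval specified when the table was partitioned. This can take a very long time.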

postgres=# insert into part_test values (1,'test','2222-01-01'::timestamp);

This took a long time. Meanwhile, top showed the partition-spawning background worker busy:

21298 digoal 20 0 93.1g 184m 127m R 98.7 0.1 0:33.34 postgres: bgworker: SpawnPartitionsWorker

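When the insert finished, a very large number of partitions had been created, because the inserted value lay far beyond the existing range.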

postgres=# \d+ part_test

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Child tables: part_test_10,

part_test_100,

part_test_1000,

part_test_1001,

.....................................

(many more)

Enabling automatic extension for range partitions is not recommended.
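Some arithmetic shows why the single INSERT took so long. The sketch assumes new partitions are added one monthly interval at a time after the last existing upper bound, until the new value is covered. That last bound is 2018-10-25, the end of part_test_26 above. The sketch counts how many partitions reaching 2222-01-01 would require. The helper names are hypothetical.

```python
from datetime import date


def add_months(d, n):
    """Shift a date by n calendar months (safe here: day 25 exists in every month)."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, d.day)


def partitions_to_create(last_upper, months, value):
    """Count interval-sized partitions appended after last_upper until value is covered."""
    count, upper = 0, last_upper
    while value >= upper:
        upper = add_months(upper, months)  # each step creates [previous upper, upper)
        count += 1
    return count


print(partitions_to_create(date(2018, 10, 25), 1, date(2222, 1, 1)))
```

Under these assumptions it prints 2439. That is consistent with partition names in the thousands, such as part_test_1000, in the listing above.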

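3. Callback functions (invoked whenever a partition is created)

A callback function is called automatically each time a partition is created.

For example, it can support logical replication of DDL by recording each DDL statement in a table.

The callback interface is: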

set_init_callback(relation REGCLASS, callback REGPROC DEFAULT 0)

Set partition creation callback to be invoked for each attached or created partition (both HASH and RANGE).

The callback must have the following signature:

part_init_callback(args JSONB) RETURNS VOID.

The args parameter is a JSONB object whose fields depend on the partitioning type:

/* RANGE-partitioned table abc (child abc_4) */

{

"parent": "abc",

"parttype": "2",

"partition": "abc_4",

"range_max": "401",

"range_min": "301"

}

/* HASH-partitioned table abc (child abc_0) */

{

"parent": "abc",

"parttype": "1",

"partition": "abc_0"

}

Example

Callback function:

postgres=# create or replace function f_callback_test(jsonb) returns void as $$

declare

begin

create table if not exists rec_part_ddl(id serial primary key, parent name, parttype int, partition name, range_max text, range_min text);

if ($1->>'parttype')::int = 1 then

raise notice 'parent: %, parttype: %, partition: %', $1->>'parent', $1->>'parttype', $1->>'partition';

insert into rec_part_ddl(parent, parttype, partition) values (($1->>'parent')::name, ($1->>'parttype')::int, ($1->>'partition')::name);

elsif ($1->>'parttype')::int = 2 then

raise notice 'parent: %, parttype: %, partition: %, range_max: %, range_min: %', $1->>'parent', $1->>'parttype', $1->>'partition', $1->>'range_max', $1->>'range_min';

insert into rec_part_ddl(parent, parttype, partition, range_max, range_min) values (($1->>'parent')::name, ($1->>'parttype')::int, ($1->>'partition')::name, $1->>'range_max', $1->>'range_min');

end if;

end;

$$ language plpgsql strict;
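You can mirror the branching in f_callback_test in Python to check which row it would record for each kind of argument. This is only a sketch: `callback_record` is a hypothetical helper. The inputs in the test are the two sample payloads from the set_init_callback description above.

```python
import json


def callback_record(args_json):
    """Return the rec_part_ddl row f_callback_test would insert, or None."""
    args = json.loads(args_json)
    parttype = int(args["parttype"])
    base = {"parent": args["parent"], "parttype": parttype, "partition": args["partition"]}
    if parttype == 1:   # HASH: no bounds in the payload
        return base
    if parttype == 2:   # RANGE: bounds arrive as text
        return {**base, "range_max": args["range_max"], "range_min": args["range_min"]}
    return None         # f_callback_test inserts nothing for other values


print(callback_record('{"parent": "abc", "parttype": "1", "partition": "abc_0"}'))
```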

Test table:

postgres=# create table tt(id int, info text, crt_time timestamp not null);

CREATE TABLE

Set the callback function for the test table:

select set_init_callback('tt'::regclass, 'f_callback_test'::regproc);

Create the partitions:

postgres=# select

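create_range_partitions('tt'::regclass, -- parent table OID

'crt_time', -- partition key column

'2016-10-25 00:00:00'::timestamp, -- start value

interval '1 month', -- interval (an interval value, for time-based partitioning)

24, -- number of partitions

false) ; -- do not migrate data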

create_range_partitions

-------------------------

(1 row)

Check whether the callback was invoked:

postgres=# select * from rec_part_ddl;

id | parent | parttype | partition | range_max | range_min

----+--------+----------+-----------+---------------------+---------------------

1 | tt | 2 | tt_1 | 2016-11-25 00:00:00 | 2016-10-25 00:00:00

2 | tt | 2 | tt_2 | 2016-12-25 00:00:00 | 2016-11-25 00:00:00

3 | tt | 2 | tt_3 | 2017-01-25 00:00:00 | 2016-12-25 00:00:00

4 | tt | 2 | tt_4 | 2017-02-25 00:00:00 | 2017-01-25 00:00:00

5 | tt | 2 | tt_5 | 2017-03-25 00:00:00 | 2017-02-25 00:00:00

6 | tt | 2 | tt_6 | 2017-04-25 00:00:00 | 2017-03-25 00:00:00

7 | tt | 2 | tt_7 | 2017-05-25 00:00:00 | 2017-04-25 00:00:00

8 | tt | 2 | tt_8 | 2017-06-25 00:00:00 | 2017-05-25 00:00:00

9 | tt | 2 | tt_9 | 2017-07-25 00:00:00 | 2017-06-25 00:00:00

10 | tt | 2 | tt_10 | 2017-08-25 00:00:00 | 2017-07-25 00:00:00

11 | tt | 2 | tt_11 | 2017-09-25 00:00:00 | 2017-08-25 00:00:00

12 | tt | 2 | tt_12 | 2017-10-25 00:00:00 | 2017-09-25 00:00:00

13 | tt | 2 | tt_13 | 2017-11-25 00:00:00 | 2017-10-25 00:00:00

14 | tt | 2 | tt_14 | 2017-12-25 00:00:00 | 2017-11-25 00:00:00

15 | tt | 2 | tt_15 | 2018-01-25 00:00:00 | 2017-12-25 00:00:00

16 | tt | 2 | tt_16 | 2018-02-25 00:00:00 | 2018-01-25 00:00:00

17 | tt | 2 | tt_17 | 2018-03-25 00:00:00 | 2018-02-25 00:00:00

18 | tt | 2 | tt_18 | 2018-04-25 00:00:00 | 2018-03-25 00:00:00

19 | tt | 2 | tt_19 | 2018-05-25 00:00:00 | 2018-04-25 00:00:00

20 | tt | 2 | tt_20 | 2018-06-25 00:00:00 | 2018-05-25 00:00:00

21 | tt | 2 | tt_21 | 2018-07-25 00:00:00 | 2018-06-25 00:00:00

22 | tt | 2 | tt_22 | 2018-08-25 00:00:00 | 2018-07-25 00:00:00

23 | tt | 2 | tt_23 | 2018-09-25 00:00:00 | 2018-08-25 00:00:00

24 | tt | 2 | tt_24 | 2018-10-25 00:00:00 | 2018-09-25 00:00:00

(24 rows)

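V. Performance testing

The previous sections covered how to use pg_pathman and why it is efficient.

Next, we compare pg_pathman's performance with that of traditional partitioned tables.

1. pg_pathman vs. traditional partitioned tables

Traditional partitioned tables use a trigger to route each write to the correct partition.

Traditional partitioned table: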

create table test_pg_part_orig(id int primary key, info text, crt_time timestamp);

create table test_pg_part_orig_1(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_2(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_3(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_4(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_5(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_6(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_7(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_8(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_9(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_10(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_11(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_12(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_13(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_14(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_15(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_16(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_17(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_18(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_19(like test_pg_part_orig including all) inherits(test_pg_part_orig);

create table test_pg_part_orig_20(like test_pg_part_orig including all) inherits(test_pg_part_orig);

alter table test_pg_part_orig_1 add constraint ck_test_pg_part_orig_1 check(id >=1 and id<1000001);

alter table test_pg_part_orig_2 add constraint ck_test_pg_part_orig_2 check(id >=1000000 and id<2000001);

alter table test_pg_part_orig_3 add constraint ck_test_pg_part_orig_3 check(id >=2000000 and id<3000001);

alter table test_pg_part_orig_4 add constraint ck_test_pg_part_orig_4 check(id >=3000000 and id<4000001);

alter table test_pg_part_orig_5 add constraint ck_test_pg_part_orig_5 check(id >=4000000 and id<5000001);

alter table test_pg_part_orig_6 add constraint ck_test_pg_part_orig_6 check(id >=5000000 and id<6000001);

alter table test_pg_part_orig_7 add constraint ck_test_pg_part_orig_7 check(id >=6000000 and id<7000001);

alter table test_pg_part_orig_8 add constraint ck_test_pg_part_orig_8 check(id >=7000000 and id<8000001);

alter table test_pg_part_orig_9 add constraint ck_test_pg_part_orig_9 check(id >=8000000 and id<9000001);

alter table test_pg_part_orig_10 add constraint ck_test_pg_part_orig_10 check(id >=9000000 and id<10000001);

alter table test_pg_part_orig_11 add constraint ck_test_pg_part_orig_11 check(id >=10000000 and id<11000001);

alter table test_pg_part_orig_12 add constraint ck_test_pg_part_orig_12 check(id >=11000000 and id<12000001);

alter table test_pg_part_orig_13 add constraint ck_test_pg_part_orig_13 check(id >=12000000 and id<13000001);

alter table test_pg_part_orig_14 add constraint ck_test_pg_part_orig_14 check(id >=13000000 and id<14000001);

alter table test_pg_part_orig_15 add constraint ck_test_pg_part_orig_15 check(id >=14000000 and id<15000001);

alter table test_pg_part_orig_16 add constraint ck_test_pg_part_orig_16 check(id >=15000000 and id<16000001);

alter table test_pg_part_orig_17 add constraint ck_test_pg_part_orig_17 check(id >=16000000 and id<17000001);

alter table test_pg_part_orig_18 add constraint ck_test_pg_part_orig_18 check(id >=17000000 and id<18000001);

alter table test_pg_part_orig_19 add constraint ck_test_pg_part_orig_19 check(id >=18000000 and id<19000001);

alter table test_pg_part_orig_20 add constraint ck_test_pg_part_orig_20 check(id >=19000000 and id<20000001);

create or replace function tg_ins() returns trigger as $$

declare

id int := NEW.id;

begin

if NEW.id >=1 and NEW.id<1000001 then

insert into test_pg_part_orig_1 values (NEW.*);

elsif NEW.id >=1000000 and NEW.id<2000001 then

insert into test_pg_part_orig_2 values (NEW.*);

elsif NEW.id >=2000000 and NEW.id<3000001 then

insert into test_pg_part_orig_3 values (NEW.*);

elsif NEW.id >=3000000 and NEW.id<4000001 then

insert into test_pg_part_orig_4 values (NEW.*);

elsif NEW.id >=4000000 and NEW.id<5000001 then

insert into test_pg_part_orig_5 values (NEW.*);

elsif NEW.id >=5000000 and NEW.id<6000001 then

insert into test_pg_part_orig_6 values (NEW.*);

elsif NEW.id >=6000000 and NEW.id<7000001 then

insert into test_pg_part_orig_7 values (NEW.*);

elsif NEW.id >=7000000 and NEW.id<8000001 then

insert into test_pg_part_orig_8 values (NEW.*);

elsif NEW.id >=8000000 and NEW.id<9000001 then

insert into test_pg_part_orig_9 values (NEW.*);

elsif NEW.id >=9000000 and NEW.id<10000001 then

insert into test_pg_part_orig_10 values (NEW.*);

elsif NEW.id >=10000000 and NEW.id<11000001 then

insert into test_pg_part_orig_11 values (NEW.*);

elsif NEW.id >=11000000 and NEW.id<12000001 then

insert into test_pg_part_orig_12 values (NEW.*);

elsif NEW.id >=12000000 and NEW.id<13000001 then

insert into test_pg_part_orig_13 values (NEW.*);

elsif NEW.id >=13000000 and NEW.id<14000001 then

insert into test_pg_part_orig_14 values (NEW.*);

elsif NEW.id >=14000000 and NEW.id<15000001 then

insert into test_pg_part_orig_15 values (NEW.*);

elsif NEW.id >=15000000 and NEW.id<16000001 then

insert into test_pg_part_orig_16 values (NEW.*);

elsif NEW.id >=16000000 and NEW.id<17000001 then

insert into test_pg_part_orig_17 values (NEW.*);

elsif NEW.id >=17000000 and NEW.id<18000001 then

insert into test_pg_part_orig_18 values (NEW.*);

elsif NEW.id >=18000000 and NEW.id<19000001 then

insert into test_pg_part_orig_19 values (NEW.*);

elsif NEW.id >=19000000 and NEW.id<20000001 then

insert into test_pg_part_orig_20 values (NEW.*);

else

-- out of range: raise an exception

raise exception 'id: % out of range', NEW.id;

end if;

return null;

end;

$$ language plpgsql;
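The trigger above routes each row through an if/elsif chain, so a row for the last partition is tested against all 20 conditions. A binary search over the same boundaries needs at most 5 probes. The Python sketch below models only that comparison count. The boundaries are the effective ones produced by the chain, where the first matching branch wins. pg_pathman's actual C implementation is not reproduced here.

```python
# Effective exclusive upper bounds of the trigger's 20 branches: since the first
# matching branch wins, branch i (0-based) accepts ids below (i + 1) * 1e6 + 1.
UPPERS = [(i + 1) * 1_000_000 + 1 for i in range(20)]


def linear_route(id_):
    """Number of IF/ELSIF conditions evaluated before a branch matches (id >= 1)."""
    for tested, upper in enumerate(UPPERS, start=1):
        if id_ < upper:
            return tested
    raise ValueError(f"id: {id_} out of range")


def binary_route(id_):
    """(0-based partition index, number of probes) via binary search on UPPERS."""
    lo, hi, probes = 0, len(UPPERS), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if id_ < UPPERS[mid]:
            hi = mid
        else:
            lo = mid + 1
    if lo == len(UPPERS):
        raise ValueError(f"id: {id_} out of range")
    return lo, probes


print(linear_route(20_000_000), binary_route(20_000_000))
```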

postgres=# create trigger tg_ins before insert on test_pg_part_orig for each row execute procedure tg_ins();

CREATE TRIGGER

postgres=# insert into test_pg_part_orig values (0);

ERROR: id: 0 out of range

CONTEXT: PL/pgSQL function tg_ins() line 27 at RAISE

postgres=# insert into test_pg_part_orig values (1);

INSERT 0 0

postgres=# select * from test_pg_part_orig;

id | info | crt_time

----+------+----------

1 | |

(1 row)

postgres=# select * from test_pg_part_orig where id=1;

id | info | crt_time

----+------+----------

1 | |

(1 row)

postgres=# explain select * from test_pg_part_orig where id=1;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------

Append (cost=0.00..2.17 rows=2 width=44)

-> Seq Scan on test_pg_part_orig (cost=0.00..0.00 rows=1 width=44)

Filter: (id = 1)

-> Index Scan using test_pg_part_orig_1_pkey on test_pg_part_orig_1 (cost=0.15..2.17 rows=1 width=44)

Index Cond: (id = 1)

(5 rows)

pg_pathman partitioned table:

create table test_pg_part_pathman(id int primary key, info text, crt_time timestamp);

select

create_range_partitions('test_pg_part_pathman'::regclass, -- parent table OID

'id', -- partition key column

1, -- start value

1000000, -- interval

20, -- number of partitions

true) ; -- migrate data

postgres=# select set_enable_parent('test_pg_part_pathman'::regclass, false);

postgres=# \d+ test_pg_part_pathman

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Indexes:

"test_pg_part_pathman_pkey" PRIMARY KEY, btree (id)

Child tables: test_pg_part_pathman_1,

test_pg_part_pathman_10,

test_pg_part_pathman_11,

test_pg_part_pathman_12,

test_pg_part_pathman_13,

test_pg_part_pathman_14,

test_pg_part_pathman_15,

test_pg_part_pathman_16,

test_pg_part_pathman_17,

test_pg_part_pathman_18,

test_pg_part_pathman_19,

test_pg_part_pathman_2,

test_pg_part_pathman_20,

test_pg_part_pathman_3,

test_pg_part_pathman_4,

test_pg_part_pathman_5,

test_pg_part_pathman_6,

test_pg_part_pathman_7,

test_pg_part_pathman_8,

test_pg_part_pathman_9

postgres=# \d+ test_pg_part_pathman_1

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Indexes:

"test_pg_part_pathman_1_pkey" PRIMARY KEY, btree (id)

Check constraints:

"pathman_test_pg_part_pathman_1_1_check" CHECK (id >= 1 AND id < 1000001)

Inherits: test_pg_part_pathman

postgres=# \d+ test_pg_part_pathman_10

Table ";

----------+-----------------------------+-----------+----------+--------------+-------------

Indexes:

"test_pg_part_pathman_10_pkey" PRIMARY KEY, btree (id)

Check constraints:

"pathman_test_pg_part_pathman_10_1_check" CHECK (id >= 9000001 AND id < 10000001)

Inherits: test_pg_part_pathman
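The generated bounds appear to follow directly from the start value and interval passed to create_range_partitions: partition n covers [start + (n-1)*interval, start + n*interval). The sketch reproduces the two CHECK constraints shown above and checks adjacent ranges for overlap. For comparison, the hand-written constraints on test_pg_part_orig_1 and test_pg_part_orig_2 overlap at id = 1000000, and the same check detects that. The helper names are hypothetical.

```python
def range_bounds(start, interval, n):
    """[lower, upper) of the n-th (1-based) partition, assuming equal-width ranges."""
    lower = start + (n - 1) * interval
    return lower, lower + interval


def overlapping_pairs(bounds):
    """Return adjacent (lower, upper) ranges that share at least one value."""
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b[0] < a[1]]


pathman = [range_bounds(1, 1_000_000, n) for n in range(1, 21)]
print(pathman[0], pathman[9])          # test_pg_part_pathman_1 and _10
print(overlapping_pairs(pathman))      # contiguous, no overlap

# Bounds of the hand-written CHECK constraints on test_pg_part_orig_1 .. _20.
orig = [(1, 1_000_001)] + [(k * 1_000_000, (k + 1) * 1_000_000 + 1) for k in range(1, 20)]
print(len(overlapping_pairs(orig)))    # each boundary is shared by two partitions
```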

Performance comparison

1. Insert

Traditional partitioned table:

postgres=# \timing

Timing is on.

postgres=# truncate test_pg_part_orig;

postgres=# insert into test_pg_part_orig select generate_series(1,20000000);

INSERT 0 0

Time: 647028.838 ms

postgres=# select count(*) from test_pg_part_orig;

count

----------

20000000

(1 row)

Time: 1879.631 ms

pg_pathman partitioned table:

postgres=# insert into test_pg_part_pathman select generate_series(1,20000000);

INSERT 0 20000000

Time: 61634.401 ms

postgres=# select count(*) from test_pg_part_pathman;

count

----------

20000000

(1 row)

Time: 1879.867 ms

2. Query

Traditional partitioned table:

$ vi

\set id random(1,20000000)

select * from test_pg_part_orig where id=:id;

$ pgbench -M prepared -n -r -P 1 -f ./ -c 64 -j 64 -T 120

tps = 75102.033587 (including connections establishing)

tps = 75104.843437 (excluding connections establishing)

pg_pathman partitioned table:

$ vi

\set id random(1,20000000)

select * from test_pg_part_pathman where id=:id;

$ pgbench -M simple -n -r -P 1 -f ./ -c 64 -j 64 -T 120

tps = 420535.120243 (including connections establishing)

tps = 420549.323880 (excluding connections establishing)

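pg_pathman currently causes heavy LWLock contention with prepared statements and needs optimization, so these pg_pathman runs use the simple query protocol.

An issue has been filed.

3. Update

Traditional partitioned table: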

$ vi

\set id random(1,20000000)

update test_pg_part_orig set info='test' where id=:id;

$ pgbench -M prepared -n -r -P 1 -f ./ -c 64 -j 64 -T 120

tps = 56540.152159 (including connections establishing)

tps = 56542.070077 (excluding connections establishing)

pg_pathman partitioned table:

$ vi

\set id random(1,20000000)

update test_pg_part_pathman set info='test' where id=:id;

$ pgbench -M simple -n -r -P 1 -f ./ -c 64 -j 64 -T 120

tps = 224089.981643 (including connections establishing)

tps = 224098.969105 (excluding connections establishing)

2. pg_pathman vs. a single table

postgres=# create table test_pg_part_single(id int primary key, info text, crt_time timestamp);

postgres=# insert into test_pg_part_single select generate_series(1,20000000);

INSERT 0 20000000

Time: 46749.048 ms

$ vi

\set id random(1,20000000)

select * from test_pg_part_single where id=:id;

$ pgbench -M prepared -n -r -P 1 -f ./ -c 64 -j 64 -T 120

tps = 1071941.086992 (including connections establishing)

tps = 1071986.078786 (excluding connections establishing)

$ vi

\set id random(1,20000000)

update test_pg_part_single set info='test' where id=:id;

$ pgbench -M prepared -n -r -P 1 -f ./ -c 64 -j 64 -T 120

tps = 262356.355517 (including connections establishing)

tps = 262365.373182 (excluding connections establishing)
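The ratios behind the comparison can be computed directly from the numbers reported above. Note that the pg_pathman pgbench runs used `-M simple`, while the traditional and single-table runs used `-M prepared`, so the query and update ratios compare different protocols. Roughly, pg_pathman inserts about 10.5x faster than the trigger-based table and delivers about 5.6x the query tps and 4x the update tps. The single table still beats pg_pathman by about 2.5x on queries and 1.2x on updates.

```python
# Figures copied from the runs above: INSERT times in ms, pgbench results in tps
# ("excluding connections establishing" is not used; the including-values are).
insert_ms = {"orig": 647028.838, "pathman": 61634.401, "single": 46749.048}
select_tps = {"orig": 75102.033587, "pathman": 420535.120243, "single": 1071941.086992}
update_tps = {"orig": 56540.152159, "pathman": 224089.981643, "single": 262356.355517}

speedups = {
    # pg_pathman's advantage over the trigger-based partitioned table
    "insert: pathman vs orig": insert_ms["orig"] / insert_ms["pathman"],
    "select: pathman vs orig": select_tps["pathman"] / select_tps["orig"],
    "update: pathman vs orig": update_tps["pathman"] / update_tps["orig"],
    # the remaining advantage of an unpartitioned table over pg_pathman
    "insert: single vs pathman": insert_ms["pathman"] / insert_ms["single"],
    "select: single vs pathman": select_tps["single"] / select_tps["pathman"],
    "update: single vs pathman": update_tps["single"] / update_tps["pathman"],
}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```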

The performance results are compared in the figure.

3. pg_pathman hash-partitioned table vs. traditional partitioned table