• CockroachDB-哈希分片索引


    本文知识点来源于官网地址https://www.cockroachlabs.com/docs/stable/hash-sharded-indexes.html

    如果您正在处理一个必须按顺序键建立索引的表,则应该使用哈希分片索引。哈希分片索引将顺序流量均匀分布在不同的范围内,消除了单一范围的热点,并以较小的读性能代价提高了顺序键索引上的写性能。

    概览

    CRDB根据范围的大小和到范围的负载流自动分割键值存储中的数据范围。为了根据负载划分范围,系统在范围中寻找一个点来平均划分传入的流量。如果范围在本质上是顺序的数据列(例如,有序序列或一系列递增的、非重复的timestamp)上建立索引,那么对该范围的所有写入都将是索引中的最后(或第一个)项,并被追加到范围的末尾。因此,系统无法在范围内找到一个点来平均分配流量,并且该范围无法从基于负载的分割中受益,从而在单个范围上产生热点。
    哈希分片索引通过将顺序数据分布到集群中的多个节点上,消除了热点,从而解决了这个问题。但是,这样做的代价是在读取顺序数据或数据范围时对性能有很小的影响,因为不能保证顺序接近的值将位于同一个节点上。
    哈希分片索引包含一个虚拟计算列,称为分片列。CRDB使用这个分片列(而不是索引中的顺序列)来控制值在整个索引中的分布。分片列在默认情况下是隐藏的,但可以通过SHOW COLUMNS看到。

    分片数

    在创建哈希分片索引时,CRDB根据sql.defaults.default_hash_sharded_index_bucket_count集群设置的值在集群中创建指定数量的分片(桶)。您还可以通过传入一个可选的存储参数来指定一个不同的bucket_count
    对于大多数用例,不需要更改集群设置,并且可以使用USING HASH而不是USING HASH with (bucket_count = n)创建哈希分片索引。将集群设置或存储参数更改为大于集群内节点数量的数量将产生递减的收益,不建议这样做。
    更多的桶数量允许更好的负载平衡,从而提高写吞吐量。更多的桶不利于需要扫描数据来完成查询的操作;这样的查询现在需要扫描每个桶并组合结果。
    如果默认的bucket_count不能满足您的用例,我们建议使用不同的bucket_count对您的工作负载进行彻底的性能测试。

    分区表上的哈希分片索引

    可以在以下场景下使用隐式分区创建哈希分片索引:

    • 表是用REGIONAL BY ROW隐式分区的,并且crdb_region列不是哈希分片索引中的列的一部分。
    • 表使用PARTITION ALL BY进行隐式分区,分区列不是哈希分片索引中的列的一部分。注意,PARTITION ALL BY在预览中。

    但是,如果表的索引(无论是主键还是次索引)是用PARTITION BY显式分区的,则不能对该索引进行哈希分片。分区列也不能显式地作为哈希分片索引的键列放置,包括region BY ROW表的crdb_region列。

    创建哈希分片索引

    创建哈希分片索引的一般过程是将USING HASH子句添加到以下语句之一:

    • CREATE INDEX
    • CREATE TABLE
    • ALTER PRIMARY KEY

    当使用此子句时,CRDB创建一个分片列,然后将每个索引分片存储在基础键值存储中,并使用计算列的一个散列作为前缀。

    示例1-创建一个基于主键做hash分片的表

    root@xx.xx.xx.xx:20158/defaultdb> CREATE TABLE products (
                                   ->     ts DECIMAL PRIMARY KEY USING HASH WITH BUCKET_COUNT=3,
                                   ->     product_id INT8
                                   ->     );
    CREATE TABLE
    
    
    Time: 70ms total (execution 69ms / network 1ms)
    
    root@xx.xx.xx.xx:20158/defaultdb> SHOW INDEX FROM products;
      table_name | index_name | non_unique | seq_in_index |       column_name        | direction | storing | implicit
    -------------+------------+------------+--------------+--------------------------+-----------+---------+-----------
      products   | primary    |   false    |            1 | dbms_internal_ts_shard_3 | ASC       |  false  |   true
      products   | primary    |   false    |            2 | ts                       | ASC       |  false  |  false
      products   | primary    |   false    |            3 | product_id               | N/A       |  true   |  false
    (3 rows)
    
    
    Time: 17ms total (execution 16ms / network 1ms)
    
    root@xx.xx.xx.xx:20158/defaultdb> SHOW COLUMNS FROM products;
            column_name        | data_type | is_nullable | column_default |              generation_expression               |  indices  | is_hidden
    ---------------------------+-----------+-------------+----------------+--------------------------------------------------+-----------+------------
      dbms_internal_ts_shard_3 | INT4      |    false    | NULL           | mod(fnv32(dbms_internal.datums_to_bytes(ts)), 3) | {primary} |   true
      ts                       | DECIMAL   |    false    | NULL           |                                                  | {primary} |   false
      product_id               | INT8      |    true     | NULL           |                                                  | {primary} |   false
    (3 rows)
    
    
    Time: 167ms total (execution 166ms / network 1ms)
    root@xx.xx.xx.xx:20158/defaultdb> show ranges from table products;
      start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
    ------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
      NULL      | NULL    |       72 |             0 |            1 |                       | {1}      | {}
    (1 row)
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35

    示例2-创建一个带hash分片二级索引的表

    root@xx.xx.xx.xx:20158/defaultdb> CREATE TABLE events (
                                   ->     product_id INT8,
                                   ->     owner UUID,
                                   ->     serial_number VARCHAR,
                                   ->     event_id UUID,
                                   ->     ts TIMESTAMP,
                                   ->     data JSONB,
                                   ->     PRIMARY KEY (product_id, owner, serial_number, ts, event_id),
                                   ->     INDEX (ts) USING HASH WITH BUCKET_COUNT=3
                                   -> );
    CREATE TABLE
    
    
    Time: 74ms total (execution 74ms / network 1ms)
    
    root@xx.xx.xx.xx:20158/defaultdb> SHOW INDEX FROM events;
      table_name |  index_name   | non_unique | seq_in_index |       column_name        | direction | storing | implicit
    -------------+---------------+------------+--------------+--------------------------+-----------+---------+-----------
      events     | events_ts_idx |    true    |            1 | dbms_internal_ts_shard_3 | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            2 | ts                       | ASC       |  false  |  false
      events     | events_ts_idx |    true    |            3 | product_id               | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            4 | owner                    | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            5 | serial_number            | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            6 | event_id                 | ASC       |  false  |   true
      events     | primary       |   false    |            1 | product_id               | ASC       |  false  |  false
      events     | primary       |   false    |            2 | owner                    | ASC       |  false  |  false
      events     | primary       |   false    |            3 | serial_number            | ASC       |  false  |  false
      events     | primary       |   false    |            4 | ts                       | ASC       |  false  |  false
      events     | primary       |   false    |            5 | event_id                 | ASC       |  false  |  false
      events     | primary       |   false    |            6 | data                     | N/A       |  true   |  false
      events     | primary       |   false    |            7 | dbms_internal_ts_shard_3 | N/A       |  true   |  false
    (13 rows)
    
    
    Time: 28ms total (execution 27ms / network 1ms)
    
    root@xx.xx.xx.xx:20158/defaultdb> SHOW COLUMNS FROM events;
            column_name        | data_type | is_nullable | column_default |              generation_expression               |         indices         | is_hidden
    ---------------------------+-----------+-------------+----------------+--------------------------------------------------+-------------------------+------------
      product_id               | INT8      |    false    | NULL           |                                                  | {events_ts_idx,primary} |   false
      owner                    | UUID      |    false    | NULL           |                                                  | {events_ts_idx,primary} |   false
      serial_number            | VARCHAR   |    false    | NULL           |                                                  | {events_ts_idx,primary} |   false
      event_id                 | UUID      |    false    | NULL           |                                                  | {events_ts_idx,primary} |   false
      ts                       | TIMESTAMP |    false    | NULL           |                                                  | {events_ts_idx,primary} |   false
      data                     | JSONB     |    true     | NULL           |                                                  | {primary}               |   false
      dbms_internal_ts_shard_3 | INT4      |    false    | NULL           | mod(fnv32(dbms_internal.datums_to_bytes(ts)), 3) | {events_ts_idx,primary} |   true
    (7 rows)
    
    
    Time: 56ms total (execution 55ms / network 0ms)
    root@xx.xx.xx.xx:20158/defaultdb> show ranges from table events;
      start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
    ------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
      NULL      | NULL    |       73 |             0 |            1 |                       | {1}      | {}
    (1 row)
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55

    示例3-将主键修改为hash分片

    root@xx.xx.xx.xx:20158/defaultdb> ALTER TABLE events ALTER PRIMARY KEY USING COLUMNS (product_id, owner, serial_number, ts, event_id) USING HASH WITH BUCKET_COUNT=3;
    NOTICE: primary key changes are finalized asynchronously; further schema changes on this table may be restricted until the job completes
    ALTER TABLE
    
    
    Time: 1.334s total (execution 1.333s / network 0.001s)
    
    root@xx.xx.xx.xx:20158/defaultdb> SHOW COLUMNS FROM events;
                                column_name                            | data_type | is_nullable | column_default |                                    generation_expression                                     |         indices         | is_hidden
    -------------------------------------------------------------------+-----------+-------------+----------------+----------------------------------------------------------------------------------------------+-------------------------+------------
      product_id                                                       | INT8      |    false    | NULL           |                                                                                              | {events_ts_idx,primary} |   false
      owner                                                            | UUID      |    false    | NULL           |                                                                                              | {events_ts_idx,primary} |   false
      serial_number                                                    | VARCHAR   |    false    | NULL           |                                                                                              | {events_ts_idx,primary} |   false
      event_id                                                         | UUID      |    false    | NULL           |                                                                                              | {events_ts_idx,primary} |   false
      ts                                                               | TIMESTAMP |    false    | NULL           |                                                                                              | {events_ts_idx,primary} |   false
      data                                                             | JSONB     |    true     | NULL           |                                                                                              | {primary}               |   false
      dbms_internal_ts_shard_3                                         | INT4      |    false    | NULL           | mod(fnv32(dbms_internal.datums_to_bytes(ts)), 3)                                             | {events_ts_idx,primary} |   true
      dbms_internal_event_id_owner_product_id_serial_number_ts_shard_3 | INT4      |    false    | NULL           | mod(fnv32(dbms_internal.datums_to_bytes(event_id, owner, product_id, serial_number, ts)), 3) | {events_ts_idx,primary} |   true
    (8 rows)
    
    
    Time: 63ms total (execution 62ms / network 1ms)
    
    root@xx.xx.xx.xx:20158/defaultdb> SHOW INDEX FROM events;
      table_name |  index_name   | non_unique | seq_in_index |                           column_name                            | direction | storing | implicit
    -------------+---------------+------------+--------------+------------------------------------------------------------------+-----------+---------+-----------
      events     | events_ts_idx |    true    |            1 | dbms_internal_ts_shard_3                                         | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            2 | ts                                                               | ASC       |  false  |  false
      events     | events_ts_idx |    true    |            3 | dbms_internal_event_id_owner_product_id_serial_number_ts_shard_3 | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            4 | product_id                                                       | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            5 | owner                                                            | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            6 | serial_number                                                    | ASC       |  false  |   true
      events     | events_ts_idx |    true    |            7 | event_id                                                         | ASC       |  false  |   true
      events     | primary       |   false    |            1 | dbms_internal_event_id_owner_product_id_serial_number_ts_shard_3 | ASC       |  false  |   true
      events     | primary       |   false    |            2 | product_id                                                       | ASC       |  false  |  false
      events     | primary       |   false    |            3 | owner                                                            | ASC       |  false  |  false
      events     | primary       |   false    |            4 | serial_number                                                    | ASC       |  false  |  false
      events     | primary       |   false    |            5 | ts                                                               | ASC       |  false  |  false
      events     | primary       |   false    |            6 | event_id                                                         | ASC       |  false  |  false
      events     | primary       |   false    |            7 | data                                                             | N/A       |  true   |  false
      events     | primary       |   false    |            8 | dbms_internal_ts_shard_3                                         | N/A       |  true   |  false
    (15 rows)
    
    
    Time: 14ms total (execution 13ms / network 1ms)
    
    root@xx.xx.xx.xx:20158/defaultdb> show ranges from index events_ts_idx;
      start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
    ------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
      NULL      | NULL    |       73 |             0 |            1 |                       | {1}      | {}
    (1 row)
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
  • 相关阅读:
    springboot个人博客系统
    Git面试题整理(对比)
    Goldilocks域
    微信小程序-婚礼邀请函页面
    【Python百日进阶-数据分析】Day123 - Plotly Figure参数:饼图(一)
    java的基本数据类型
    单位固定资产应该怎么管理
    【MATLAB源码-第49期】基于蚁群算法(ACO)算法的栅格路径规划,输出最佳路径图和算法收敛曲线图。
    基于Qt Creator开发的坦克大战小游戏
    LLDB 三种输出方式 对比及原理探索
  • 原文地址:https://blog.csdn.net/Post_Yuan/article/details/127789812