• Redis数据结构之字典


    字典经常作为一种数据结构内置在很多高级编程语言中,但是Redis使用C语言实现,没有内置这种数据结构,因此Redis自己构建了字典的实现。
    Redis数据库就是使用字典的数据结构来作为底层实现。另外Redis的哈希键对象也是使用了字典的数据结构。

    字典的实现

    Redis的字典使用了哈希表作为底层实现,其中一个哈希表包含了多个哈希节点,而每个哈希节点又包含多个键值对
    字典结构

    字典

    typedef struct dict {
        dictType *type;
        void *privdata;
        dictht ht[2];
        long rehashidx; /* rehashing not in progress if rehashidx == -1 */
        unsigned long iterators; /* number of iterators currently running */
    } dict;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7

    哈希表

    /* This is our hash table structure. Every dictionary has two of this as we
     * implement incremental rehashing, for the old to the new table. */
    typedef struct dictht { // 哈希表
        dictEntry **table; // 哈希表的数组,数组中每个元素都是指针,指向 dictEntry 结构
        unsigned long size; // 哈希表的大小,table 数组的大小
        unsigned long sizemask; // 哈希表掩码,用于计算索引值,等于 size-1
        unsigned long used; // 哈希表已有的节点(键值对)数量
    } dictht;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8

    哈希节点

    typedef struct dictEntry {
        void *key;  // 键 key
        union { // 值 val
            void *val;
            uint64_t u64;
            int64_t s64;
            double d;
        } v;
        struct dictEntry *next; // 指向下一个哈希表节点,链表法解决hash冲突
    } dictEntry;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10

    哈希节点维护一个链表用于解决哈希冲突。

    哈希冲突

    什么是哈希冲突

    哈希的过程就是给定一个输入,然后经过指定的哈希函数,计算出一个输出。输入的值范围大于输出的值范围,所以一定存在某多个输入,经过哈希函数计算的输出是相等的。那么这些不同的输入就是发生了哈希碰撞。
    哈希冲突

    如何解决哈希冲突

    常见的哈希冲突解决办法有链地址法、开放寻址法、再哈希法等。redis哈希表使用链地址法解决冲突的问题。
    Redis哈希表哈希节点维护一个next指针,将冲突的哈希节点通过单链表的方式串起来。

      if (d->type->no_value) {
      		/* Allocate an entry without value. */
            entry = createEntryNoValue(key, *bucket);
        } else { // 如果哈希节点已经存在元素,则将新的节点添加到哈希节点链表的头部,已冲突的节点链接到新节点的next处
            entry->key = key;
            entry->next = *bucket;
        }
        *bucket = entry;
        d->ht_used[htidx]++
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9

    字典扩容

    当哈希表存储节点达到一定数量后,会对字典进行扩容,以提升字典效率。

    扩容时机

    字典的扩容时机主要关注两个指标。

    • 当前can_resize标识
    • 负载因子
      首先查看判定是否需要扩容的源码段。
        if ((dict_can_resize == DICT_RESIZE_ENABLE &&
             d->ht_used[0] >= DICTHT_SIZE(d->ht_size_exp[0])) ||
            (dict_can_resize != DICT_RESIZE_FORBID &&
             d->ht_used[0] / DICTHT_SIZE(d->ht_size_exp[0]) > dict_force_resize_ratio))
        {
            return dictExpand(d, d->ht_used[0] + 1);
        }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7

    当在两种场景下,会对字典进行翻倍的扩容。

    • 当前resize为DICT_RESIZE_ENABLE,且负载因子大于1
    • 当前resize为非DICT_RESIZE_FORBID(resize为DICT_RESIZE_ENABLE),且负载因子大于5

    can_resize标识

    can_resize有三个枚举

    typedef enum {
        DICT_RESIZE_ENABLE,
        DICT_RESIZE_AVOID,
        DICT_RESIZE_FORBID,
    } dictResizeEnable;
    
    • 1
    • 2
    • 3
    • 4
    • 5

    redis会根据当前进程情况对resize进行更新

    /* This function is called once a background process of some kind terminates,
     - as we want to avoid resizing the hash tables when there is a child in order
     - to play well with copy-on-write (otherwise when a resize happens lots of
     - memory pages are copied). The goal of this function is to update the ability
     - for dict.c to resize or rehash the tables accordingly to the fact we have an
     - active fork child running. */
    void updateDictResizePolicy(void) {
        if (server.in_fork_child != CHILD_TYPE_NONE)
            dictSetResizeEnabled(DICT_RESIZE_FORBID);
        else if (hasActiveChildProcess())
            dictSetResizeEnabled(DICT_RESIZE_AVOID);
        else
            dictSetResizeEnabled(DICT_RESIZE_ENABLE);
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 默认resize=DICT_RESIZE_ENABLE
    • 当前处于子进程中resize=DICT_RESIZE_FORBID
    • 存在子进程时,resize=DICT_RESIZE_AVOID

    负载因子

    负载因子 = d->ht_used[0] / DICTHT_SIZE(d->ht_size_exp[0])

    • 当负载因子大于等于 1 ,且 Redis 没有在执行 bgsave 命令、 bgrewiteaof 命令, RDB 快照 或 AOF 重写 时,进行 rehash 操作
    • 当负载因子大于等于 5 时,不管有没有有在执行 RDB 快照 或AOF 重写,都会强制进行 rehash 操作

    渐进式rehash

    redis在进行rehash时,会将ht[0]中的全部键rehash到ht[1]中。但是redis在rehash动作并不是一次性的完成的。而是通过分治的思想,分多次、渐进式的完成的。
    因为redis是单线程模型,如果一个大key在进行一次性rehash时,会有较大的计算量可能导致服务器在一段时间内停止服务,从而引起客户端长时间阻塞不可用。

    /* Performs N steps of incremental rehashing. Returns 1 if there are still
     * keys to move from the old to the new hash table, otherwise 0 is returned.
     *  * Note that a rehashing step consists in moving a bucket (that may have more
     * than one key as we use chaining) from the old to the new hash table, however
     * since part of the hash table may be composed of empty spaces, it is not
     * guaranteed that this function will rehash even a single bucket, since it
     * will visit at max N*10 empty buckets in total, otherwise the amount of
     * work it does would be unbound and the function may block for a long time. */
    int dictRehash(dict *d, int n) {
        int empty_visits = n*10; /* Max number of empty buckets to visit. */
        if (!dictIsRehashing(d)) return 0;
    
        while(n-- && d->ht[0].used != 0) {
            dictEntry *de, *nextde;
    
            /* Note that rehashidx can't overflow as we are sure there are more
             * elements because ht[0].used != 0 */
            assert(d->ht[0].size > (unsigned long)d->rehashidx);
            while(d->ht[0].table[d->rehashidx] == NULL) {
                d->rehashidx++;
                if (--empty_visits == 0) return 1;
            }
            de = d->ht[0].table[d->rehashidx];
            /* Move all the keys in this bucket from the old to the new hash HT */
            while(de) {
                uint64_t h;
    
                nextde = de->next;
                /* Get the index in the new hash table */
                h = dictHashKey(d, de->key) & d->ht[1].sizemask;
                de->next = d->ht[1].table[h];
                d->ht[1].table[h] = de;
                d->ht[0].used--;
                d->ht[1].used++;
                de = nextde;
            }
            d->ht[0].table[d->rehashidx] = NULL;
            d->rehashidx++;
        }
    
        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 0;
        }
    
        /* More to rehash... */
        return 1;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 存在rehash的过程中,不允许再次发起rehash
    • 每次最大遍历10个桶,防止遍历过多空桶,导致阻塞线程
    • 每次将键值从ht[0]迁移到ht[1]后,d->ht[0].used-- d->ht[1].used++ d->rehashidx++;
    • rehash完毕后,rehashidx置为-1

    哈希键操作

    新增

    dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
    {
        long index;
        dictEntry *entry;
        dictht *ht;
    
        if (dictIsRehashing(d)) _dictRehashStep(d);
    
        /* Get the index of the new element, or -1 if
         * the element already exists. */
        if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
            return NULL;
    
        /* Allocate the memory and store the new entry.
         * Insert the element in top, with the assumption that in a database
         * system it is more likely that recently added entries are accessed
         * more frequently. */
        ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
        entry = zmalloc(sizeof(*entry));
        entry->next = ht->table[index];
        ht->table[index] = entry;
        ht->used++;
    
        /* Set the hash entry fields. */
        dictSetKey(d, entry, key);
        return entry;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 如果在rehash进行中,首先执行一步rehash
    • 如果key已经存在,不执行set操作
    • 新增时只写入ht[1],保证ht[0]的元素只会越来越少,不会新增。

    更新

    /* Add or Overwrite:
     * Add an element, discarding the old value if the key already exists.
     * Return 1 if the key was added from scratch, 0 if there was already an
     * element with such key and dictReplace() just performed a value update
     * operation. */
    int dictReplace(dict *d, void *key, void *val)
    {
        dictEntry *entry, *existing, auxentry;
    
        /* Try to add the element. If the key
         * does not exists dictAdd will succeed. */
        entry = dictAddRaw(d,key,&existing);
        if (entry) {
            dictSetVal(d, entry, val);
            return 1;
        }
    
        /* Set the new value and free the old one. Note that it is important
         * to do that in this order, as the value may just be exactly the same
         * as the previous one. In this context, think to reference counting,
         * you want to increment (set), and then decrement (free), and not the
         * reverse. */
        auxentry = *existing;
        dictSetVal(d, existing, val);
        dictFreeVal(d, &auxentry);
        return 0;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27

    读取

    dictEntry *dictFind(dict *d, const void *key)
    {
        dictEntry *he;
        uint64_t h, idx, table;
    
        if (d->ht[0].used + d->ht[1].used == 0) return NULL; /* dict is empty */
        if (dictIsRehashing(d)) _dictRehashStep(d);
        h = dictHashKey(d, key);
        for (table = 0; table <= 1; table++) {
            idx = h & d->ht[table].sizemask;
            he = d->ht[table].table[idx];
            while(he) {
                if (key==he->key || dictCompareKeys(d, key, he->key))
                    return he;
                he = he->next;
            }
            if (!dictIsRehashing(d)) return NULL;
        }
        return NULL;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 如果在rehash进行中,首先执行一步rehash
    • 如果当前在reahsh,遍历ht[0]和ht[1],否则只遍历ht[0]
    • 通过哈希函数找到对应的哈希桶,对桶内的元素进行键值对比,如果键值不相等,会进行对next后续键再次对比

    删除

    /* Search and remove an element. This is an helper function for
     * dictDelete() and dictUnlink(), please check the top comment
     * of those functions. */
    static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
        uint64_t h, idx;
        dictEntry *he, *prevHe;
        int table;
    
        if (d->ht[0].used == 0 && d->ht[1].used == 0) return NULL;
    
        if (dictIsRehashing(d)) _dictRehashStep(d);
        h = dictHashKey(d, key);
    
        for (table = 0; table <= 1; table++) {
            idx = h & d->ht[table].sizemask;
            he = d->ht[table].table[idx];
            prevHe = NULL;
            while(he) {
                if (key==he->key || dictCompareKeys(d, key, he->key)) {
                    /* Unlink the element from the list */
                    if (prevHe)
                        prevHe->next = he->next;
                    else
                        d->ht[table].table[idx] = he->next;
                    if (!nofree) {
                        dictFreeKey(d, he);
                        dictFreeVal(d, he);
                        zfree(he);
                    }
                    d->ht[table].used--;
                    return he;
                }
                prevHe = he;
                he = he->next;
            }
            if (!dictIsRehashing(d)) break;
        }
        return NULL; /* not found */
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 如果在rehash进行中,首先执行一步rehash
    • 如果当前在reahsh,遍历ht[0]和ht[1],否则只遍历ht[0]
    • 通过判断key值,如果是需要删除的key,则会调整哈希节点的next指针,达到删除效果
    • 更新哈希表元素占用指标used参数
  • 相关阅读:
    Spark - 第3章 Spark工作集介绍
    六、react组件通信-父子组件通信-子父组件通信-跨级组件的传参方式-context方式的传参
    如何通过python爬股票接口获取证券交易日?
    EF Core 如何应对高并发
    工业CT检测技术及工业CT基本组成
    广度优先搜索算法
    10.19作业
    R数据分析:净重新分类(NRI)和综合判别改善(IDI)指数的理解
    技巧 | 如何解决 OBS 系统声音无法捕获问题 | Mac
    从零手搓一个【消息队列】创建核心类, 数据库设计与实现
  • 原文地址:https://blog.csdn.net/kgcourage/article/details/134325010