[Java Basics] A Deep Dive into the JDK 8 HashMap Source Code

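Data structure

JDK 7: array + linked list
JDK 8: array + linked list + red-black tree
All source code in this article is from JDK 8.

Source code walkthrough

Important fields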

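public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    // Default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

    // Maximum capacity: 2^30
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor: 0.75
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // A bin's linked list is converted to a red-black tree when a node is added
    // to a bin that already holds at least 8 nodes...
    static final int TREEIFY_THRESHOLD = 8;

    // ...and only if the array length is at least 64; a shorter array is resized instead
    static final int MIN_TREEIFY_CAPACITY = 64;

    // During a resize, a tree bin left with 6 or fewer nodes is converted back to a linked list
    static final int UNTREEIFY_THRESHOLD = 6;

    // The bucket array; it is allocated lazily on first use
    transient Node<K,V>[] table;

    // Resize threshold, determined by capacity and load factor: CAPACITY * LOAD_FACTOR, 16 * 0.75 = 12 by default
    // The table is resized once the number of mappings it holds exceeds this value
    int threshold;
}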
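To make the relationship between capacity, load factor, and threshold concrete, here is a minimal sketch. The class ThresholdDemo and its helper threshold() are hypothetical names used only for illustration; they are not part of HashMap.

```java
public class ThresholdDemo {
    // Hypothetical helper (not a HashMap method): resize threshold for a given capacity.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // same value as DEFAULT_LOAD_FACTOR
        // The capacity doubles on every resize, so the threshold doubles with it.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity
                    + ", threshold=" + threshold(capacity, loadFactor));
        }
    }
}
```

With the default settings, the 13th insertion is the first one that pushes size past the threshold of 12 and triggers a resize.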

Constructors

The HashMap constructors do not allocate table themselves. The array is created lazily inside putVal(), which calls resize() on the first insertion.

// No-argument constructor
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

// Constructor with an initial capacity
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Constructor that copies the mappings of an existing Map
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

// Constructor with an initial capacity and a load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity:" + initialCapacity);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;

    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor:" + loadFactor);

    this.loadFactor = loadFactor;
    // tableSizeFor is a neat method, explained in detail below.
    // Until the table is allocated, threshold temporarily holds the initial capacity.
    this.threshold = tableSizeFor(initialCapacity);
}

tableSizeFor in detail

// Returns the smallest power of two that is greater than or equal to cap; for example, cap = 100 returns 128.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

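The bit manipulation in tableSizeFor() is neat: five rounds of >>> and | set every bit below the highest 1 bit of n to 1, and the method then returns n + 1. Subtracting 1 from cap first ensures that a cap that is already a power of two is returned unchanged.
Suppose cap is 100.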

public class TableSizeForDemo {
    // Instrumented copy of tableSizeFor that prints n in binary after every step
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        System.out.println(Integer.toBinaryString(n));
        n |= n >>> 1;
        System.out.println(Integer.toBinaryString(n));
        n |= n >>> 2;
        System.out.println(Integer.toBinaryString(n));
        n |= n >>> 4;
        System.out.println(Integer.toBinaryString(n));
        n |= n >>> 8;
        System.out.println(Integer.toBinaryString(n));
        n |= n >>> 16;
        System.out.println(Integer.toBinaryString(n));
        int res = n + 1;
        System.out.println("res =" + res);
        return res;
    }

    public static void main(String[] args) {
        tableSizeFor(100);
    }
}

Output (for cap = 100):

1100011
1110011
1111111
1111111
1111111
1111111
res =128
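As a sanity check on the bit trick, the sketch below compares tableSizeFor with a straightforward doubling loop. The class name TableSizeForCheck and the method naive() are hypothetical and exist only for this comparison; tableSizeFor is copied from the JDK 8 listing above.

```java
public class TableSizeForCheck {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing logic as the JDK 8 method.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical reference version: keep doubling until size >= cap.
    static int naive(int cap) {
        int size = 1;
        while (size < cap && size < MAXIMUM_CAPACITY) {
            size <<= 1;
        }
        return size;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 15, 16, 17, 100, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap)
                    + " (naive: " + naive(cap) + ")");
        }
    }
}
```

Both versions agree on these inputs; note that 16 maps to 16 rather than 32, which is exactly what the initial cap - 1 is for.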

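put method walkthrough

1. Check whether the array has been initialized; if not, initialize it.
2. Check whether the key already exists. If it does, update its value. If it does not, create a new node and insert it into the linked list or the red-black tree. After a linked-list insertion, check whether the list should be converted to a red-black tree: by default this happens when the node was appended to a bin that already held at least 8 nodes and the array length is at least 64 (with a shorter array, the table is resized instead).
3. After the insertion, check whether the table needs to be resized.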

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {

        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // Check whether the array has been initialized
        if ((tab = table) == null || (n = tab.length) == 0)
            // Initialize the array
            n = (tab = resize()).length;
        // i = (n - 1) & hash is the target bucket; check whether it already holds a node
        // ((n - 1) & hash is explained in detail below)
        if ((p = tab[i = (n - 1) & hash]) == null)
            // Empty bucket: store the new node directly at index i
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // Check whether the first node in the bucket has the same key as the argument
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // Same key: keep a reference to that node
                e = p;
            else if (p instanceof TreeNode)
                // The bucket holds a red-black tree:
                // search the tree for a node with the same key,
                // and insert a new node if none is found
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // The bucket holds a linked list
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // TREEIFY_THRESHOLD - 1 = 7
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // Convert the list to a red-black tree (or resize if the array is shorter than 64)
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e; // same as p = p.next;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                // Callback hook for LinkedHashMap, which moves the accessed node e
                // to the tail of its internal doubly linked list when access order is enabled
                afterNodeAccess(e);
                return oldValue;
            }
        }

        ++modCount;

        // Check whether a resize is needed
        if (++size > threshold)
            // Resize
            resize();
        afterNodeInsertion(evict);
        return null;
    }
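The return value and the onlyIfAbsent flag of putVal are visible through the public API: put() calls putVal with onlyIfAbsent = false, and putIfAbsent() calls it with onlyIfAbsent = true. A small sketch (the class name PutReturnDemo and the helper run() are made up for this example):

```java
import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    // Runs three insertions for the same key and reports what each call returned.
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);         // no previous mapping
        Integer second = map.put("a", 2);        // existing mapping is overwritten
        Integer third = map.putIfAbsent("a", 3); // existing non-null value is kept
        return first + ", " + second + ", " + third + ", " + map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "null, 1, 2, 2"
    }
}
```

The first call takes the "new node" path and returns null; the other two take the `e != null` branch and return the old value.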

(n - 1) & hash in detail

Why isn't the bucket index computed with hash % table.length, as in the following code?

int length = table.length;
int hash = key.hashCode();
int index = hash % length;

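There are four reasons:

1. A bitwise & is considerably cheaper for the CPU than %, which requires an integer division.
2. The number of buckets in a HashMap is always $2^{n}$ (this follows from tableSizeFor(int cap)).
3. When length = $2^{n}$, X % length is equivalent to X & (length - 1) for any non-negative X.
4. key.hashCode() can be negative (String.hashCode(), for example, overflows int), in which case hash % length is negative and cannot be used as an array index. X & (length - 1) always lies between 0 and length - 1.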

import java.util.Random;

public class Test {
    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 10000; i++) {
            // Hash the decimal string of a random int; the result may be negative
            String a = random.nextInt() + "";
            System.out.println("a.hashCode() =" + a.hashCode());
        }
    }
}

Partial output:

a.hashCode() =827369019
a.hashCode() =1490427408
a.hashCode() =1438049413
a.hashCode() =-783450731
a.hashCode() =521164141
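To see reasons 3 and 4 side by side, the following sketch computes a bucket index for two of the hash codes listed above, once with % and once with &. The class IndexDemo and its helpers modIndex() and maskIndex() are hypothetical names; Math.floorMod is used only as a reference for the mathematically non-negative remainder.

```java
public class IndexDemo {
    // Index via remainder: negative for negative hashes.
    static int modIndex(int hash, int length) {
        return hash % length;
    }

    // Index via bit mask: valid only when length is a power of two.
    static int maskIndex(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int[] hashes = {827369019, -783450731}; // taken from the sample output
        for (int h : hashes) {
            System.out.println("hash=" + h
                    + "  %: " + modIndex(h, length)
                    + "  &: " + maskIndex(h, length)
                    + "  floorMod: " + Math.floorMod(h, length));
        }
    }
}
```

For the positive hash all three results coincide. For the negative one, % yields a negative number, while & gives the same in-range value as Math.floorMod.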

Why the hash is computed as (h = key.hashCode()) ^ (h >>> 16)

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

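The goal is to let the high 16 bits of the hash code take part in computing the bucket index. Without the right shift, keys whose hash codes differ only in their high bits would always land in the same bucket of a small table. For example:
Without h >>> 16:
When h = 1111 1111 1111 1111 1111 0000 1110 1010 and n = 16, h & (n - 1) = 10, so the computed index is 10.
When h = 0000 0000 0000 0000 0000 0000 0000 1010 and n = 16, h & (n - 1) = 10, so the computed index is still 10.

In short, whatever the high bits of h are, the computed index is 10 as long as the last four bits are 1010.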

With h >>> 16:
When h = 1111 1111 1111 1111 1111 0000 1110 1010, h ^ (h >>> 16) = 1111 1111 1111 1111 0000 1111 0001 0101.
With n = 16, (h ^ (h >>> 16)) & (n - 1) = 5, so the computed index is 5.
So even when the array is short, the high 16 bits still influence the bucket index, at very little cost, which reduces hash collisions.
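The worked example can be checked with a few lines of code. In this sketch the class SpreadDemo and the helper spread() are illustrative names; spread() applies the same XOR-shift step as HashMap.hash() to a raw int.

```java
public class SpreadDemo {
    // Same spreading step as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0xFFFFF0EA; // 1111 1111 1111 1111 1111 0000 1110 1010
        int b = 0x0000000A; // 0000 0000 0000 0000 0000 0000 0000 1010

        // Without spreading, only the low four bits matter.
        System.out.println("without spreading: " + (a & (n - 1)) + " and " + (b & (n - 1)));
        // With spreading, the high bits of a change its index.
        System.out.println("with spreading:    " + (spread(a) & (n - 1)) + " and " + (spread(b) & (n - 1)));
    }
}
```

The first line reports index 10 for both values; the second reports 5 for a and 10 for b, so the two keys no longer collide.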
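In addition, when the key is an Integer between 0 and 65535, (h = key.hashCode()) ^ (h >>> 16) equals the key itself, because Integer.hashCode() returns the int value and its high 16 bits are all 0. The bucket index is then simply key & (n - 1), so consecutive keys fall into consecutive buckets and, in the first example below, are iterated in ascending order.
Once the key exceeds 65535, the high 16 bits are no longer all 0, and (h = key.hashCode()) ^ (h >>> 16) no longer equals the key.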

import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        HashMap<Integer, Integer> m = new HashMap<>();
        m.put(65532, 1);
        m.put(65533, 2);
        m.put(65534, 3);
        m.put(65535, 4);
        // Iterate in the map's internal bucket order and print each value
        for (Integer i : m.keySet()) {
            Integer integer = m.get(i);
            System.out.println("integer = " + integer);
        }
    }
}

integer = 1
integer = 2
integer = 3
integer = 4

import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        HashMap<Integer, Integer> m = new HashMap<>();
        m.put(65536, 1);
        m.put(65537, 2);
        m.put(65538, 3);
        m.put(65539, 4);
        // Iterate in the map's internal bucket order and print each value
        for (Integer i : m.keySet()) {
            Integer integer = m.get(i);
            System.out.println("integer = " + integer);
        }
    }
}

integer = 2
integer = 1
integer = 4
integer = 3
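The two iteration orders follow directly from the bucket indices. The sketch below recomputes them, assuming the default table length of 16 (four entries never trigger a resize). BucketOrderDemo and bucketIndex() are hypothetical names introduced for this illustration.

```java
public class BucketOrderDemo {
    // Bucket index as HashMap would compute it for an Integer key.
    static int bucketIndex(int key, int tableLength) {
        int h = Integer.hashCode(key);        // for Integer, the hash code is the value itself
        return (h ^ (h >>> 16)) & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16; // assumption: default capacity, no resize
        for (int key = 65532; key <= 65539; key++) {
            System.out.println("key=" + key + " -> bucket " + bucketIndex(key, tableLength));
        }
    }
}
```

Keys 65532 to 65535 map to buckets 12, 13, 14, 15, so they iterate in insertion order. Keys 65536 to 65539 map to buckets 1, 0, 3, 2, which is why the second program prints 2, 1, 4, 3.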

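get method walkthrough

1. Compute the hash from the key.
2. If the array is empty, return null immediately.
3. Otherwise, compute the array index i for the key.
4. If nothing is stored at index i, return null.
5. If the key of the first node at index i equals the key passed to get, return that node.
6. Otherwise, check whether that node has a successor; if not, return null.
7. If it does, check whether the node is a linked-list node or a red-black tree node:
a. For a linked list, traverse the list.
b. For a red-black tree, search the tree.
8. Return the node if it is found; otherwise return null.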

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;

        // The array is not empty and the bucket for this key holds a node
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Check whether the first node's key equals the key argument
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // Check whether the bucket is a red-black tree or a linked list
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Traverse the linked list
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }

        // Empty array, empty bucket, or key not found: return null
        return null;
    }
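To tie the put and get steps together, here is a deliberately simplified sketch of an array-plus-linked-list map. It is not the JDK implementation: MiniChainedMap is a hypothetical class with a fixed 16-slot table, no resizing, and no red-black trees. Only the hash spreading, the (n - 1) & hash indexing, and the key comparison mirror the listings above.

```java
public class MiniChainedMap<K, V> {
    // Simplified node with the same four fields as HashMap.Node.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    // Assumption for brevity: fixed length 16, never resized.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    private static boolean sameKey(Node<?, ?> node, int hash, Object key) {
        return node.hash == hash && (node.key == key || (key != null && key.equals(node.key)));
    }

    // Returns the previous value, or null if the key was absent.
    public V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                 // empty bucket
            table[i] = new Node<>(hash, key, value);
            return null;
        }
        while (true) {
            if (sameKey(p, hash, key)) { // existing key: overwrite
                V old = p.value;
                p.value = value;
                return old;
            }
            if (p.next == null) {        // end of chain: append at the tail
                p.next = new Node<>(hash, key, value);
                return null;
            }
            p = p.next;
        }
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (sameKey(e, hash, key)) {
                return e.value;
            }
        }
        return null;                     // empty bucket or key not found
    }

    public static void main(String[] args) {
        MiniChainedMap<String, Integer> map = new MiniChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // "Aa" and "BB" share the same hashCode, so they chain in one bucket
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.get("missing"));
    }
}
```

The strings "Aa" and "BB" were chosen because they have identical hash codes, which forces the chain traversal in both put and get.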

Original article: https://blog.csdn.net/AlphaBr/article/details/126552940