The Magic of MappedByteBuffer

MappedByteBuffer
Performance analysis
Summary
Addendum: releasing a MappedByteBuffer

Java's MappedByteBuffer is built on mmap (on Linux, that is), so it is worth understanding how mmap works on Linux before reading this article.

MappedByteBuffer

MappedByteBuffer is a Java wrapper around mmap. On Linux, it is implemented directly on top of mmap.

In terms of inheritance, MappedByteBuffer extends ByteBuffer and keeps track of a logical address, address.

public abstract class MappedByteBuffer
    extends ByteBuffer

It is an abstract class with two concrete implementations:

class DirectByteBuffer extends MappedByteBuffer implements DirectBuffer

class DirectByteBufferR extends DirectByteBuffer implements DirectBuffer

The R stands for ReadOnly.

An instance can be obtained by calling the map method on the FileChannel of a RandomAccessFile.

Here is the definition of map:

public abstract MappedByteBuffer map(MapMode mode, long position, long size)
        throws IOException;

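MapMode mode: how the memory-mapped file is accessed. There are three modes:

MapMode.READ_ONLY: read-only; any attempt to modify the resulting buffer throws an exception.
MapMode.READ_WRITE: read/write; changes made to the resulting buffer are eventually written to the file, but they are not necessarily visible to other programs that have mapped the same file.
MapMode.PRIVATE: copy-on-write. This mode is somewhat similar to READ_ONLY in that the file itself is never modified: the original data is copied first, and reads and writes then go to the copy in the buffer. Writes do not affect the original data. The buffer can be seen as a local copy of the data, hence the name Private.

position: the position in the file at which the mapping starts.

allocationGranularity: a field of FileChannelImpl holding the memory allocation size for mapping buffers, initialized by the native function initIDs.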
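The difference between the three modes is easiest to see by writing through each of them. The following is a minimal sketch, not code from the JDK: the class name MapModeDemo and its method names are made up for illustration, and it assumes the file passed in is non-empty.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
final class MapModeDemo {

    /** Returns true if a write through a READ_ONLY mapping is rejected. */
    static boolean readOnlyRejectsWrites(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(MapMode.READ_ONLY, 0, ch.size());
            try {
                buf.put(0, (byte) 'X');
                return false;
            } catch (ReadOnlyBufferException expected) {
                return true;
            }
        }
    }

    /** Writes one byte through a PRIVATE mapping, then returns the file's first byte. */
    static byte firstByteAfterPrivateWrite(Path file) throws IOException {
        // PRIVATE mappings need a channel opened for both reading and writing.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(MapMode.PRIVATE, 0, ch.size());
            buf.put(0, (byte) 'X'); // modifies only this buffer's private copy
        }
        return Files.readAllBytes(file)[0];
    }

    /** Writes one byte through a READ_WRITE mapping, then returns the file's first byte. */
    static byte firstByteAfterSharedWrite(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(MapMode.READ_WRITE, 0, ch.size());
            buf.put(0, (byte) 'X');
            buf.force(); // ask for the change to be written to the storage device
        }
        return Files.readAllBytes(file)[0];
    }
}
```

For a file that starts with the byte 'a', the READ_ONLY write is rejected, the PRIVATE write leaves 'a' in the file, and the READ_WRITE write replaces it with 'X'.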

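Maximum size of a MappedByteBuffer

Since the file is mapped into virtual address space, can a MappedByteBuffer be arbitrarily large?

Of course not. First, the virtual address space itself is limited. On a 32-bit CPU a pointer is 4 bytes wide, so the largest address it can represent is 0xFFFFFFFF, which gives a 4 GB address space.

Second, although the size parameter of map is declared as long, a ByteBuffer is indexed with int, and map rejects any size greater than Integer.MAX_VALUE, i.e. 0x7fffffff or 2147483647 bytes, which is roughly 2 GB. In other words, a single MappedByteBuffer can cover at most about 2 GB, and one call to map can map at most that much data.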
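The size check can be observed without a large file, because map validates its arguments before doing any mapping. A small sketch (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
final class MapSizeLimitDemo {

    /** Returns true if map refuses a size one byte above Integer.MAX_VALUE. */
    static boolean rejectsOversizedMapping(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long tooLarge = Integer.MAX_VALUE + 1L; // 2147483648 bytes
            try {
                ch.map(MapMode.READ_ONLY, 0, tooLarge);
                return false;
            } catch (IllegalArgumentException expected) {
                // The documented precondition is size <= Integer.MAX_VALUE.
                return true;
            }
        }
    }
}
```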

Using MappedByteBuffer

Here are two usage examples:

    public static void readWithMap() throws IOException {

        try (RandomAccessFile file = new RandomAccessFile(new File("a.txt"), "r"))
        {
            //get Channel
            FileChannel fileChannel = file.getChannel();
            //get mappedByteBuffer from fileChannel
            MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
            // check buffer
            log.info("is Loaded in physical memory: {}",buffer.isLoaded());  // a hint only, not a guarantee
            log.info("capacity {}",buffer.capacity());
            //read the buffer
            for (int i = 0; i < buffer.limit(); i++)
            {
                log.info("get {}", buffer.get());
            }
        }
    }

    public static void writeWithMap() throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(new File("a.txt"), "rw"))
        {
            //get Channel
            FileChannel fileChannel = file.getChannel();
            //get mappedByteBuffer from fileChannel
            MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, 4096 * 8 );
            // check buffer
            log.info("is Loaded in physical memory: {}",buffer.isLoaded());  // a hint only, not a guarantee
            log.info("capacity {}",buffer.capacity());
            //write the content
            buffer.put("dhy".getBytes());
        }
    }

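Caveats

Because MappedByteBuffer uses memory mapping, both reads and writes get faster. So what should we watch out for when using it?

MappedByteBuffer has no close method. Even after its FileChannel has been closed, the MappedByteBuffer remains open; it is only closed when the JVM garbage-collects it, and when that happens is not deterministic.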
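That the mapping outlives its channel is easy to confirm. In this sketch (hypothetical class and method names, non-empty file assumed) the buffer is read only after the try-with-resources block has closed the channel:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
final class MappingOutlivesChannelDemo {

    /** Maps the file, closes the channel, and only then reads the first byte. */
    static byte readAfterChannelClosed(Path file) throws IOException {
        MappedByteBuffer buf;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            buf = ch.map(MapMode.READ_ONLY, 0, ch.size());
        } // the channel is closed here; the mapping is not
        return buf.get(0);
    }
}
```

The flip side is that the file stays mapped for as long as the buffer is reachable, which is what the addendum on releasing a MappedByteBuffer deals with.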

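Internal implementation

Next, let's walk through the source code to see how the map process works internally.

The FileChannel is obtained from the RandomAccessFile.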

public final FileChannel getChannel() {
    synchronized (this) {
        if (channel == null) {
            channel = FileChannelImpl.open(fd, path, true, rw, this);
        }
        return channel;
    }
}

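As the implementation shows, the synchronized block ensures that only one thread can initialize the FileChannel.

The FileChannel.map method maps the file into virtual memory and returns the logical address address. The implementation is as follows:

**Only the core code is kept**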
public MappedByteBuffer map(MapMode mode, long position, long size)  throws IOException {
        int pagePosition = (int)(position % allocationGranularity);
        long mapPosition = position - pagePosition;
        long mapSize = size + pagePosition;
        try {
            addr = map0(imode, mapPosition, mapSize);
        } catch (OutOfMemoryError x) {
            System.gc();
            try {
                Thread.sleep(100);
            } catch (InterruptedException y) {
                Thread.currentThread().interrupt();
            }
            try {
                addr = map0(imode, mapPosition, mapSize);
            } catch (OutOfMemoryError y) {
                // After a second OOME, fail
                throw new IOException("Map failed", y);
            }
        }
        int isize = (int)size;
        Unmapper um = new Unmapper(addr, mapSize, isize, mfd);
        if ((!writable) || (imode == MAP_RO)) {
            return Util.newMappedByteBufferR(isize,
                                             addr + pagePosition,
                                             mfd,
                                             um);
        } else {
            return Util.newMappedByteBuffer(isize,
                                            addr + pagePosition,
                                            mfd,
                                            um);
        }
}

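As the code shows, map ultimately delegates the actual file mapping to the native function map0. Before calling it, map rounds the requested position down to a multiple of allocationGranularity and enlarges the mapped size by the same amount.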
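The three lines of alignment arithmetic at the top of map are worth checking with concrete numbers. The sketch below repeats that arithmetic in a standalone helper; the class name is made up, and the granularity of 4096 used in the example is an assumption (a typical Linux page size), not a value taken from the JDK.

```java
// Hypothetical helper that mirrors the alignment arithmetic in FileChannelImpl.map.
final class MapAlignmentDemo {

    /** Returns { pagePosition, mapPosition, mapSize } for the given request. */
    static long[] align(long position, long size, long allocationGranularity) {
        // Offset of the requested position inside its allocation unit
        int pagePosition = (int) (position % allocationGranularity);
        // Start of the real mapping, rounded down to an allocation boundary
        long mapPosition = position - pagePosition;
        // The real mapping must also cover the bytes skipped by rounding down
        long mapSize = size + pagePosition;
        return new long[] { pagePosition, mapPosition, mapSize };
    }
}
```

With position = 5000, size = 100 and a granularity of 4096, this yields pagePosition = 904, mapPosition = 4096 and mapSize = 1004. The buffer handed back to the caller then starts at addr + pagePosition, so the caller still sees exactly the 100 bytes it asked for.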

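If the first mapping attempt fails with an OutOfMemoryError, a garbage collection is triggered manually, the thread sleeps for 100 ms, and the mapping is attempted again. If it fails a second time, an exception is thrown.
The MappedByteBuffer instance is created by newMappedByteBuffer, although what it actually returns is a DirectByteBuffer instance. The implementation is as follows: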

static MappedByteBuffer newMappedByteBuffer(int size, long addr, FileDescriptor fd, Runnable unmapper) {
    MappedByteBuffer dbb;
    if (directByteBufferConstructor == null)
        initDBBConstructor();
    // Exception handling is omitted in this excerpt
    dbb = (MappedByteBuffer)directByteBufferConstructor.newInstance(
          new Object[] { new Integer(size),
                         new Long(addr),
                         fd,
                         unmapper });
    return dbb;
}
// Look up the non-public constructor and make it accessible
private static void initDBBConstructor() {
    AccessController.doPrivileged(new PrivilegedAction<Void>() {
        public Void run() {
            // Exception handling is omitted in this excerpt
            Class<?> cl = Class.forName("java.nio.DirectByteBuffer");
            Constructor<?> ctor = cl.getDeclaredConstructor(
                new Class<?>[] { int.class,
                                 long.class,
                                 FileDescriptor.class,
                                 Runnable.class });
            ctor.setAccessible(true);
            directByteBufferConstructor = ctor;
            return null;
        }});
}

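FileChannelImpl and DirectByteBuffer live in different packages, so the constructor is not directly accessible. The code therefore looks up the DirectByteBuffer constructor reflectively inside an AccessController.doPrivileged call, makes it accessible, and uses it to create the instance.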
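The same reflection pattern can be tried outside the JDK. The sketch below uses a made-up class, Hidden, as a stand-in for a class whose constructor is not normally accessible; nothing in it is JDK code.

```java
import java.lang.reflect.Constructor;

// Hypothetical demo of reaching a non-accessible constructor via reflection.
final class ReflectiveConstructionDemo {

    /** Stand-in for a class with a constructor that callers cannot normally use. */
    static final class Hidden {
        final int size;
        final long address;

        private Hidden(int size, long address) {
            this.size = size;
            this.address = address;
        }
    }

    /** Looks up the private constructor, makes it accessible, and invokes it. */
    static Hidden newHidden(int size, long address) throws ReflectiveOperationException {
        Constructor<Hidden> ctor =
                Hidden.class.getDeclaredConstructor(int.class, long.class);
        ctor.setAccessible(true);
        // The primitive arguments are boxed here and unboxed again by newInstance
        return ctor.newInstance(size, address);
    }
}
```

The JDK caches the Constructor object in a static field so the lookup happens only once; the sketch skips that for brevity.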

DirectByteBuffer is a subclass of MappedByteBuffer that operates on memory directly.

The get path

The get methods of MappedByteBuffer are ultimately implemented by DirectByteBuffer.get.

public byte get() {
    return ((unsafe.getByte(ix(nextGetIndex()))));
}
public byte get(int i) {
    return ((unsafe.getByte(ix(checkIndex(i)))));
}
private long ix(int i) {
    return address + (i << 0);
}

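map0() returns an address, address, so the file can be read and written through that address without calling read or write. Under the hood, unsafe.getByte fetches the data at the memory location (address + offset).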

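The first access to the memory region that address points to triggers a page fault. The fault handler looks for the corresponding page among the file pages already cached in memory; if it is not there (that is, this part of the file has never been read into memory), the page is read from disk into physical memory (not the JVM heap).
If physical memory runs short while pages are being loaded, the kernel's virtual memory mechanism evicts pages that are not currently in use, writing them back to disk where necessary.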
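From the caller's side, the two get variants shown above differ only in where the index comes from: get() uses and advances the buffer's position, while get(int) takes an absolute index and leaves the position alone. A short sketch on a mapped file (hypothetical class name; the file is assumed to hold at least three bytes):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
final class MappedGetDemo {

    /**
     * Returns { byte from get(), position afterwards,
     *           byte from get(2), position afterwards }.
     */
    static int[] relativeVersusAbsolute(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(MapMode.READ_ONLY, 0, ch.size());
            int first = buf.get();               // relative: reads index 0, position -> 1
            int positionAfterRelative = buf.position();
            int third = buf.get(2);              // absolute: reads index 2, position unchanged
            int positionAfterAbsolute = buf.position();
            return new int[] { first, positionAfterRelative, third, positionAfterAbsolute };
        }
    }
}
```

Neither call issues a read system call; the bytes come straight from the mapped region, faulting pages in on first touch as described above.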

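Performance analysis

At the code level, reading a file from disk into memory always goes through the file system, and the actual data copying is done by the file system and the hardware drivers, so in theory the cost of copying the data should be the same.
Yet accessing a file on disk through a memory mapping is more efficient than using the read and write system calls. Why?

read() is a system call. It first copies the file data from disk into a buffer in kernel space and then copies that data into user space, so the data is copied twice.
mmap() is also a system call, but it copies no data itself. When a page fault occurs, the file data is read from disk into a page that is mapped directly into the process's address space, so the data is copied only once.

That is why memory-mapped reads and writes outperform traditional read/write.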
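The two access paths can be put side by side in code. The sketch below sums the bytes of a file once through FileChannel.read into a heap buffer and once through a mapping; it makes no timing claims and only shows that both paths see the same data. The class name, method names and the 8192-byte buffer size are choices made for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
final class ReadVersusMapDemo {

    /** Sums all bytes using read(): kernel buffer first, then a copy into our heap buffer. */
    static long sumWithRead(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer heap = ByteBuffer.allocate(8192);
            while (ch.read(heap) != -1) {
                heap.flip();
                while (heap.hasRemaining()) {
                    sum += heap.get() & 0xFF;
                }
                heap.clear();
            }
        }
        return sum;
    }

    /** Sums all bytes through a mapping: no explicit read call, pages fault in on access. */
    static long sumWithMap(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(MapMode.READ_ONLY, 0, ch.size());
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF;
            }
        }
        return sum;
    }
}
```

Any real comparison of the two should be measured on the target workload; for small files or purely sequential reads the gap may be small.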

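Summary

MappedByteBuffer uses virtual memory, so the size of the mapped memory is not limited by the JVM's -Xmx setting, but it is still subject to a size limit.
When a file exceeds the limit for a single mapping (Integer.MAX_VALUE bytes, about 2 GB), the rest of the file can be mapped by calling map again with a later position.
MappedByteBuffer does perform very well on large files, but it has its problems, such as memory usage and the uncertainty about when the file is closed: a file opened through it is only closed on garbage collection, and that point in time is not deterministic.
The javadoc says as much: A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected.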
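Mapping a file piece by piece through the position parameter might look like the following. This is a sketch under assumptions: the class and method names are invented, and chunkSize is a free parameter that a real caller would set to something large (up to Integer.MAX_VALUE) rather than the tiny values useful for testing.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
final class ChunkedMapDemo {

    /** Sums all bytes of the file, mapping at most chunkSize bytes at a time. */
    static long sumInChunks(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = ch.size();
            for (long position = 0; position < fileSize; position += chunkSize) {
                // The last chunk may be shorter than chunkSize
                long size = Math.min(chunkSize, fileSize - position);
                MappedByteBuffer buf = ch.map(MapMode.READ_ONLY, position, size);
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF;
                }
            }
        }
        return sum;
    }
}
```

Each chunk is a separate mapping that stays alive until its buffer is garbage-collected, so a long-running process that walks many large files this way can hold on to a lot of address space.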

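Addendum: releasing a MappedByteBuffer

FileChannel offers a map method for creating a memory mapping but no matching unmap method for releasing it, so the mapping keeps holding on to the file. An unmap method does exist, but only as a private method of FileChannelImpl; application code cannot call it directly, and the memory is not released until the buffer is finalized.

There are two workarounds:

Call the unmap method manually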

// Run the following when closing resources to release the memory
Method m = FileChannelImpl.class.getDeclaredMethod("unmap", MappedByteBuffer.class);
m.setAccessible(true);
m.invoke(FileChannelImpl.class, buffer);

Let the MappedByteBuffer release the memory it holds

AccessController.doPrivileged(new PrivilegedAction() {
    public Object run() {
      try {
        Method getCleanerMethod = buffer.getClass().getMethod("cleaner", new Class[0]);
        getCleanerMethod.setAccessible(true);
        sun.misc.Cleaner cleaner = (sun.misc.Cleaner)
            getCleanerMethod.invoke(buffer, new Object[0]);
        cleaner.clean();
      } catch (Exception e) {
        e.printStackTrace();
      }
      return null;
    }
});

In fact, both approaches end up calling the clean method of the Cleaner class; see the unmap code:

private static void unmap(MappedByteBuffer bb) {
    Cleaner cl = ((DirectBuffer)bb).cleaner();
    if (cl != null)
        cl.clean();
}

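At this point the fix is clear: when deleting the mapped file (for example, an index file), also release the corresponding memory mapping and drop the mapped object. Unfortunately, Java has no particularly good solution for this. Somewhat surprisingly, MappedByteBuffer has no public unmap method, and one was not expected before Java 10 at the earliest.

DirectByteBufferR is not a public class either: class DirectByteBufferR extends DirectByteBuffer implements DirectBuffer is declared with the default (package-private) access modifier. Java does offer an internal, "temporary" workaround, DirectByteBufferR.cleaner().clean(). Bear in mind that this is only a stopgap: the class was officially hidden in Java 9, and not every JVM vendor ships it.

Another option is to call System.gc() explicitly so that the GC reclaims the buffer before the cache becomes invalid. Frankly, this approach has even more drawbacks: explicit GC calls are strongly discouraged, and many production environments disable them outright, so it was not adopted as the fix for this bug.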
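On Java 9 and later, one workaround that is often used instead of the sun.misc.Cleaner approach is sun.misc.Unsafe.invokeCleaner(ByteBuffer). It is not mentioned in the text above, so treat the following as a sketch under that assumption rather than a recommended API: the class and method names are made up, the call is reached through reflection so that the code still compiles and runs (returning false) on runtimes where the method is missing, and the buffer must never be touched again after a successful release, since accessing unmapped memory can crash the JVM.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Hypothetical helper; relies on the unsupported sun.misc.Unsafe class.
final class MappedBufferReleaser {

    /**
     * Tries to release the native memory or mapping behind a direct buffer.
     * Returns true if the cleaner was invoked, false if the buffer is not
     * direct or the runtime does not support the call.
     */
    static boolean tryRelease(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return false;
        }
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            // Available since Java 9; absent on Java 8
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Missing method, inaccessible field, or a slice/duplicate buffer
            return false;
        }
    }
}
```

A typical call site would map the file, finish all reads and writes, call tryRelease(buffer), and only then delete the file.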

Original article: https://blog.csdn.net/m0_53157173/article/details/127584591