A Pitfall When Computing the Difference of Two Lists

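Background

When saving data in bulk, the incoming records have to be compared against the database. Any record that already exists is removed from the List before saving.

For removing elements from a List in bulk, removeAll is usually the first method that comes to mind.

The workflow was:

1. Match the external unique field of the incoming batch against the data in the database, and return the rows that already exist as a List of objects.

2. Call removeAll to delete those rows in bulk, leaving the difference.

Even with removeAll in place, the bulk save still produced duplicate data.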
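The symptom can be reproduced with a minimal sketch. The `User` class and its `externalNo` field are hypothetical names chosen for illustration, and the sketch assumes the entity does not override `equals()`, which the original does not state.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveAllPitfallDemo {
    // Hypothetical entity. It does not override equals(), so Object.equals
    // (reference identity) is what removeAll ends up using.
    static class User {
        final Long id;            // database primary key, null before saving
        final String externalNo;  // external unique field (assumed name)

        User(Long id, String externalNo) {
            this.id = id;
            this.externalNo = externalNo;
        }
    }

    // Returns what is left of the incoming batch after removeAll(existing).
    static List<User> remaining() {
        List<User> batch = new ArrayList<>();
        batch.add(new User(null, "A001"));
        batch.add(new User(null, "A002"));

        // Row loaded from the database: same external number,
        // but a different object that also carries an ID.
        List<User> existing = new ArrayList<>();
        existing.add(new User(1L, "A001"));

        batch.removeAll(existing); // no element equals() any other, so nothing is removed
        return batch;
    }

    public static void main(String[] args) {
        System.out.println(remaining().size()); // prints 2
    }
}
```

Both records stay in the batch, so "A001" would be inserted a second time.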

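API Notes

Operation	Method	Description
Intersection	listA.retainAll(listB)	After the call, listA holds the intersection of the two lists; listB is unchanged
Difference	listA.removeAll(listB)	After the call, listA holds the difference (the elements of listA that are not in listB); listB is unchanged
Union	1. listA.removeAll(listB) 2. listA.addAll(listB)	Deduplicating union: take the difference first, then append listB. listA holds the union; listB is unchanged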
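As a quick illustration of the three operations in the table, here is a small sketch on `List<Integer>`, where `equals()` compares values and the methods behave as described. The class and method names are made up for this example, and each method works on a copy so the inputs are not modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListSetOpsDemo {
    static List<Integer> intersection(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>(a); // copy so the input stays untouched
        result.retainAll(b);                       // keep only elements also in b
        return result;
    }

    static List<Integer> difference(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>(a);
        result.removeAll(b);                       // drop every element found in b
        return result;
    }

    static List<Integer> union(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>(a);
        result.removeAll(b);                       // difference first, to avoid duplicates
        result.addAll(b);                          // then append b
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4);
        List<Integer> b = Arrays.asList(3, 4, 5);
        System.out.println(intersection(a, b)); // prints [3, 4]
        System.out.println(difference(a, b));   // prints [1, 2]
        System.out.println(union(a, b));        // prints [1, 2, 3, 4, 5]
    }
}
```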

removeAll Source Code

Talk is cheap, so let's go straight to the removeAll source (the ArrayList implementation).


public boolean removeAll(Collection<?> c) {
    return batchRemove(c, false, 0, size);
}
...

boolean batchRemove(Collection<?> c, boolean complement,
                    final int from, final int end) {
    Objects.requireNonNull(c);
    final Object[] es = elementData;
    int r;
    // Optimize for initial run of survivors
    for (r = from;; r++) {
        if (r == end)
            return false;
        if (c.contains(es[r]) != complement)
            break;
    }
    int w = r++;
    try {
        for (Object e; r < end; r++)
            if (c.contains(e = es[r]) == complement)
                es[w++] = e;
    } catch (Throwable ex) {
        // Preserve behavioral compatibility with AbstractCollection,
        // even if c.contains() throws.
        System.arraycopy(es, r, es, w, end - r);
        w += end - r;
        throw ex;
    } finally {
        modCount += end - w;
        shiftTailOverGap(es, w, end);
    }
    return true;
}

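As the source shows, the method loops over the elements and calls c.contains(...) on each one to decide whether it should be removed.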


for (r = from;; r++) {
    if (r == end)
        return false;
    if (c.contains(es[r]) != complement)
        break;
}

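The objects loaded from the database carry an ID and other fields that the incoming objects do not have, so the two objects are not equal. contains() decides membership through equals(), so it returns false for every element, nothing is removed, and the deduplication fails.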
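To make the role of `equals()` concrete, the sketch below compares two hypothetical classes: `PlainRow` keeps the default `Object.equals`, while `KeyedRow` overrides `equals()`/`hashCode()` on the external unique field only. Both class names and the `externalNo` field are assumptions for illustration, not the original entity. An `equals()` that compares every field, including the ID, would behave like `PlainRow` here, because the ID differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class EqualsContainsDemo {
    // No equals() override: falls back to reference identity.
    static class PlainRow {
        final Long id;
        final String externalNo;
        PlainRow(Long id, String externalNo) { this.id = id; this.externalNo = externalNo; }
    }

    // equals()/hashCode() based only on the external unique field.
    static class KeyedRow {
        final Long id;
        final String externalNo;
        KeyedRow(Long id, String externalNo) { this.id = id; this.externalNo = externalNo; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof KeyedRow)) return false;
            return Objects.equals(externalNo, ((KeyedRow) o).externalNo);
        }

        @Override
        public int hashCode() { return Objects.hashCode(externalNo); }
    }

    static boolean plainContains() {
        List<PlainRow> fromDb = new ArrayList<>();
        fromDb.add(new PlainRow(1L, "A001"));
        return fromDb.contains(new PlainRow(null, "A001")); // different object, so not equal
    }

    static boolean keyedContains() {
        List<KeyedRow> fromDb = new ArrayList<>();
        fromDb.add(new KeyedRow(1L, "A001"));
        return fromDb.contains(new KeyedRow(null, "A001")); // same externalNo, so equal
    }

    public static void main(String[] args) {
        System.out.println(plainContains()); // prints false
        System.out.println(keyedContains()); // prints true
    }
}
```

Redefining equality for an entity affects every collection that holds it, so this is shown only to explain the behavior of contains().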

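For custom objects, the JDK 8+ Stream API can do the deduplication instead.

Match the rows already in the database (exitList) against the data to be saved (entityList), and collect the ones that are not yet in the database into a new List (saveDataEntityList).

Pseudocode: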


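// Collect the IDs of the existing rows once, rather than rebuilding the list for every element.
// The ID type is not shown here, so Object is used as a placeholder.
Set<Object> existIds = exitList.stream().map(Entity::getId).collect(Collectors.toSet());
saveDataEntityList = entityList.stream().filter(f -> !existIds.contains(f.getId())).collect(Collectors.toList());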
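For reference, a self-contained version of the pseudocode might look like the sketch below. The `Entity` class, its `String` ID, the `name` field, and the `notYetSaved` helper are assumptions made so the example compiles. Only `entityList`, `exitList`, and `getId` come from the pseudocode above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StreamDifferenceDemo {
    // Hypothetical entity; only getId() is taken from the pseudocode.
    static class Entity {
        private final String id;
        private final String name;
        Entity(String id, String name) { this.id = id; this.name = name; }
        String getId() { return id; }
        String getName() { return name; }
    }

    // Returns the elements of entityList whose ID does not appear in exitList.
    static List<Entity> notYetSaved(List<Entity> entityList, List<Entity> exitList) {
        // Build the lookup set once; HashSet lookups avoid rescanning exitList per element.
        Set<String> existingIds = exitList.stream()
                .map(Entity::getId)
                .collect(Collectors.toSet());
        return entityList.stream()
                .filter(e -> !existingIds.contains(e.getId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entity> entityList = Arrays.asList(new Entity("E1", "new copy"), new Entity("E2", "brand new"));
        List<Entity> exitList = Arrays.asList(new Entity("E1", "stored copy"));

        List<Entity> saveDataEntityList = notYetSaved(entityList, exitList);
        System.out.println(saveDataEntityList.size());          // prints 1
        System.out.println(saveDataEntityList.get(0).getId());  // prints E2
    }
}
```

The comparison is made on the key alone, so it does not matter that the stored object and the incoming object differ in their other fields.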

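Testing confirmed that this works: there is no more duplicate data.

Summary

removeAll is suited to cases where the elements match exactly under equals(), such as Strings and boxed primitive values. For custom objects, avoid removeAll and filter on the key field with a stream instead.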


Original article: https://blog.csdn.net/m290345792/article/details/126030021