几乎每个后台管理系统都难逃excel导入和导出的功能,目前针对大数据量的导入和导出多数是使用EasyExcel,EasyExcel极大地简单了java对excel的读写操作,从pom的依赖上知道easyexcel依赖POI,那么EasyExcel是如何实现高性能的excel读写操作呢,实现上和POI有什么区别和联系呢?
要讲EasyExcel,必须先讲一下excel版本和POI。
| 版本 | 最大数据量 | 后缀 | 文档类型 | 解压后的文件结构 |
|---|---|---|---|---|
| 2003 | 65535 | .xls | 复合文档 |
|
| 2007 | 1048575 | .xlsx | XML文档 |
|
Excel2007及以上版本,其文件由多个xml文件打包组成,其存数据的文件位于xl/worksheets/下,一个sheet以一个xml文件的形成存在,而正是由于这个文件结构,java可凭借其丰富强大的xml解析类库,实现对xml文件的流式追加写入和流式读取,避免全量读入数据而内存溢出,使读取大批量数据成为可能。
可用于操作Microsoft Office的相关文件(可用于excel,word,ppt等)的一套程序库,apache出品。POI用于操作Excel的对象都是实现了Workbook接口的类.
Workboot是创建Excel对象或Sheet等的顶级对象,主要的子类有:E
| 实现类 | Excel版本 | 特点 |
|---|---|---|
| HSSFWorkbook | 2003及以前 | 对象常驻内存,易OOM |
| XSSFWorkbook | 2007及以后 | 对象常驻内存,易OOM |
| SXSSFWorkbook | 2007及以后 | 流式的XSSFWorkbook, 有效避免OOM,仅用于写 |
这三套实现,还对应着各自的家族成员
| 工作簿 | HSSFWorkbook | XSSFWorkbook | SXSSFWorkbook |
| Sheet | HSSFSheet | XSSFSheet | SXSSFSheet |
| 行 | HSSFRow | XSSFRow | SXSSFRow |
| 单元格 | HSSFCell | XSSFCell | SXSSFCell |
其中HSSFWorkbook和XSSFWorkbook对Excel的处理和解析是非常吃内存的,每个单元格都是一个 Cell对象,每个Cell对象又有CellStyle对象,想想1个10列*10W行,单单它的Cell对象就已经有100W个,如果需要全部加载进内存,无疑是程序的灾难,于是SXSSFWorkbook随之出现。
SXSSFWorkbook是streaming版本的XSSFWorkbook,它只会保存最新的excel rows(默认100行)在内存里供查看,在此之前的excel rows都会被写入到硬盘里。被写入到硬盘里的rows对程序而言是不可见的,详见:
- public SXSSFRow createRow(int rownum) {
- int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex();
- if (rownum < 0 || rownum > maxrow) {
- throw new IllegalArgumentException("Invalid row number (" + rownum
- + ") outside allowable range (0.." + maxrow + ")");
- }
-
- // attempt to overwrite a row that is already flushed to disk
- if(rownum <= _writer.getLastFlushedRow() ) {
- throw new IllegalArgumentException(
- "Attempting to write a row["+rownum+"] " +
- "in the range [0," + _writer.getLastFlushedRow() + "] that is already written to disk.");
- }
-
- // attempt to overwrite a existing row in the input template
- if(_sh.getPhysicalNumberOfRows() > 0 && rownum <= _sh.getLastRowNum() ) {
- throw new IllegalArgumentException(
- "Attempting to write a row["+rownum+"] " +
- "in the range [0," + _sh.getLastRowNum() + "] that is already written to disk.");
- }
-
- SXSSFRow newRow = new SXSSFRow(this);
- _rows.put(rownum, newRow);
- allFlushed = false;
- if(_randomAccessWindowSize >= 0 && _rows.size() > _randomAccessWindowSize) {
- try {
- //如果写入的行数>_randomAccessWindowSize(默认100),则将老的记录写入磁盘
- flushRows(_randomAccessWindowSize);
- } catch (IOException ioe) {
- throw new RuntimeException(ioe);
- }
- }
- return newRow;
- }
在创建新行的过程中,SXSSFSheet通过flushRows()将多于随机读取窗口大小的Row写入磁盘的临时文件中,同时从内存中移除,从而实现内存减负,避免OOM的出现。
原生POI,通过SXSSF系列API,可以实现100W数据无压力输出(在-Xmx20M下即可运行)
- import java.io.File;
- import java.io.FileOutputStream;
- import java.io.IOException;
- import java.lang.reflect.Field;
- import java.util.Collection;
-
- import org.apache.commons.beanutils.BeanUtils;
- import org.apache.poi.ss.usermodel.Row;
- import org.apache.poi.xssf.streaming.SXSSFSheet;
- import org.apache.poi.xssf.streaming.SXSSFWorkbook;
-
- import com.alibaba.excel.annotation.ExcelProperty;
-
- public class PoiWriter
{ -
- private SXSSFWorkbook workbook = null;
- private SXSSFSheet sheet = null;
- private Class
head = null; - private boolean headerInited = false;
- private int rowNum = 0;
-
- public PoiWriter(Class
head) { - this.workbook = new SXSSFWorkbook();
- this.sheet = workbook.createSheet("sheet0");
- this.head = head;
- }
-
- public void write(Collection
dataset) throws Exception { - initHeader();
- for (T object : dataset) {
- writeRow(object);
- }
- }
-
- private void writeRow(T object) throws Exception {
- Field[] fields = head.getDeclaredFields();
- Row row = sheet.createRow(rowNum ++);
- int index = 0;
- for (Field field : fields) {
- String value = BeanUtils.getProperty(object, field.getName());
- row.createCell(index ++).setCellValue(value);
- }
- }
-
- public void flush(String fileName) throws IOException {
- FileOutputStream fos = new FileOutputStream(new File(fileName));
- workbook.write(fos);
- fos.close();
- }
-
- private void initHeader() {
- if(!headerInited) {
- createHeader();
- headerInited = true;
- }
- }
-
- private void createHeader() {
- Field[] fields = head.getDeclaredFields();
- int index = 0;
- Row row = sheet.createRow(rowNum ++);
- for (Field field : fields) {
- ExcelProperty excelProperty = field.getAnnotation(ExcelProperty.class);
- String columnName = excelProperty != null ? excelProperty.value()[0] : field.getName();
- row.createCell(index ++).setCellValue(columnName);
- }
- }
-
- }
- public class PoiWriterTest {
-
- @Test
- public void write() throws Exception {
- PoiWriter
poiWriter = new PoiWriter<>(Student.class); - int pageSize = 1000;
- for (int pageNo = 1; pageNo <= 100; pageNo++) {
- poiWriter.write(pageQuery(pageNo, pageSize));
- }
- poiWriter.flush("D:/stu.xlsx");
- }
-
- //模拟分页查询
- private List
pageQuery(int pageNo, int pageSize) { - List
students = new ArrayList<>(); - for (int i = 0; i < pageSize; i++) {
- int index = (pageNo - 1) * pageSize + i;
- students.add(build(index));
- }
- return students;
- }
-
- private Student build(int index) {
- Student s = new Student();
- s.setName("学生" + index);
- s.setAddress("地址" + index);
- s.setAge(String.valueOf(RandomUtils.nextInt(20, 30)));
- s.setClazz("班别" + index % 8);
- s.setGrade("年级" + index % 6);
- s.setTel(String.valueOf(13500000000L + RandomUtils.nextInt(1, 10000000)));
- s.setChinese(String.valueOf(RandomUtils.nextInt(50, 100)));
- s.setEnglish(String.valueOf(RandomUtils.nextInt(50, 100)));
- return s;
- }
- }
- @Data
- public class Student {
-
- @ExcelProperty(value="英语")
- private String english;
- @ExcelProperty(value="姓名")
- private String name;
- @ExcelProperty(value="年龄")
- private String age;
- @ExcelProperty(value="学校")
- private String school;
- @ExcelProperty(value="年级")
- private String grade;
- @ExcelProperty(value="班别")
- private String clazz;
- @ExcelProperty(value="地址")
- private String address;
- @ExcelProperty(value="电话")
- private String tel;
- @ExcelProperty(value="语文")
- private String chinese;
- @ExcelProperty(value="数学")
- private String math;
-
- }
注意,这时的写到磁盘,只是当前用户的一个临时目录,是一个xml文件,里面记录着行的内容,格式化后形如:
- <row r="1">
- <c r="A1" s="1" t="inlineStr">
- <is>
- <t>内容1t>
- is>
- c>
- <c r="B1" s="1" t="inlineStr">
- <is>
- <t>内容2t>
- is>
- c>
- row>
SXSSFWorkbook只实现了流式写,并没有实现流式读,流式读取excel需要通过Sax模式的事件API进行实现,该核心实现如下:
POI版Excel解析(DEMO)
- public class PoiSaxReader {
-
- public void read(String path, AbstractSheetHandler> handler) throws Exception {
- // 1.根据excel报表获取OPCPackage
- OPCPackage opcPackage = OPCPackage.open(path, PackageAccess.READ);
- // 2.创建XSSFReader
- XSSFReader reader = new XSSFReader(opcPackage);
- // 3.获取SharedStringTable对象
- SharedStringsTable table = reader.getSharedStringsTable();
- // 4.获取styleTable对象
- StylesTable stylesTable = reader.getStylesTable();
- // 5.创建Sax的xmlReader对象
- XMLReader xmlReader = XMLReaderFactory.createXMLReader();
- // 6.注册事件处理器
- XSSFSheetXMLHandler xmlHandler = new XSSFSheetXMLHandler(stylesTable, table, handler, false);
- xmlReader.setContentHandler(xmlHandler);
- // 7.第一个sheet的流数据
- XSSFReader.SheetIterator sheetIterator = (XSSFReader.SheetIterator) reader.getSheetsData();
- InputStream stream = sheetIterator.next();
- InputSource is = new InputSource(stream);
- // 8.最关键:执行xml解析,解析过程中会回调XSSFSheetXMLHandler的startElement等方法
- xmlReader.parse(is);
- }
-
- static class StudentHandler extends AbstractSheetHandler
{ -
- private int total = 0;
-
- @Override
- protected void invoke(Student object) {
- total += 1;
- if(total % 1000 == 0) {
- System.out.println("读取到:"+total);
- System.out.println(object);
- }
- }
-
- }
-
- public static void main(String[] args) throws Exception {
- PoiSaxReader reader = new PoiSaxReader();
- reader.read("D:/stu_100W.xlsx", new StudentHandler());
- }
-
- }
- /**
- * 基于Sax的解析处理器抽象
- */
- public abstract class AbstractSheetHandler
implements XSSFSheetXMLHandler.SheetContentsHandler { -
- // 将输出到excel的对象
- private T object;
- private Class
clazz; -
- @SuppressWarnings("unchecked")
- public AbstractSheetHandler() {
- Type genType = getClass().getGenericSuperclass();
- Type[] params = ((ParameterizedType) genType).getActualTypeArguments();
- this.clazz = (Class
) params[0]; - }
-
- /**
- * 当开始解析某一行的时候触发
- */
- @Override
- public void startRow(int i) {
- try {
- object = clazz.newInstance();
- } catch (InstantiationException | IllegalAccessException e) {
- e.printStackTrace();
- }
- }
-
- /**
- * 当结束解析某一行的时候触发
- */
- @Override
- public void endRow(int i) {
- invoke(object);
- }
-
-
- protected abstract void invoke(T object);
- /**
- * cellReference:单元格引用,形如A1,B1
- */
- @Override
- public void cell(String cellReference, String value, XSSFComment xssfComment) {
- Field[] fields = clazz.getDeclaredFields();
- int columnIndex = new CellReference(cellReference).getCol();
- try {
- BeanUtils.setProperty(object, fields[columnIndex].getName(), value);
- } catch (IllegalAccessException | InvocationTargetException e) {
- e.printStackTrace();
- }
- }
- }
以上程序,同样在-Xmx20M下可以成功运行。该程序只是孤立地对每一条数据进行处理,真实情况下程序可能需要拿到所有行的完整视图,如果需要将所有对象放进一个List中,那所消耗的内存当然远不止20M,但这种只能说是业务消耗的内存,而非POI消耗的内存。
上面用到的一些类可能都相对陌生,其核心逻辑就是解压xlsx文件,然后通过sax事件驱动模式对excel数据(即xml)进行逐行解释。这里重要的一个点是XSSFSheetXMLHandler
该类继承自DefaultHandler,XMLReader执行解析的过程中,会回调其startElement/endElement/characters等方法,后面说到的easyexcel,其解析过程是easyexcel自己搞一个XlsxRowHandler。

原生的POI虽然可以实现大数据量的导入和导出,但对于业务开发人员来说,使用起来还是没那么友好,特别是在处理大数据量导入的时候,EasyExcel基于POI,主要解决易用度的问题,同时也对POI使用Sax事件API进行数据行解释那一块进行了重写,减少了内存消耗。见EasyExcel官网的解释
Java解析、生成Excel比较有名的框架有Apache poi、jxl。但他们都存在一个严重的问题就是非常的耗内存,poi有一套SAX模式的API可以一定程度的解决一些内存溢出的问题,但POI还是有一些缺陷,比如07版Excel解压缩以及解压后存储都是在内存中完成的,内存消耗依然很大。easyexcel重写了poi对07版Excel的解析,一个3M的excel用POI sax解析依然需要100M左右内存,改用easyexcel可以降低到几M,并且再大的excel也不会出现内存溢出;03版依赖POI的sax模式,在上层做了模型转换的封装,让使用者更加简单方便
- @Test
- public void read() {
- EasyExcel.read("D:/stu_100W.xlsx", Student.class, new PageReadListener
(dataList -> { - log.info("读取到:{}", dataList.size());
- })).sheet().doRead();
- }
EasyExcel的API确实很简洁,一波debug跟进,来到XlsxSaxAnalyser#execute
XlsxSaxAnalyser#execute
- @Override
- public void execute() {
- for (ReadSheet readSheet : sheetList) {
- readSheet = SheetUtils.match(readSheet, xlsxReadContext);
- if (readSheet != null) {
- xlsxReadContext.currentSheet(readSheet);
- // 注意这个 XlsxRowHandler
- parseXmlSource(sheetMap.get(readSheet.getSheetNo()), new XlsxRowHandler(xlsxReadContext));
- readComments(readSheet);
- xlsxReadContext.analysisEventProcessor().endSheet(xlsxReadContext);
- }
- }
- }
-
- private void parseXmlSource(InputStream inputStream, ContentHandler handler) {
- InputSource inputSource = new InputSource(inputStream);
- try {
- SAXParserFactory saxFactory;
- String xlsxSAXParserFactoryName = xlsxReadContext.xlsxReadWorkbookHolder().getSaxParserFactoryName();
- if (StringUtils.isEmpty(xlsxSAXParserFactoryName)) {
- saxFactory = SAXParserFactory.newInstance();
- } else {
- saxFactory = SAXParserFactory.newInstance(xlsxSAXParserFactoryName, null);
- }
- try {
- saxFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
- } catch (Throwable ignore) {}
- try {
- saxFactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
- } catch (Throwable ignore) {}
- try {
- saxFactory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
- } catch (Throwable ignore) {}
- SAXParser saxParser = saxFactory.newSAXParser();
- XMLReader xmlReader = saxParser.getXMLReader();
- xmlReader.setContentHandler(handler);
- //和POI一样的套路
- xmlReader.parse(inputSource);
- inputStream.close();
- } catch (IOException | ParserConfigurationException | SAXException e) {
- throw new ExcelAnalysisException(e);
- } finally {
- if (inputStream != null) {
- try {
- inputStream.close();
- } catch (IOException e) {
- throw new ExcelAnalysisException("Can not close 'inputStream'!");
- }
- }
- }
- }
XlsxRowHandler
- public class XlsxRowHandler extends DefaultHandler {
- private final XlsxReadContext xlsxReadContext;
- private static final Map
XLSX_CELL_HANDLER_MAP = new HashMap(32); -
- static {
- CellFormulaTagHandler cellFormulaTagHandler = new CellFormulaTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_FORMULA_TAG, cellFormulaTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_FORMULA_TAG, cellFormulaTagHandler);
- CellInlineStringValueTagHandler cellInlineStringValueTagHandler = new CellInlineStringValueTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_INLINE_STRING_VALUE_TAG, cellInlineStringValueTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_INLINE_STRING_VALUE_TAG, cellInlineStringValueTagHandler);
- CellTagHandler cellTagHandler = new CellTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_TAG, cellTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_TAG, cellTagHandler);
- CellValueTagHandler cellValueTagHandler = new CellValueTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_VALUE_TAG, cellValueTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_VALUE_TAG, cellValueTagHandler);
- CountTagHandler countTagHandler = new CountTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.DIMENSION_TAG, countTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_DIMENSION_TAG, countTagHandler);
- HyperlinkTagHandler hyperlinkTagHandler = new HyperlinkTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.HYPERLINK_TAG, hyperlinkTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_HYPERLINK_TAG, hyperlinkTagHandler);
- MergeCellTagHandler mergeCellTagHandler = new MergeCellTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.MERGE_CELL_TAG, mergeCellTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_MERGE_CELL_TAG, mergeCellTagHandler);
- RowTagHandler rowTagHandler = new RowTagHandler();
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.ROW_TAG, rowTagHandler);
- XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_ROW_TAG, rowTagHandler);
- }
-
- public XlsxRowHandler(XlsxReadContext xlsxReadContext) {
- this.xlsxReadContext = xlsxReadContext;
- }
-
- @Override
- public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
- XlsxTagHandler handler = XLSX_CELL_HANDLER_MAP.get(name);
- if (handler == null || !handler.support(xlsxReadContext)) {
- return;
- }
- xlsxReadContext.xlsxReadSheetHolder().getTagDeque().push(name);
- handler.startElement(xlsxReadContext, name, attributes);
- }
-
- @Override
- public void characters(char[] ch, int start, int length) throws SAXException {
- String currentTag = xlsxReadContext.xlsxReadSheetHolder().getTagDeque().peek();
- if (currentTag == null) {
- return;
- }
- XlsxTagHandler handler = XLSX_CELL_HANDLER_MAP.get(currentTag);
- if (handler == null || !handler.support(xlsxReadContext)) {
- return;
- }
- handler.characters(xlsxReadContext, ch, start, length);
- }
-
- @Override
- public void endElement(String uri, String localName, String name) throws SAXException {
- XlsxTagHandler handler = XLSX_CELL_HANDLER_MAP.get(name);
- if (handler == null || !handler.support(xlsxReadContext)) {
- return;
- }
- handler.endElement(xlsxReadContext, name);
- xlsxReadContext.xlsxReadSheetHolder().getTagDeque().pop();
- }
-
- }
相比于XSSFSheetXMLHandler,XlsxRowHandler维护一个XlsxTagHandler的集合,XlsxRowHandler充当一个门面,解析逻辑交给各个具体的XlsxTagHandler
XlsxTagHandler
easyexcel实现了如下的XlsxTagHandler:
其中CellTagHandler解析
- @Test
- public void export() {
- ExcelWriter excelWriter = EasyExcel.write("D:/student_100W.xlsx", Student.class).build();
- WriteSheet sheet0 = EasyExcel.writerSheet("sheet0").build();
- for (int pageNo = 1; pageNo <= 2; pageNo++) {
- excelWriter.write(studentService.pageQuery(pageNo, pageSize), sheet0);
- }
- excelWriter.finish();
- }
EasyExcel的POI进行了良好的封装,简单易用,API在使用上很符合大家使用日常操作excel的习惯:

得益于POI的SXSSFSheet.createRow时,会进行自动将老的Row进行刷盘操作,EasyExcel只在SXSSF家族上面做一个封装,即可实现大数据量的写入操作。主要代码见:
ExcelWriteAddExecutor#add
- public void add(Collection> data) {
- if (CollectionUtils.isEmpty(data)) {
- data = new ArrayList<>();
- }
- WriteSheetHolder writeSheetHolder = writeContext.writeSheetHolder();
- int newRowIndex = writeSheetHolder.getNewRowIndexAndStartDoWrite();
- if (writeSheetHolder.isNew() && !writeSheetHolder.getExcelWriteHeadProperty().hasHead()) {
- newRowIndex += writeContext.currentWriteHolder().relativeHeadRowIndex();
- }
- // BeanMap is out of order, so use sortedAllFieldMap
- Map
sortedAllFieldMap = new TreeMap<>(); - int relativeRowIndex = 0;
- for (Object oneRowData : data) {
- int lastRowIndex = relativeRowIndex + newRowIndex;
- addOneRowOfDataToExcel(oneRowData, lastRowIndex, relativeRowIndex, sortedAllFieldMap);
- relativeRowIndex++;
- }
- }
-
- //向Excel写入一行
- private void addOneRowOfDataToExcel(Object oneRowData, int rowIndex, int relativeRowIndex,
- Map
sortedAllFieldMap) { - if (oneRowData == null) {
- return;
- }
- RowWriteHandlerContext rowWriteHandlerContext = WriteHandlerUtils.createRowWriteHandlerContext(writeContext,
- rowIndex, relativeRowIndex, Boolean.FALSE);
- WriteHandlerUtils.beforeRowCreate(rowWriteHandlerContext);
- //这里调用:SXSSFSheet#createRow
- Row row = WorkBookUtil.createRow(writeContext.writeSheetHolder().getSheet(), rowIndex);
- rowWriteHandlerContext.setRow(row);
-
- WriteHandlerUtils.afterRowCreate(rowWriteHandlerContext);
-
- if (oneRowData instanceof Collection>) {
- addBasicTypeToExcel(new CollectionRowData((Collection>)oneRowData), row, rowIndex, relativeRowIndex);
- } else if (oneRowData instanceof Map) {
- addBasicTypeToExcel(new MapRowData((Map
)oneRowData), row, rowIndex, relativeRowIndex); - } else {
- addJavaObjectToExcel(oneRowData, row, rowIndex, relativeRowIndex, sortedAllFieldMap);
- }
-
- WriteHandlerUtils.afterRowDispose(rowWriteHandlerContext);
- }
总结:
1.EasyExcel的写,借助于SXSSFSheet.createRow解决导出时的内存消耗问题
2.EasyExcel的读,自己实现了DefaultHandler,对Excel文件包中xml数据文件进行流式解释,解决读时的内存消耗问题。
by simple