• java POI解析Excel大文件,获取表头


    前言

    poi解析大文件可能出现oom,通样大小文件,xlsx会oom,xls不会,所以使用流式的方式改造解析xlsx文件的代码。
    我的需求是提取每一页的表头,所以值提取第一行,如果需要全部内容,把行数循环地方的if改成while就行。
    产生Workbook给的入参都是给的File,没用Inputstream,原因见后两章

    依赖

    可能不全,按需引入更多poi相关依赖

    		<dependency>
                <groupId>org.apache.poigroupId>
                <artifactId>poiartifactId>
                <version>4.1.2version>
            dependency>
            <dependency>
                <groupId>com.monitorjblgroupId>
                <artifactId>xlsx-streamerartifactId>
                <version>2.2.0version>
            dependency>
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10

    代码

    //newExcel true表示xlsx,false表示xls
    private static String excel(File file, boolean newExcel) throws Exception {
            StringBuilder sb = new StringBuilder();
            Workbook workbook = null;
            try {
                if (newExcel) {
                	//我只取第一行,你们按需调大
                    workbook = StreamingReader.builder()
                            .rowCacheSize(1)
                            .bufferSize(10240)
                            .open(file);
                } else {
                	//xls的还是普通方式
                    workbook = WorkbookFactory.create(file);
                }
                getSheet(sb, workbook);
            } finally {
                if (workbook != null) {
                    workbook.close();
                }
            }
            return sb.toString();
        }
    
        private static void getSheet(StringBuilder sb, Workbook workbook) throws IOException {
            try {
                sb.append(LEFT_MIDDLE_BRACKET);
                Iterator<Sheet> sheetIterator = workbook.iterator();
                while (sheetIterator.hasNext()) {
                    Sheet sheet = sheetIterator.next();
                    String sheetName = sheet.getSheetName();
                    if (sheetName.startsWith(BTXA_MASK)) {
                        continue;
                    }
                    sb.append(sheetName).append(C_COLON);
                    Iterator<Row> rowIterator = sheet.iterator();
                    //想取所有行的人把if改成while
                    if (rowIterator.hasNext()) {
                        Row row = rowIterator.next();
                        firstLine(sb, row);
                        if (sheetIterator.hasNext()) {
                            sb.append(C_SEMICOLON);
                        }
                    }
                }
                sb.append(RIGHT_MIDDLE_BRACKET);
            } finally {
                workbook.close();
            }
        }
    
        private static void firstLine(StringBuilder sb, Row row) {
            try {
                int numOfColumns = row.getLastCellNum();
                for (int j = 0; j < numOfColumns; j++) {
                    Cell cell = row.getCell(j, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
                    sb.append(getCell(cell));
                    if (j != numOfColumns - 1) {
                        sb.append(C_COMMA);
                    }
                }
            } catch (Exception e) {
                log.error("异常列", e);
                sb.append("异常列");
            }
        }
    
        private static String getCell(Cell cell) {
            // 获取单元格类型
            CellType cellType = cell.getCellType();
            // 根据单元格类型处理值
            String cellValue;
            switch (cellType) {
                case STRING:
                    cellValue = cell.getStringCellValue();
                    break;
                case NUMERIC:
                    cellValue = String.valueOf(cell.getNumericCellValue());
                    break;
                case BOOLEAN:
                    cellValue = String.valueOf(cell.getBooleanCellValue());
                    break;
                case FORMULA:
                    try {
                        DataFormatter formatter = new DataFormatter();
                        cellValue = formatter.formatCellValue(cell);
                    } catch (IllegalStateException | NumberFormatException e) {
                        cellValue = cell.getCellFormula();
                    }
                    break;
                case BLANK:
                    // 空单元格
                    cellValue = "";
                    break;
                case ERROR:
                    // 错误值
                    cellValue = "ERROR (" + ErrorEval.getText(cell.getErrorCellValue()) + ")";
                    break;
                default:
                    cellValue = "特殊类型";
            }
            return cellValue;
        }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94
    • 95
    • 96
    • 97
    • 98
    • 99
    • 100
    • 101
    • 102
    • 103

    StreamingReader的open

    open方法使用InputStream还需要创建临时文件,如果有本地文件,直接给file

        /**
         * Reads a given {@code InputStream} and returns a new
         * instance of {@code Workbook}. Due to Apache POI
         * limitations, a temporary file must be written in order
         * to create a streaming iterator. This process will use
         * the same buffer size as specified in {@link #bufferSize(int)}.
         *
         * @param is input stream to read in
         * @return A {@link Workbook} that can be read from
         * @throws com.monitorjbl.xlsx.exceptions.ReadException if there is an issue reading the stream
         */
        public Workbook open(InputStream is) {
          StreamingWorkbookReader workbook = new StreamingWorkbookReader(this);
          workbook.init(is);
          return new StreamingWorkbook(workbook);
        }
    
        /**
         * Reads a given {@code File} and returns a new instance
         * of {@code Workbook}.
         *
         * @param file file to read in
         * @return built streaming reader instance
         * @throws com.monitorjbl.xlsx.exceptions.OpenException if there is an issue opening the file
         * @throws com.monitorjbl.xlsx.exceptions.ReadException if there is an issue reading the file
         */
        public Workbook open(File file) {
          StreamingWorkbookReader workbook = new StreamingWorkbookReader(this);
          workbook.init(file);
          return new StreamingWorkbook(workbook);
        }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31

    WorkbookFactory的create

    create方法说明InputStream比file更占内存

        /**
         * Creates the appropriate HSSFWorkbook / XSSFWorkbook from
         *  the given InputStream.
         *
         * 

    Your input stream MUST either support mark/reset, or * be wrapped as a {@link BufferedInputStream}! * Note that using an {@link InputStream} has a higher memory footprint * than using a {@link File}.

    * *

    Note that in order to properly release resources the * Workbook should be closed after use. Note also that loading * from an InputStream requires more memory than loading * from a File, so prefer {@link #create(File)} where possible. * * @param inp The {@link InputStream} to read data from. * * @return The created Workbook * * @throws IOException if an error occurs while reading the data * @throws EncryptedDocumentException If the Workbook given is password protected */ public static Workbook create(InputStream inp) throws IOException, EncryptedDocumentException { return create(inp, null); } /** * Creates the appropriate HSSFWorkbook / XSSFWorkbook from * the given File, which must exist and be readable. *

    Note that in order to properly release resources the * Workbook should be closed after use. * * @param file The file to read data from. * * @return The created Workbook * * @throws IOException if an error occurs while reading the data * @throws EncryptedDocumentException If the Workbook given is password protected */ public static Workbook create(File file) throws IOException, EncryptedDocumentException { return create(file, null); }

    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41

    CSV解析首行

    赠送一个csv方法

        private static String csv(File file) throws Exception {
            String encode = KkFileUtils.getFileEncode(file);
            log.info("编码检测:{}", encode);
            if ("Big5".equalsIgnoreCase(encode)) {
                encode = "GB2312";
            }
            String res = "";
            try (FileInputStream fo = new FileInputStream(file);
                 InputStreamReader isr = new InputStreamReader(fo, encode);
                 BufferedReader br = new BufferedReader(isr)) {
                String line = br.readLine();
                if (StringUtils.isNotBlank(line)) {
                    res = line;
                }
            }
            return res;
        }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
  • 相关阅读:
    解密数仓高可用failover流程
    python+ipc+改造过的插线板写一个控制ipc疯狂上下电的脚本
    (八)DDR_PHY架构及功能——(PUB组成、初始化及Training流程、Clock关系)
    Paper: 利用RNN来提取恶意软件家族的API调用模式
    红队隧道加密之OpenSSL加密反弹Shell(一)
    nordic的nrf52系列32M速率的SPI-SPIM3
    java毕业设计《EE开发技术》课程学习网站Mybatis+系统+数据库+调试部署
    Linux中线程的介绍
    90%的程序员不适合做独立开发
    npm报错sass
  • 原文地址:https://blog.csdn.net/weixin_43859729/article/details/138199306