• Parse strings, group the records, and write them to multiple files


    【Question】

    So I have one large file that contains a bunch of weather data. I have to allocate each line from the large file into its corresponding state file. So there will be a total of 50 new state files with their own data.

    The large file contains ~1 million lines of records like this:

    COOP:166657,'NEW IBERIA AIRPORT ACADIANA REGIONAL LA US',200001,177,553

    The station name varies, though, and can have a different number of words.
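    Because the station name has no fixed word count, splitting on spaces is unreliable. In the sample, however, the state code is always the two characters right before ` US'`. The minimal sketch below extracts it with `indexOf` instead of a regular expression. It assumes every record follows the sample's layout, and `StateExtractor`, `extractState` and the second record are made up for illustration:

```java
public class StateExtractor {
    /**
     * Hypothetical helper: returns the two characters immediately before " US'",
     * which in the sample layout is the state code, or null if the marker is absent.
     */
    static String extractState(String line) {
        int marker = line.indexOf(" US'"); // index of the space before "US'"
        if (marker < 2) {
            return null; // marker missing, or too close to the start for a 2-letter code
        }
        return line.substring(marker - 2, marker);
    }

    public static void main(String[] args) {
        // The sample record from the question
        System.out.println(extractState(
                "COOP:166657,'NEW IBERIA AIRPORT ACADIANA REGIONAL LA US',200001,177,553")); // LA
        // A made-up record with a shorter station name
        System.out.println(extractState("COOP:000000,'EXAMPLE STATION TX US',200001,100,200")); // TX
    }
}
```

    A fixed substring search avoids the backtracking that a leading `.*` in a regular expression causes on every line.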

    Right now I am using a regex to find the pattern and write each line to a file, and the output must be grouped by state. Reading in the entire file without any modifications takes about 46 seconds. With the code that finds the state abbreviation, creates the file, and writes to that file, it takes over 10 minutes.

    This is what I have right now:

    package climate;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * This program will read in a large file containing many stations and states,
     * and output in order the stations to their corresponding state file.
     *
     * Note: This takes a long time depending on processor. It also appends data to
     * the files, so you must remove all the state files in the current directory
     * before running for accuracy.
     *
     * @author Marcus
     */
    public class ClimateCleanStates {
        public static void main(String[] args) throws IOException {
            Scanner in = new Scanner(System.in);
            System.out.println("Note: This program can take a long time depending on processor.");
            System.out.println("It is also not necessary to run as state files are in this directory.");
            System.out.println("But if you would like to see how it works, you may continue.");
            System.out.println("Please remove state files before running.");
            System.out.println("\nIs the States directory empty?");
            String answer = in.nextLine();
            if (answer.equals("N")) {
                in.close(); // close first: nothing after System.exit runs
                System.exit(0);
            }
            System.out.println("Would you like to run the program?");
            String answer2 = in.nextLine();
            if (answer2.equals("N")) {
                in.close();
                System.exit(0);
            }
            File statefile, dir, infile;
            // Create the output directory for the state files
            dir = new File("States");
            dir.mkdir();
            infile = new File("climatedata.csv");
            FileReader fr = new FileReader(infile);
            BufferedReader br = new BufferedReader(fr);
            String line;
            line = br.readLine(); // read and discard the first line (presumably a header)
            System.out.println();
            // Read in climatedata.csv
            final long start = System.currentTimeMillis();
            while ((line = br.readLine()) != null) {
                // Skip records that contain -9999
                if (!line.contains("-9999")) {
                    String stateFileName = null;
                    // Note: the pattern is recompiled for every line
                    Pattern p = Pattern.compile(".* ([A-Z][A-Z]) US");
                    Matcher m = p.matcher(line);
                    if (m.find()) {
                        stateFileName = m.group(1);
                        stateFileName = "States/" + stateFileName + ".csv";
                        statefile = new File(stateFileName);
                        // Note: the state file is opened, written and closed once per record
                        FileWriter stateWriter = new FileWriter(statefile, true);
                        stateWriter.write(line + "\n");
                        // Progress reporting
                        //System.out.printf("Writing [%s] to file [%s]\n", line,
                        //        statefile);
                        stateWriter.flush();
                        stateWriter.close();
                    }
                }
            }
            System.out.println("Elapsed " + (System.currentTimeMillis() - start) + " ms");
            br.close();
            fr.close();
            in.close();
        }
    }


    【Answer】

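    Parsing strings with regular expressions is slow, and so is writing only one record at a time; the records should be written in batches instead. With esProc, the computation above is simple to implement and finishes in just a few seconds.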
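    The two fixes named here are to avoid the regex and to stop reopening a file for every record. In plain Java they could look roughly like the minimal sketch below. This is an illustration under stated assumptions, not the answer's own method:

    - `SplitByState` and `extractState` are hypothetical names.
    - Every record is assumed to carry its state code right before ` US'`, as in the sample.
    - The input is assumed to be UTF-8.

    The sketch keeps one open `BufferedWriter` per state, so output reaches the disk in large buffered chunks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class SplitByState {
    /** Hypothetical helper: the two letters before " US'", or null if absent. */
    static String extractState(String line) {
        int marker = line.indexOf(" US'");
        return marker < 2 ? null : line.substring(marker - 2, marker);
    }

    /** Single pass over the input; one buffered writer per state stays open until the end. */
    static void split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Map<String, BufferedWriter> writers = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            reader.readLine(); // discard the first (header) line, as the question's code does
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("-9999")) {
                    continue; // skip records with missing values, as in the question
                }
                String state = extractState(line);
                if (state == null) {
                    continue; // no " US'" marker: not a recognizable record
                }
                BufferedWriter writer = writers.get(state);
                if (writer == null) {
                    // Opened once per state; default options create or truncate the file
                    writer = Files.newBufferedWriter(outDir.resolve(state + ".csv"),
                            StandardCharsets.UTF_8);
                    writers.put(state, writer);
                }
                writer.write(line);
                writer.write('\n');
            }
        } finally {
            for (BufferedWriter writer : writers.values()) {
                writer.close(); // flushes whatever is still buffered
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SplitByState climatedata.csv States
        if (args.length != 2) {
            System.err.println("usage: SplitByState <input.csv> <outputDir>");
            return;
        }
        split(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

    `Files.newBufferedWriter` truncates an existing file by default. Rerunning the program therefore overwrites the state files instead of appending to them, so the old files do not have to be deleted first. Only one writer per distinct state code (around 50 here) is open at a time.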

    The esProc SPL code is as follows:

        A
    1   =file("data.csv").import@is()
    2   =A1.group(mid(~,pos(~,"US'")-2,2):state;~:data)
    3   =A2.run(file("d:\\temp\\"+state+".csv").export(data))
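    For comparison, the three SPL cells map roughly onto the Java sketch below:

    - A1 reads all lines.
    - A2 groups them by state.
    - A3 writes each group to its own file.

    This is a hypothetical translation; `GroupThenExport`, `export` and `extractState` are illustrative names. Rather than reproduce the SPL `mid`/`pos` expression, it assumes the state code is the two letters right before ` US'`. It also holds the whole file in memory while grouping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupThenExport {
    /** Hypothetical helper: the two letters before " US'", or null if absent. */
    static String extractState(String line) {
        int marker = line.indexOf(" US'");
        return marker < 2 ? null : line.substring(marker - 2, marker);
    }

    /** Mirrors the SPL cells: A1 read, A2 group by state, A3 export each group. */
    static Map<String, List<String>> export(Path input, Path outDir) throws IOException {
        // A1: read every line of the input as a string
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);

        // A2: group lines by state code; lines without " US'" are dropped
        Map<String, List<String>> groups = lines.stream()
                .filter(line -> extractState(line) != null)
                .collect(Collectors.groupingBy(GroupThenExport::extractState,
                        TreeMap::new, Collectors.toList()));

        // A3: write each group to outDir/<STATE>.csv in a single call
        Files.createDirectories(outDir);
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            Files.write(outDir.resolve(group.getKey() + ".csv"), group.getValue(),
                    StandardCharsets.UTF_8);
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GroupThenExport data.csv outputDir
        if (args.length != 2) {
            System.err.println("usage: GroupThenExport <input.csv> <outputDir>");
            return;
        }
        export(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

    Grouping before writing means every state file is written exactly once. The cost is keeping all records (about 1 million lines here) in memory. If that is too much, a single pass that keeps one open writer per state avoids it.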

    esProc provides a JDBC interface, so it can be called from Java just like a database; see "How to call SPL scripts from Java".


  • Original article: https://blog.csdn.net/raqsoft/article/details/127995160