Removing Duplicate Data

【Question】

I am looking for some help. I have an application at work that generates a CSV file of user information. I want to use Java to read the data, delete the duplicate information, rearrange it, and create a spreadsheet, to make life easier. The CSV is generated in the following format, but is much larger:


21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555
21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net
21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555
99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444
99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net
99946133, b9854, Paul, Jane, technology, administration, work pager, 444-444-4444
99946133, b9854, Paul, Jane, technology, administration, cell phone, 444-444-4444

I want to delete the duplicates and arrange the data in appropriate columns.

ID | PIN | Lname | Fname | Dept | team | work px | work email

I have been trying to build arrays with a BufferedReader to store the data, but I am running into difficulties dealing with duplicates and manipulating the data into a table.

This is the code I have so far:


import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Sort {
   public static void main(String[] args) {
     String csvSplitBy = "\\s*,\\s*";   // split on commas and drop the spaces around them

     // location where the file is retrieved; try-with-resources closes the reader
     try (BufferedReader br = new BufferedReader(new FileReader("C:/Users/Jason/Desktop/test.txt"))) {
            String line;

            while ((line = br.readLine()) != null) {  // keeps reading until the end of the file
                   String[] id = line.split(csvSplitBy);
                   if (id.length < 8) continue;   // each row has 8 fields (id[0]..id[7]); skip short rows

                   // incomplete...using for test...
                   String outPut = String.join(",", id);

                   System.out.println(outPut);   // displays the contents of the .txt file
            } // ends while statement
     } // ends try
     catch (IOException e) {
            // also covers a missing file (FileNotFoundException is an IOException)
            System.out.println("Could not read the file: " + e.getMessage());
     } // ends catch
   } // ends main method
} // ends class Sort

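【Answer】

Java has no class library that directly groups text-file records or extracts unique values, so writing this by hand takes considerably more code. For this kind of task, SPL can assist Java, and the script is short: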

	A
1	=file("D:\\dup.csv").import@c()
2	=A1.group(_1,_2,_3,_4,_5,_6;~.select@1(_7=="work phone")._8,~.select@1(_7=="work email")._8)
3	=file("D:\\result.csv").export@c(A2)

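A1: Read the contents of the file dup.csv.

A2: Group the rows by the first six columns to remove the duplicates, then take the "work phone" and "work email" values from each group.

A3: Export A2 to the file result.csv.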

SPL scripts can be embedded in Java (see "How to Call an SPL Script in Java").
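
For readers who want to see what the grouping in A2 amounts to, here is a minimal plain-Java sketch of the same idea. It is not part of the original answer. The class name ContactPivot, the method name pivot, and the header row are assumptions made for illustration; the sketch also assumes every row has exactly the eight comma-separated fields shown in the question and that no field contains a comma. Groups are returned in the order they first appear in the input, which may differ from the order SPL produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named here only for illustration.
public class ContactPivot {

    // Groups rows by their first six fields and keeps the first
    // "work phone" and "work email" value seen for each group.
    public static List<String> pivot(List<String> lines) {
        // LinkedHashMap keeps the groups in first-seen order.
        Map<String, String[]> groups = new LinkedHashMap<>();
        for (String line : lines) {
            // Split on commas and drop the spaces around them.
            String[] f = line.trim().split("\\s*,\\s*");
            if (f.length < 8) {
                continue; // skip blank or malformed rows
            }
            String key = String.join(",", f[0], f[1], f[2], f[3], f[4], f[5]);
            // contact[0] holds the work phone, contact[1] the work email.
            String[] contact = groups.computeIfAbsent(key, k -> new String[] {"", ""});
            if (f[6].equals("work phone") && contact[0].isEmpty()) {
                contact[0] = f[7];
            }
            if (f[6].equals("work email") && contact[1].isEmpty()) {
                contact[1] = f[7];
            }
        }

        List<String> out = new ArrayList<>();
        // Header row taken from the column layout requested in the question.
        out.add("ID,PIN,Lname,Fname,Dept,team,work px,work email");
        for (Map.Entry<String, String[]> e : groups.entrySet()) {
            out.add(e.getKey() + "," + e.getValue()[0] + "," + e.getValue()[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample rows copied from the question.
        List<String> rows = Arrays.asList(
            "21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555",
            "21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net",
            "21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555",
            "99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444",
            "99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net");
        for (String row : pivot(rows)) {
            System.out.println(row);
        }
    }
}
```

To work on a real file, the rows could be read with Files.readAllLines and the result written back with Files.write. Rows of any other contact type (pager, cell phone) are simply ignored, mirroring the two select@1 expressions in A2.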

Original article: https://blog.csdn.net/raqsoft/article/details/126864180