Merging Field Values Across Records After Grouping

【Question】

For example, I have data in a CSV file with the columns "people", "committers", and "repositoryCommitters".

The "people" column holds ids from 1 to 5923, and I want to match the ids that share a repository in the "repositoryCommitters" column. For example:


people | repositoryCommitters
1 | x
2 | x
3 | y

People ids 1 and 2 share the repository "x". How do I get these ids and print them in the output like this:


*Edges
1 2

This means 1 and 2 are linked because they have a common repository.

For now, the code I have is:


package network;
 
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Scanner;
 
public class Read {
 static String line;
 static BufferedReader br1 = null, br2 =null;
 static ArrayList<String> pList = new ArrayList<String>();
 static ArrayList<String> rList = new ArrayList<String>();
 static File fileName = new File("networkBuilder.txt");
 
 public static void main(String[] args) throws IOException
 { String fileContent = "*Vertices " ;
 
System.out.println("Enter your current directory: ");
 Scanner scanner = new Scanner(System.in);
 String directory = scanner.nextLine();
 
try {
 br1 = new BufferedReader(new FileReader(directory + "//people.csv"));
 br2 = new BufferedReader(new FileReader(directory + "//repo.csv"));
 
} catch(FileNotFoundException e)
 {
System.out.println(e.getMessage() + " \n file not found re-run and try again");
 System.exit(0);
 }
 int count = 0;
 try {
 while((line = br1.readLine()) != null){ //skip first line
 while((line = br1.readLine()) != null)
 {
 pList.add(line); // add to array list
 count++ ;
 
 } }
 
} catch (IOException error) {
 System.out.println(error.getMessage() + "Error reading file");
 }
 /** Vertices **/
System.out.println("\n"); // new line
 System.out.println(fileContent + count); //print out vertices
 //print out each item in the ArrayList
 int size = pList.size();
 for(int i=0; i < size; i++){
 String[] data=(pList.get(i)).split(",");
 System.out.println(data[1]);
 
} 
// Save the console output in a text file
 try{
 PrintStream myconsole = new PrintStream(new File(directory + "network.txt"));
 System.setOut(myconsole);
 //print out each item in the ArrayList
int sz = pList.size(); System.out.println(fileContent + count); //print out vertices
 for(int i=0; i < sz; i++){
 String[] data=(pList.get(i)).split(",");
 System.out.println(data[1]);
 }
 } catch(Exception er){
 }
 
 /* try{
 FileWriter fw = new FileWriter(fileName);
 Writer output = new BufferedWriter(fw);
 int size = pList.size();
 for(int j=0; j<size; j++){
 
 output.write(fileContent + count);
 ((BufferedWriter) output).newLine();
 output.write(pList.get(j) + "\n");
 ((BufferedWriter) output).newLine();
 }
output.close(); 
 
 } */
 
 /** Edges**/
 fileContent = "\n*Edges";
 System.out.println(fileContent);
 // peopleCSV();
 // repoCSV();
 
 } // end of main
}

And the output is:

Enter your current directory:

C:\Users\StudentDoubts\Documents


*Vertices 5923
1
2
3 . . .

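【Answer】

The task is to group the records by the second column and, within each group, merge the values of the first column into a single row. Hard-coding this algorithm is fairly involved; it is more convenient to do it with esProc, where the SPL code is short and easy to follow: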

	A
1	=file("people.txt").import@t(;,"|")
2	=A1.group(repositoryCommitters).new(~.(people).concat(" "):*Edges)
3	=file("D:/result.txt").export@t(A2)

To also include repositoryCommitters on each output line, just change A2 to:

=A1.group(repositoryCommitters).new(~.(people).string(" "):*Edges,repositoryCommitters:repositoryCommitters)
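
For comparison, the grouping step described above can also be sketched in plain Java. This is only a minimal illustration, not the author's method: the class name `GroupEdges`, the helper `edges`, and the in-memory sample rows are assumptions made for the example, and it assumes pipe-delimited rows with the header line already removed. Groups are emitted in first-seen order, which may differ from the order SPL's group produces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupEdges {
    // Hypothetical helper: groups "people | repositoryCommitters" rows by
    // repository and joins the people ids of each group with a space.
    static List<String> edges(List<String> rows) {
        // LinkedHashMap keeps repositories in the order they first appear.
        Map<String, List<String>> byRepo = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.split("\\|");
            if (cols.length < 2) {
                continue; // skip blank or malformed lines
            }
            String person = cols[0].trim();
            String repo = cols[1].trim();
            byRepo.computeIfAbsent(repo, k -> new ArrayList<>()).add(person);
        }
        List<String> out = new ArrayList<>();
        for (List<String> people : byRepo.values()) {
            out.add(String.join(" ", people));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data taken from the question; a real run would read the file.
        List<String> rows = List.of("1 | x", "2 | x", "3 | y");
        System.out.println("*Edges");
        edges(rows).forEach(System.out::println);
    }
}
```

With the three sample rows from the question, the first group line is "1 2". A repository with a single person still yields a line containing one id, so filter on group size if only linked pairs are wanted.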

esProc provides a JDBC interface, so it can be used like a database; see "How to Call an SPL Script in Java".


Original article: https://blog.csdn.net/raqsoft/article/details/128012356