【Question】
For example, I have data in a CSV file with the columns "people", "committers", and "repositoryCommitters".
The "people" column holds ids from 1 to 5923, and I want to match ids that share a repository in the "repositoryCommitters" column. For example:
- people | repositoryCommitters
- 1 | x
- 2 | x
- 3 | y
People ids 1 and 2 share the repository "x". How do I find these ids and print them in the output like this:
- *Edges
- 1 2
meaning that 1 and 2 are linked because they share a repository.
The code I have so far is:
- package network;
-
- import java.io.BufferedReader;
- import java.io.BufferedWriter;
- import java.io.File;
- import java.io.FileNotFoundException;
- import java.io.FileReader;
- import java.io.FileWriter;
- import java.io.IOException;
- import java.io.LineNumberReader;
- import java.io.PrintStream;
- import java.io.Writer;
- import java.util.ArrayList;
- import java.util.Scanner;
-
- public class Read {
- static String line;
- static BufferedReader br1 = null, br2 =null;
- static ArrayList<String> pList = new ArrayList<String>();
- static ArrayList<String> rList = new ArrayList<String>();
- static File fileName = new File("networkBuilder.txt");
-
- public static void main(String[] args) throws IOException
- { String fileContent = "*Vertices " ;
-
- System.out.println("Enter your current directory: ");
- Scanner scanner = new Scanner(System.in);
- String directory = scanner.nextLine();
-
- try {
- br1 = new BufferedReader(new FileReader(directory + "//people.csv"));
- br2 = new BufferedReader(new FileReader(directory + "//repo.csv"));
-
- } catch(FileNotFoundException e)
- {
- System.out.println(e.getMessage() + " \n file not found re-run and try again");
- System.exit(0);
- }
- int count = 0;
- try {
- br1.readLine(); // skip the header line
- while ((line = br1.readLine()) != null) {
- pList.add(line); // add each data row to the list
- count++;
- }
-
- } catch (IOException error) {
- System.out.println("Error reading file: " + error.getMessage());
- }
- /** Vertices **/
- System.out.println("\n"); // new line
- System.out.println(fileContent + count); //print out vertices
- //print out each item in the ArrayList
- int size = pList.size();
- for(int i=0; i < size; i++){
- String[] data=(pList.get(i)).split(",");
- System.out.println(data[1]);
-
- }
- // Save the console output in a text file
- try{
- PrintStream myconsole = new PrintStream(new File(directory, "network.txt")); // File(parent, child) inserts the path separator
- System.setOut(myconsole);
- //print out each item in the ArrayList
- int sz = pList.size(); System.out.println(fileContent + count); //print out vertices
- for(int i=0; i < sz; i++){
- String[] data=(pList.get(i)).split(",");
- System.out.println(data[1]);
- }
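- } catch(Exception er){
- System.err.println("Could not write network.txt: " + er.getMessage()); // report instead of silently ignoring
- }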
-
- /* try{
- FileWriter fw = new FileWriter(fileName);
- Writer output = new BufferedWriter(fw);
- int size = pList.size();
- for(int j=0; j<size; j++){
-
- output.write(fileContent + count);
- ((BufferedWriter) output).newLine();
- output.write(pList.get(j) + "\n");
- ((BufferedWriter) output).newLine();
- }
- output.close();
-
- } */
-
- /** Edges**/
- fileContent = "\n*Edges";
- System.out.println(fileContent);
- // peopleCSV();
- // repoCSV();
-
- } // end of main
- }
And the output is:
Enter your current directory:
C:\Users\StudentDoubts\Documents
- *Vertices 5923
- 1
- 2
- 3 . . .
【Answer】
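Group the rows by the second column and, within each group, merge the values of the first column onto one line. Hard-coding this algorithm is fairly complicated; it is more convenient with esProc, where the SPL code is short and easy to follow: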
| | A |
| 1 | =file("people.txt").import@t(;,"|") |
| 2 | =A1.group(repositoryCommitters).new(~.(people).concat(" "):*Edges) |
| 3 | =file("D:/result.txt").export@t(A2) |
To add repositoryCommitters to each output line, just change A2 to:
=A1.group(repositoryCommitters).new(~.(people).string(" "):*Edges,repositoryCommitters:repositoryCommitters)
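For readers who want to stay in plain Java, the grouping step described above can be sketched roughly as follows. This is only an illustration, not the answer's method: it assumes each row has the hypothetical layout `people|repositoryCommitters`, and the class and method names (`GroupCommitters`, `groupByRepository`, `toLines`) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupCommitters {

    // Groups people ids by repository, keeping repositories in first-seen order.
    // Assumption: each row looks like "people|repositoryCommitters".
    static Map<String, List<String>> groupByRepository(List<String> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split("\\|");
            if (fields.length < 2) {
                continue; // skip blank or malformed rows
            }
            String person = fields[0].trim();
            String repo = fields[1].trim();
            groups.computeIfAbsent(repo, k -> new ArrayList<>()).add(person);
        }
        return groups;
    }

    // Builds one output line per repository: the ids joined by spaces,
    // optionally followed by a tab and the repository name.
    static List<String> toLines(Map<String, List<String>> groups, boolean withRepo) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            String ids = String.join(" ", e.getValue());
            lines.add(withRepo ? ids + "\t" + e.getKey() : ids);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question.
        List<String> rows = Arrays.asList("1|x", "2|x", "3|y");
        System.out.println("*Edges");
        for (String line : toLines(groupByRepository(rows), false)) {
            System.out.println(line);
        }
    }
}
```

Passing `true` as the second argument of `toLines` appends the repository name to each line, which roughly corresponds to the variant with the extra repositoryCommitters column.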
esProc provides a JDBC interface, so it can be used like a database; see "How to call an SPL script from Java".
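The question asks for one edge per pair of ids, and a repository may well have more than two committers. In that case each group has to be expanded into pairs. A minimal plain-Java sketch of that expansion is shown here, assuming rows are already split into `{personId, repository}` arrays; the class and method names (`PairwiseEdges`, `edges`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairwiseEdges {

    // Expands the committers of each repository into unordered pairs "a b".
    // Assumption: each row is a {personId, repository} array.
    static List<String> edges(List<String[]> rows) {
        Map<String, List<String>> byRepo = new LinkedHashMap<>();
        for (String[] row : rows) {
            byRepo.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        List<String> result = new ArrayList<>();
        for (List<String> ids : byRepo.values()) {
            // Every pair (i, j) with i < j inside the same repository is one edge.
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    result.add(ids.get(i) + " " + ids.get(j));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "x"},
            new String[] {"2", "x"},
            new String[] {"3", "y"},
            new String[] {"4", "x"});
        System.out.println("*Edges");
        for (String edge : edges(rows)) {
            System.out.println(edge);
        }
    }
}
```

Note that this sketch does not remove duplicates: two people who share several repositories produce the same edge more than once, so collecting the results in a set may be preferable for real data.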