全部代码 https://github.com/Joshmomel/Princeton_Algorithms/tree/main/burrows
这次assignment主要是实现Burrows Wheeler的数据压缩算法, 具体来说有三个步骤
这个CircularSuffixArray是为Burrows Wheeler做准备的, 目标就是把index[i]计算出来
注意我们只需要保存一个string的pointer就行, 不需要把所有index对应的string都创建出来, 我一开始是创建一个CircularSuffix的class, 不过由于performance没过后来直接就把CircularSuffix抛弃了
private final Integer[] indexList;
public CircularSuffixArray(String s) {
if (s == null) throw new IllegalArgumentException("arg can not be null");
indexList = new Integer[s.length()];
for (int i = 0; i < s.length(); i++) {
indexList[i] = i;
}
Comparator<Integer> cmp = (a1, a2) -> {
for (int i = 0; i < s.length(); i++) {
int i1 = (a1 + i) % s.length();
int i2 = (a2 + i) % s.length();
if (s.charAt(i1) == s.charAt(i2))
continue;
else
return Character.compare(s.charAt(i1), s.charAt(i2));
}
return 0;
};
Arrays.sort(indexList, cmp);
}
Transform就是把 t 以及最初始的string的位置3计算出来
也就是
3
ARD!RCAAAABB
我们是知道第一个char的index[i], 这样反推最后一个char就是circularSuffixArray.index(i) - 1, 具体是
s.charAt((circularSuffixArray.index(i) - 1 + stringLength) % stringLength)
inverse transform是这次assignment比较难得部分, 不过F&Q说了ou should find the key-indexed counting algorithm from the string sorting lecture to be especially useful, 发现确实很像
全部代码 https://github.com/Joshmomel/Princeton_Algorithms/tree/main/burrows
这次assignment主要是实现Burrows Wheeler的数据压缩算法, 具体来说有三个步骤
这个CircularSuffixArray是为Burrows Wheeler做准备的, 目标就是把index[i]计算出来
注意我们只需要保存一个string的pointer就行, 不需要把所有index对应的string都创建出来, 我一开始是创建一个CircularSuffix的class, 不过由于performance没过后来直接就把CircularSuffix抛弃了
private final Integer[] indexList;
public CircularSuffixArray(String s) {
if (s == null) throw new IllegalArgumentException("arg can not be null");
indexList = new Integer[s.length()];
for (int i = 0; i < s.length(); i++) {
indexList[i] = i;
}
Comparator<Integer> cmp = (a1, a2) -> {
for (int i = 0; i < s.length(); i++) {
int i1 = (a1 + i) % s.length();
int i2 = (a2 + i) % s.length();
if (s.charAt(i1) == s.charAt(i2))
continue;
else
return Character.compare(s.charAt(i1), s.charAt(i2));
}
return 0;
};
Arrays.sort(indexList, cmp);
}
Transform就是把 t 以及最初始的string的位置3计算出来
也就是
3
ARD!RCAAAABB
我们是知道第一个char的index[i], 这样反推最后一个char就是circularSuffixArray.index(i) - 1, 具体是
s.charAt((circularSuffixArray.index(i) - 1 + stringLength) % stringLength)
inverse transform是这次assignment比较难得部分, 不过F&Q说了ou should find the key-indexed counting algorithm from the string sorting lecture to be especially useful, 发现确实很像
public static void inverseTransform() {
int first = BinaryStdIn.readInt();
String s = BinaryStdIn.readString();
String sortedString = sortString(s);
int N = s.length();
int R = 256;
int[] cs = new int[R + 1];
for (int i = 0; i < N; i++) {
cs[s.charAt(i) + 1]++;
}
for (int r = 0; r < R; r++) {
cs[r + 1] += cs[r];
}
int[] next = new int[N];
for (int i = 0; i < N; i++) {
next[cs[s.charAt(i)]++] = i;
}
int p = first;
for (int i = 0; i < N; i++) {
BinaryStdOut.write(sortedString.charAt(p));
p = next[p];
}
BinaryStdOut.close();
}
Initialize the sequence by making the ith character in the sequence equal to the ith extended ASCII character. Now, read each 8-bit character c from standard input, one at a time; output the 8-bit index in the sequence where c appears; and move c to the front
public static void encode() {
LinkedList<Character> sequence = new LinkedList<>();
for (int i = 0; i < 256; i++) {
sequence.add((char) i);
}
while (!BinaryStdIn.isEmpty()) {
//读取8-bit character c
char in = BinaryStdIn.readChar(8);
int i = sequence.indexOf((in));
// 写下c对应的index
BinaryStdOut.write(i, 8);
//把c放在第一位
Character c = sequence.remove(i);
sequence.addFirst(c);
}
BinaryStdOut.close();
}
Initialize an ordered sequence of 256 characters, where extended ASCII character i appears ith in the sequence. Now, read each 8-bit character i (but treat it as an integer between 0 and 255) from standard input one at a time; write the ith character in the sequence; and move that character to the front. Check that the decoder recovers any encoded message
public static void decode() {
LinkedList<Character> sequence = new LinkedList<>();
for (int i = 0; i < 256; i++) {
sequence.add((char) i);
}
while (!BinaryStdIn.isEmpty()) {
char in = BinaryStdIn.readChar(8);
Character inChar = sequence.get((int) in);
BinaryStdOut.write(inChar, 8);
Character c = sequence.remove(in);
sequence.addFirst(c);
}
BinaryStdOut.close();
}
这次作业还是有些难度的, 首先inverse transform需要好好想清楚, 其次performance test导致CircularSuffixArray也做了不少修改