• Coursera Algorithm Ⅱ week5 Burrows Wheeler


    全部代码 https://github.com/Joshmomel/Princeton_Algorithms/tree/main/burrows

    Context

    这次assignment主要是实现Burrows Wheeler的数据压缩算法, 具体来说有三个步骤
    在这里插入图片描述

    Implementation

    CircularSuffixArray

    这个CircularSuffixArray是为Burrows Wheeler做准备的, 目标就是把index[i]计算出来
    在这里插入图片描述
    注意我们只需要保存一个string的pointer就行, 不需要把所有index对应的string都创建出来, 我一开始是创建一个CircularSuffix的class, 不过由于performance没过后来直接就把CircularSuffix抛弃了

    private final Integer[] indexList;
    
    public CircularSuffixArray(String s) {
        if (s == null) throw new IllegalArgumentException("arg can not be null");
    
        indexList = new Integer[s.length()];
        for (int i = 0; i < s.length(); i++) {
          indexList[i] = i;
        }
        Comparator<Integer> cmp = (a1, a2) -> {
          for (int i = 0; i < s.length(); i++) {
            int i1 = (a1 + i) % s.length();
            int i2 = (a2 + i) % s.length();
            if (s.charAt(i1) == s.charAt(i2))
              continue;
            else
              return Character.compare(s.charAt(i1), s.charAt(i2));
          }
          return 0;
        };
    
        Arrays.sort(indexList, cmp);
      }
    

    Burrows–Wheeler transform

    Transform

    Transform就是把 t 以及最初始的string的位置3计算出来
    在这里插入图片描述
    也就是

    3
    ARD!RCAAAABB
    

    我们是知道第一个char的index[i], 这样反推最后一个char就是circularSuffixArray.index(i) - 1, 具体是

    s.charAt((circularSuffixArray.index(i) - 1 + stringLength) % stringLength)
    

    Inverse-Transform

    inverse transform是这次assignment比较难得部分, 不过F&Q说了ou should find the key-indexed counting algorithm from the string sorting lecture to be especially useful, 发现确实很像

    1. 计算每个char出现的次数
    2. 计算每个char在next[i]

    全部代码 https://github.com/Joshmomel/Princeton_Algorithms/tree/main/burrows

    Context

    这次assignment主要是实现Burrows Wheeler的数据压缩算法, 具体来说有三个步骤
    在这里插入图片描述

    Implementation

    CircularSuffixArray

    这个CircularSuffixArray是为Burrows Wheeler做准备的, 目标就是把index[i]计算出来
    在这里插入图片描述
    注意我们只需要保存一个string的pointer就行, 不需要把所有index对应的string都创建出来, 我一开始是创建一个CircularSuffix的class, 不过由于performance没过后来直接就把CircularSuffix抛弃了

    private final Integer[] indexList;
    
    public CircularSuffixArray(String s) {
        if (s == null) throw new IllegalArgumentException("arg can not be null");
    
        indexList = new Integer[s.length()];
        for (int i = 0; i < s.length(); i++) {
          indexList[i] = i;
        }
        Comparator<Integer> cmp = (a1, a2) -> {
          for (int i = 0; i < s.length(); i++) {
            int i1 = (a1 + i) % s.length();
            int i2 = (a2 + i) % s.length();
            if (s.charAt(i1) == s.charAt(i2))
              continue;
            else
              return Character.compare(s.charAt(i1), s.charAt(i2));
          }
          return 0;
        };
    
        Arrays.sort(indexList, cmp);
      }
    

    Burrows–Wheeler transform

    Transform

    Transform就是把 t 以及最初始的string的位置3计算出来
    在这里插入图片描述
    也就是

    3
    ARD!RCAAAABB
    

    我们是知道第一个char的index[i], 这样反推最后一个char就是circularSuffixArray.index(i) - 1, 具体是

    s.charAt((circularSuffixArray.index(i) - 1 + stringLength) % stringLength)
    

    Inverse-Transform

    inverse transform是这次assignment比较难得部分, 不过F&Q说了ou should find the key-indexed counting algorithm from the string sorting lecture to be especially useful, 发现确实很像

    1. 计算每个char出现的次数
    2. 计算每个char在next[i]中出现的位置 couont[]
    3. 把next[i]计算出来中出现的位置 couont[]
    4. 把next[i]计算出来
      public static void inverseTransform() {
        int first = BinaryStdIn.readInt();
        String s = BinaryStdIn.readString();
        String sortedString = sortString(s);
    
        int N = s.length();
        int R = 256;
        int[] cs = new int[R + 1];
        for (int i = 0; i < N; i++) {
          cs[s.charAt(i) + 1]++;
        }
    
        for (int r = 0; r < R; r++) {
          cs[r + 1] += cs[r];
        }
    
        int[] next = new int[N];
        for (int i = 0; i < N; i++) {
          next[cs[s.charAt(i)]++] = i;
        }
    
        int p = first;
        for (int i = 0; i < N; i++) {
          BinaryStdOut.write(sortedString.charAt(p));
          p = next[p];
        }
    
    
        BinaryStdOut.close();
      }
    

    Move-to-front

    encoding

    Initialize the sequence by making the ith character in the sequence equal to the ith extended ASCII character. Now, read each 8-bit character c from standard input, one at a time; output the 8-bit index in the sequence where c appears; and move c to the front

    1. 读取8-bit character c
    2. 写下c对应的index
    3. 把c放在第一位
      public static void encode() {
        LinkedList<Character> sequence = new LinkedList<>();
        for (int i = 0; i < 256; i++) {
          sequence.add((char) i);
        }
    
        while (!BinaryStdIn.isEmpty()) {
         //读取8-bit character c
          char in = BinaryStdIn.readChar(8);
          int i = sequence.indexOf((in));
          // 写下c对应的index
          BinaryStdOut.write(i, 8);
          //把c放在第一位
          Character c = sequence.remove(i);
          sequence.addFirst(c);
        }
    
        BinaryStdOut.close();
      }
    

    decoding

    Initialize an ordered sequence of 256 characters, where extended ASCII character i appears ith in the sequence. Now, read each 8-bit character i (but treat it as an integer between 0 and 255) from standard input one at a time; write the ith character in the sequence; and move that character to the front. Check that the decoder recovers any encoded message

    1. 读取8-bit character i , 获取对应0-255对应的数字
    2. 写下来i
    3. 把i放到第一位
      public static void decode() {
        LinkedList<Character> sequence = new LinkedList<>();
        for (int i = 0; i < 256; i++) {
          sequence.add((char) i);
        }
    
        while (!BinaryStdIn.isEmpty()) {
          char in = BinaryStdIn.readChar(8);
          Character inChar = sequence.get((int) in);
          BinaryStdOut.write(inChar, 8);
          Character c = sequence.remove(in);
          sequence.addFirst(c);
        }
    
        BinaryStdOut.close();
      }
    

    总结

    这次作业还是有些难度的, 首先inverse transform需要好好想清楚, 其次performance test导致CircularSuffixArray也做了不少修改

  • 相关阅读:
    王道数据结构编程题 二叉树
    【MySQL】21-MySQL之增删改
    Spring 长事务导致connection closed,又熬了一个大夜!
    uniApp笔记
    腾讯广告RACE曝光归因模型
    12代CPU启用SR-IOV vGPU,实现一台电脑当七台用
    java File、io篇
    AVL树四种旋转的详细图解
    MCE | 曲贝替定——来自海洋的抗软组织肿瘤化合物
    机器人xacro设计+gazebo/rviz启动
  • 原文地址:https://blog.csdn.net/Joshmo/article/details/127046397