• 解析标准树状文件为sql语句及代码解析(python版本,txt,mysql)


    如需C++版本,请跳转:解析标准树状文件为sql语句及代码解析(c++版本,txt,mysql)

    使用场景

    解析标准化的树状文件为数据库tree表insert语句,涉及:

    • 自动编号指标id(NodeCode);
    • 获取节点间父子关系(ParentNode);
    • 所属层级(LevelNumber);
    • 是否为叶节点(NoteCategory:1中间节点,2叶子结点);
    • 设置同级节点间顺序(BrotherNo)

    目标效果

    输入:树状文件片段(txt为例)
    /clarksons/tree.txt

    ----Oil Tankers
    ------Trade & Demand
    --------Monthly Global Seaborne Trade Indicators (Monthly)
    ----------Monthly Global Seaborne Crude Oil Trade Indicator % Yr/Yr 
    ----------Monthly Global Seaborne Crude Oil Trade Indicator % Yr/Yr 3mma
    ----------Monthly Global Seaborne Crude Oil Trade Indicator Volume Index
    ----------Monthly Global Seaborne Oil Products Trade Indicator % Yr/Yr 
    ----------Monthly Global Seaborne Oil Products Trade Indicator % Yr/Yr 3mma 
    ----------Monthly Global Seaborne Oil Products Trade Indicator Volume Index 
    ----------Monthly Global Seaborne Total Oil Trade Indicator % Yr/Yr 
    ----------Monthly Global Seaborne Total Oil Trade Indicator % Yr/Yr 3mma 
    ----------Monthly Global Seaborne Total Oil Trade Indicator Volume Index 
    ------Port Call Activity
    --------Country
    ----------Russia Port Calls - Product Tankers 25,000+ dwt (Beta) 
    ----------Russia Port Calls - Product Tankers 25,000+ dwt (Beta) DWT million
    ----------Russia Port Calls - Crude Tankers (Beta) 
    ----------Russia Port Calls - Crude Tankers (Beta) DWT million 
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18

    输出:自动生成的入库语句(mysql语句为例)
    /clarksons/tree_sql.txt

    UPDATE c_in_indicatortree set NoteCategory = 1 where NodeCode = 701 ;
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701001, 701001, 'Oil Tankers', 'Oil Tankers', 'Oil Tankers', 701, 1, 3, 2, 1, now(), 1655703762299, 3502, NULL);
    UPDATE c_in_indicatortree set NoteCategory = 1 where NodeCode = 701001 ;
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701002, 701002, 'Trade & Demand', 'Trade & Demand', 'Trade & Demand', 701001, 1, 4, 2, 1, now(), 1655703762305, 3502, NULL);
    UPDATE c_in_indicatortree set NoteCategory = 1 where NodeCode = 701002 ;
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701003, 701003, 'Monthly Global Seaborne Trade Indicators (Monthly)', 'Monthly Global Seaborne Trade Indicators (Monthly)', 'Monthly Global Seaborne Trade Indicators (Monthly', 701002, 1, 5, 2, 1, now(), 1655703762312, 3502, NULL);
    UPDATE c_in_indicatortree set NoteCategory = 1 where NodeCode = 701003 ;
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701004, 701004, 'Monthly Global Seaborne Crude Oil Trade Indicator % Yr/Yr ', 'Monthly Global Seaborne Crude Oil Trade Indicator % Yr/Yr ', 'Monthly Global Seaborne Crude Oil Trade Indicator', 701003, 1, 6, 2, 1, now(), 1655703762321, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701005, 701005, 'Monthly Global Seaborne Crude Oil Trade Indicator % Yr/Yr 3mma', 'Monthly Global Seaborne Crude Oil Trade Indicator % Yr/Yr 3mma', 'Monthly Global Seaborne Crude Oil Trade Indicator', 701003, 2, 6, 2, 1, now(), 1655703762329, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701006, 701006, 'Monthly Global Seaborne Crude Oil Trade Indicator Volume Index', 'Monthly Global Seaborne Crude Oil Trade Indicator Volume Index', 'Monthly Global Seaborne Crude Oil Trade Indicator', 701003, 3, 6, 2, 1, now(), 1655703762336, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701007, 701007, 'Monthly Global Seaborne Oil Products Trade Indicator % Yr/Yr ', 'Monthly Global Seaborne Oil Products Trade Indicator % Yr/Yr ', 'Monthly Global Seaborne Oil Products Trade Indica', 701003, 4, 6, 2, 1, now(), 1655703762346, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701008, 701008, 'Monthly Global Seaborne Oil Products Trade Indicator % Yr/Yr 3mma ', 'Monthly Global Seaborne Oil Products Trade Indicator % Yr/Yr 3mma ', 'Monthly Global Seaborne Oil Products Trade Indica', 701003, 5, 6, 2, 1, now(), 1655703762356, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701009, 701009, 'Monthly Global Seaborne Oil Products Trade Indicator Volume Index ', 'Monthly Global Seaborne Oil Products Trade Indicator Volume Index ', 'Monthly Global Seaborne Oil Products Trade Indica', 701003, 6, 6, 2, 1, now(), 1655703762369, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701010, 701010, 'Monthly Global Seaborne Total Oil Trade Indicator % Yr/Yr ', 'Monthly Global Seaborne Total Oil Trade Indicator % Yr/Yr ', 'Monthly Global Seaborne Total Oil Trade Indicator', 701003, 7, 6, 2, 1, now(), 1655703762378, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701011, 701011, 'Monthly Global Seaborne Total Oil Trade Indicator % Yr/Yr 3mma ', 'Monthly Global Seaborne Total Oil Trade Indicator % Yr/Yr 3mma ', 'Monthly Global Seaborne Total Oil Trade Indicator', 701003, 8, 6, 2, 1, now(), 1655703762385, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701012, 701012, 'Monthly Global Seaborne Total Oil Trade Indicator Volume Index ', 'Monthly Global Seaborne Total Oil Trade Indicator Volume Index ', 'Monthly Global Seaborne Total Oil Trade Indicator', 701003, 9, 6, 2, 1, now(), 1655703762395, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701013, 701013, 'Port Call Activity', 'Port Call Activity', 'Port Call Activity', 701001, 2, 4, 2, 1, now(), 1655703762407, 3502, NULL);
    UPDATE c_in_indicatortree set NoteCategory = 1 where NodeCode = 701013 ;
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701014, 701014, 'Country', 'Country', 'Country', 701013, 1, 5, 2, 1, now(), 1655703762417, 3502, NULL);
    UPDATE c_in_indicatortree set NoteCategory = 1 where NodeCode = 701014 ;
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701015, 701015, 'Russia Port Calls - Product Tankers 25,000+ dwt (Beta) ', 'Russia Port Calls - Product Tankers 25,000+ dwt (Beta) ', 'Russia Port Calls - Product Tankers 25,000+ dwt (', 701014, 1, 6, 2, 1, now(), 1655703762426, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701016, 701016, 'Russia Port Calls - Product Tankers 25,000+ dwt (Beta) DWT million', 'Russia Port Calls - Product Tankers 25,000+ dwt (Beta) DWT million', 'Russia Port Calls - Product Tankers 25,000+ dwt (', 701014, 2, 6, 2, 1, now(), 1655703762436, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701017, 701017, 'Russia Port Calls - Crude Tankers (Beta) ', 'Russia Port Calls - Crude Tankers (Beta) ', 'Russia Port Calls - Crude Tankers (Beta) ', 701014, 3, 6, 2, 1, now(), 1655703762449, 3502, NULL);
    INSERT INTO `caldb`.`c_in_indicatortree` (`ID`, `NodeCode`, `NodeName`, `NodeAbbr`, `ChiSpelling`, `ParentNode`, `BrotherNo`, `LevelNumber`, `NoteCategory`, `NodeState`, `UpdateTime`, `JSID`, `Edition`, `IndiRemark`)VALUES (701018, 701018, 'Russia Port Calls - Crude Tankers (Beta) DWT million ', 'Russia Port Calls - Crude Tankers (Beta) DWT million ', 'Russia Port Calls - Crude Tankers (Beta) DWT mill', 701014, 4, 6, 2, 1, now(), 1655703762457, 3502, NULL);
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24

    代码解析(c++)

    import time
    
    
    def get_first_not_of_(data):
        if data.startswith("-"):
            num = 0
            for char in data:
                if char == '-':
                    num += 1
                else:
                    break
            return num
        else:
            return -1
    
    
    def gen_tree(source, table_name):
        file_read = open("./" + source + "/tree.txt", "r", encoding="UTF-8")
        file_write = open("./" + source + "/tree_sql.txt", "w", encoding="UTF-8")
    
        # parent_indicatorId 记录在n级别上的父指标id
        parent_indicator_id = [-1, 150000000, 801, 0, 0, 0, 0, 0]
    
        # brotherNo 记录在n级别上存在的兄弟节点个数
        brother_no = [-1, 1, 0, 0, 0, 0, 0, 0]
    
        # current parent
        parent_level = 2
    
        # start from first node
        cur_indicator_id = 801001
    
        # cursor
        last_indicator_id = parent_indicator_id[parent_level]
        last_level_cursor = parent_level
    
        # 遍历
        read_lines = file_read.readlines()
    
        for line in read_lines:
    
            if line[-1] == '\n':
                line = line[:-1]
    
            _num = get_first_not_of_(line)
            cur_level = _num // 2 + 1
    
            print(line)
            note_category = 2
    
            # peer, son, not son
            if cur_level == last_level_cursor:
                brother_no[cur_level] += 1
                cur_brother_no = brother_no[cur_level]
            elif last_level_cursor < cur_level:
                brother_no[cur_level] = 1
                cur_brother_no = brother_no[cur_level]
                update_sql = "UPDATE {} set NoteCategory = 1 where NodeCode = {};".format(table_name, last_indicator_id)
                file_write.write(update_sql+'\n')
            else:
                brother_no[last_level_cursor] = 0
                brother_no[cur_level] += 1
                cur_brother_no = brother_no[cur_level]
    
            # 从一维数组获取指标父节点,并更新当前级别父指标为当前指标
            cur_parent_indicator_id = parent_indicator_id[cur_level - 1]
            parent_indicator_id[cur_level] = cur_indicator_id
    
            # 指标名称截取
            cur_indicator_name = line[_num:]
    
            # 指标简称、拼音简称等生成
            node_abbr = cur_indicator_name[0:99]
            chi_spelling = cur_indicator_name[0:49]
    
            timestemp = time.time_ns() // 1000000
    
            insert_sql = """INSERT INTO {} VALUES ({}, {}, '{}', '{}', '{}', {}, {}, {}, {}, 1
            , now(), {}, 3502, NULL);"""\
                .format(table_name, cur_indicator_id, cur_indicator_id, cur_indicator_name, node_abbr, chi_spelling
                        , cur_parent_indicator_id, cur_brother_no, cur_level, note_category, timestemp)
            file_write.write(insert_sql+'\n')
    
            last_indicator_id = cur_indicator_id
            last_level_cursor = cur_level
            cur_indicator_id += 1
            time.sleep(0.005)
    
        file_read.close()
        file_write.close()
    
    
    if __name__ == '__main__':
        gen_tree("clarksons", "indicator_tree")
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94
  • 相关阅读:
    312.戳气球
    【每日一题】1993. 树上的操作
    【需水预测】区域需水量预测方法-定额法
    Netty 学习(四):ChannelHandler 的事件传播和生命周期
    进行LDPC编码时扔掉了前面2Zc的信息位,为何如此设计?在解码时如何恢复?
    05-SA8155 QNX SPI 全双工通讯
    关于数据迁移:解决kettle中mysql的数据是tinyint的kettle查出来后变成boolean问题
    音乐伴奏提取?唱歌剪辑好用的音频人声分离软件!提取步骤很简单!
    T246836 [LSOT-1] 暴龙的土豆
    【Vue】ElementUI实现登录注册
  • 原文地址:https://blog.csdn.net/weixin_45564920/article/details/125456196