• [Introduction to Speech Recognition] My-Voice-Analysis


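    Introduction

    My-Voice-Analysis is a Python library for analyzing voice (simultaneous speech, high entropy). It breaks utterances apart and detects syllable boundaries, fundamental frequency contours, and formants. Its built-in functions cover:

    • Gender recognition
    • Speech mood analysis
    • Pronunciation score
    • Articulation rate
    • Speech rate
    • Filler words
    • f0 statistics

    I. Installation

    my-voice-analysis can be installed like any other Python library, using the Python package manager pip (latest version):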

    pip install my-voice-analysis
    

    If that does not work, try the following instead:

    pip install myprosody
    

    II. Example usage

    Audio files must be in *.wav format, recorded at a 44 kHz sampling rate with 16-bit resolution.
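Before running any of the functions below, it can help to confirm that a recording meets this requirement. The following is a minimal sketch using only the Python standard library. The helper names `check_wav_format` and `write_silence` are hypothetical and are not part of My-Voice-Analysis, and the sketch assumes that "44 kHz" means 44100 Hz.

```python
import wave


def check_wav_format(path, expected_rate=44100, expected_bits=16):
    """Return (ok, sample_rate, bit_depth) for a PCM WAV file.

    Assumption: "44 kHz" is taken to mean 44100 Hz.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # bytes per sample -> bits per sample
    return rate == expected_rate and bits == expected_bits, rate, bits


def write_silence(path, rate, seconds=0.1):
    """Write a short mono 16-bit WAV of silence (demo input only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
```

Calling `check_wav_format("Walkers.wav")` on a real recording returns a tuple whose first element says whether the file matches the expected format. A file that fails the check would need to be resampled or re-recorded first.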

    1. Gender recognition: function myspgend(p,c)
    # __init__.py
    import numpy as np
    from parselmouth.praat import run_file
    from scipy.stats import ks_2samp, ttest_ind

    def myspgend(m,p):
        sound=p+"/"+m+".wav" # m: audio file title (no extension), p: directory holding the audio
        sourcerun=p+"/myspsolution.praat" # the Praat script is loaded from the same directory
        path=p+"/"
        try:
            objects= run_file(sourcerun, -20, 2, 0.3, "yes",sound,path, 80, 400, 0.01, capture_output=True)
            print (objects[0]) # objects returned by the Praat script
            z1=str( objects[1]) # text output captured from the script (capture_output=True)
            z2=z1.strip().split()
            z3=float(z2[8]) # f0 standard deviation (Hz)
            z4=float(z2[7]) # f0 mean (Hz)
            if z4<=114:
                g=101
                j=3.4
            elif z4>114 and z4<=135:
                g=128
                j=4.35
            elif z4>135 and z4<=163:
                g=142
                j=4.85
            elif z4>163 and z4<=197:
                g=182
                j=2.7
            elif z4>197 and z4<=226:
                g=213
                j=4.5
            elif z4>226:
                g=239
                j=5.3
            else:
                print("Voice not recognized")
                exit()
            def teset(a,b,c,d):
                d1=np.random.wald(a, 1, 1000)
                d2=np.random.wald(b,1,1000)
                d3=ks_2samp(d1, d2)
                c1=np.random.normal(a,c,1000)
                c2=np.random.normal(b,d,1000)
                c3=ttest_ind(c1,c2)
                y=([d3[0],d3[1],abs(c3[0]),c3[1]])
                return y
            nn=0
            mm=teset(g,j,z4,z3)
            while (mm[3]>0.05 and mm[0]>0.04 or nn<5):
                mm=teset(g,j,z4,z3)
                nn=nn+1
            nnn=nn
            if mm[3]<=0.09:
                mmm=mm[3]
            else:
                mmm=0.35
            if z4>97 and z4<=114:
                print("a Male, mood of speech: Showing no emotion, normal, p-value/sample size= :%.2f" % (mmm), (nnn))
            elif z4>114 and z4<=135:
                print("a Male, mood of speech: Reading, p-value/sample size= :%.2f" % (mmm), (nnn))
            elif z4>135 and z4<=163:
                print("a Male, mood of speech: speaking passionately, p-value/sample size= :%.2f" % (mmm), (nnn))
            elif z4>163 and z4<=197:
                print("a female, mood of speech: Showing no emotion, normal, p-value/sample size= :%.2f" % (mmm), (nnn))
            elif z4>197 and z4<=226:
                print("a female, mood of speech: Reading, p-value/sample size= :%.2f" % (mmm), (nnn))
            elif z4>226 and z4<=245:
                print("a female, mood of speech: speaking passionately, p-value/sample size= :%.2f" % (mmm), (nnn))
            else:
                print("Voice not recognized")
        except Exception: # avoid a bare except, which would also swallow SystemExit and KeyboardInterrupt
            print ("Try again the sound of the audio was not clear")
    
    
                    [in]  import myspsolution as mysp
                         
                         p="Walkers" # Audio File title
                         c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                         mysp.myspgend(p,c)
                    
                    [out] a female, mood of speech: Reading, p-value/sample size= :0.00 5
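The label that myspgend prints depends only on which band the mean f0 (`z4`) falls into. The randomized statistical test in the function affects just the p-value and sample-size figures shown after the label. As a rough sketch, the band lookup can be written as a small table. The names `F0_BANDS` and `classify_f0` are hypothetical and not part of the library. The thresholds are copied from the print branches of the source above.

```python
# Bands copied from the print branches of myspgend:
# (lower bound exclusive, upper bound inclusive, gender, mood), f0 in Hz.
F0_BANDS = [
    (97, 114, "male", "showing no emotion, normal"),
    (114, 135, "male", "reading"),
    (135, 163, "male", "speaking passionately"),
    (163, 197, "female", "showing no emotion, normal"),
    (197, 226, "female", "reading"),
    (226, 245, "female", "speaking passionately"),
]


def classify_f0(f0_mean):
    """Return (gender, mood) for a mean f0 in Hz, or None outside 97-245 Hz."""
    for lower, upper, gender, mood in F0_BANDS:
        if lower < f0_mean <= upper:
            return gender, mood
    return None  # corresponds to "Voice not recognized"
```

With the sample recording used throughout this article, whose mean f0 is reported as 212.45 Hz in example 10, `classify_f0(212.45)` falls in the 197-226 Hz band. That agrees with the "a female, mood of speech: Reading" output above.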
    
    2. Pronunciation posteriori probability score percentage: function mysppron(p,c)
                    [in]   import myspsolution as mysp
    
                           p="Walkers" # Audio File title
                           c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                           mysp.mysppron(p,c)
                           
                   [out]   Pronunciation_posteriori_probability_score_percentage= :85.00
    
    3. Detect and count the number of syllables: function myspsyl(p,c)
                     [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.myspsyl(p,c)
                            
                    [out]   number_ of_syllables= 154
    
    4. Detect and count the number of fillers and pauses: function mysppaus(p,c)
                     [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.mysppaus(p,c)
                            
                    [out]   number_of_pauses= 22
    
    5. Measure the rate of speech (speed): function myspsr(p,c)
                    [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.myspsr(p,c)
                    
                    [out]   rate_of_speech= 3 # syllables/sec original duration
    
    6. Measure the articulation rate (speed): function myspatc(p,c)
                     [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.myspatc(p,c)
                            
                    [out]  articulation_rate= 5 # syllables/sec speaking duration
    
    7. Measure speaking time (excluding fillers and pauses): function myspst(p,c)
                     [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.myspst(p,c)
               
                    [out]   speaking_duration= 31.6 # sec only speaking duration without pauses
    
    8. Measure total speaking duration (including fillers and pauses): function myspod(p,c)
                     [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.myspod(p,c)
                            
                    [out]   original_duration= 49.2 # sec total speaking duration with pauses
    
    9. Measure the ratio of speaking duration to total speaking duration: function myspbala(p,c)
                     [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.myspbala(p,c)
    
                    [out]   balance= 0.6 # ratio (speaking duration)/(original duration)
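The rate and ratio figures in examples 5, 6 and 9 follow from the syllable count and the two durations, according to the units noted in the [out] comments. The sketch below recomputes them from the sample values in this article as a consistency check. The function name `speech_metrics` is hypothetical and not part of the library, and the library itself may round differently.

```python
def speech_metrics(syllables, speaking_duration, original_duration):
    """Derive rate metrics from a syllable count and two durations in seconds."""
    return {
        # syllables per second over the whole recording, pauses included
        "rate_of_speech": syllables / original_duration,
        # syllables per second over speaking time only
        "articulation_rate": syllables / speaking_duration,
        # fraction of the recording spent speaking
        "balance": speaking_duration / original_duration,
    }


# Sample values reported for the "Walkers" recording in examples 3, 7 and 8.
metrics = speech_metrics(syllables=154, speaking_duration=31.6, original_duration=49.2)
for name, value in metrics.items():
    print(f"{name}= {value:.2f}")
# rate_of_speech= 3.13
# articulation_rate= 4.87
# balance= 0.64
```

Rounded, these values match the outputs shown in the examples: 3 and 5 syllables per second, and a balance of 0.6.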
    
    10. Measure the mean of the fundamental frequency distribution: function myspf0mean(p,c)
                     [in]   import myspsolution as mysp
    
                            p="Walkers" # Audio File title
                            c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
                            mysp.myspf0mean(p,c)
    
                     [out]  f0_mean= 212.45 # Hz global mean of fundamental frequency distribution
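The myspgend source above prints its result and returns nothing. Assuming the other functions behave the same way, which the [out] lines suggest but this article does not confirm, one way to reuse a value in a script is to capture standard output and parse it. The helper `capture_number` and the stand-in `fake_myspf0mean` are hypothetical; the stand-in only imitates the [out] line above so that the sketch runs without Praat or an audio file.

```python
import contextlib
import io
import re


def capture_number(func, *args):
    """Run func(*args), capture what it prints, and return the first number after '='."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    # Accepts both "name= 212.45" and "name= :85.00" shapes.
    match = re.search(r"=\s*:?\s*(-?\d+(?:\.\d+)?)", buffer.getvalue())
    return float(match.group(1)) if match else None


def fake_myspf0mean(p, c):
    """Hypothetical stand-in that prints in the same shape as the [out] line above."""
    print("f0_mean= 212.45 # Hz global mean of fundamental frequency distribution")
```

In real use, `capture_number(mysp.myspf0mean, p, c)` would take the place of the stand-in. It returns `None` when no number follows an equals sign in the printed text.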
    

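    III. Development

    My-Voice-Analysis was developed by Sab-AI Lab in Japan (formerly known as Mysolution). It is part of a Sab-AI Lab project to develop acoustic models of linguistics. The plan is to enrich My-Voice-Analysis by adding more advanced functions as well as a language model.

    See also Myprosody (https://github.com/Shahabks/myprosody) and Speech-Rater (https://shahabks.github.io/Speech-Rater/).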
    
  • Original article: https://blog.csdn.net/weixin_51293984/article/details/126695297