[Getting Started with Speech Recognition] My-Voice-Analysis

Introduction

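My-Voice-Analysis is a Python library for analyzing voice (simultaneous speech, high entropy). It segments utterances and detects syllable boundaries, fundamental frequency (f0) contours, and formants. Its built-in functions cover:

Gender recognition
Speech mood analysis
Pronunciation score
Articulation rate
Speech rate
Filler words
f0 statistics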

I. Installation

my-voice-analysis can be installed like any other Python library, using the Python package manager pip (latest version):

pip install my-voice-analysis

If that does not work, try the following instead:

pip install myprosody

II. Example usage

Audio files must be in *.wav format, recorded at a 44 kHz sampling rate with 16-bit resolution.
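Before running any analysis, it can help to confirm that a recording meets this format requirement. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav_format` is hypothetical, and the expected rate of 44100 Hz is an assumption about what "44 kHz" means here.

```python
import wave

EXPECTED_RATE_HZ = 44100        # assumption: "44 kHz" refers to the common 44.1 kHz rate
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit resolution = 2 bytes per sample


def check_wav_format(path):
    """Return (ok, sample_rate_hz, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    return ok, rate, width * 8
```

A file that fails the check can be resampled with any audio tool before it is passed to the library.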

1. Gender recognition: function myspgend(p,c)

# __init__.py
import numpy as np
from parselmouth.praat import run_file
from scipy.stats import ks_2samp, ttest_ind


def myspgend(m, p):
    sound = p + "/" + m + ".wav"           # audio file to analyze
    sourcerun = p + "/myspsolution.praat"  # Praat script, expected in the same directory
    path = p + "/"
    try:
        objects = run_file(sourcerun, -20, 2, 0.3, "yes", sound, path, 80, 400, 0.01, capture_output=True)
        print(objects[0])     # objects created by the Praat script
        z1 = str(objects[1])  # text output captured from the script (capture_output=True)
        z2 = z1.strip().split()
        z3 = float(z2[8])     # field 8 of the whitespace-separated output
        z4 = float(z2[7])     # field 7: the pitch value (Hz) compared with the thresholds below
        # Pick reference values g and j for the pitch band that z4 falls into.
        if z4 <= 114:
            g, j = 101, 3.4
        elif z4 <= 135:
            g, j = 128, 4.35
        elif z4 <= 163:
            g, j = 142, 4.85
        elif z4 <= 197:
            g, j = 182, 2.7
        elif z4 <= 226:
            g, j = 213, 4.5
        elif z4 > 226:
            g, j = 239, 5.3
        else:
            # Only reachable if z4 is not a comparable number (e.g. NaN).
            print("Voice not recognized")
            return

        def teset(a, b, c, d):
            # KS test on Wald samples with means a and b,
            # plus a t-test on normal samples N(a, c) and N(b, d).
            d1 = np.random.wald(a, 1, 1000)
            d2 = np.random.wald(b, 1, 1000)
            d3 = ks_2samp(d1, d2)
            c1 = np.random.normal(a, c, 1000)
            c2 = np.random.normal(b, d, 1000)
            c3 = ttest_ind(c1, c2)
            return [d3[0], d3[1], abs(c3[0]), c3[1]]

        nn = 0
        mm = teset(g, j, z4, z3)
        # Repeat at least five times, then until the test results pass the cut-offs.
        while (mm[3] > 0.05 and mm[0] > 0.04) or nn < 5:
            mm = teset(g, j, z4, z3)
            nn = nn + 1
        nnn = nn
        mmm = mm[3] if mm[3] <= 0.09 else 0.35
        if 97 < z4 <= 114:
            print("a Male, mood of speech: Showing no emotion, normal, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif 114 < z4 <= 135:
            print("a Male, mood of speech: Reading, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif 135 < z4 <= 163:
            print("a Male, mood of speech: speaking passionately, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif 163 < z4 <= 197:
            print("a female, mood of speech: Showing no emotion, normal, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif 197 < z4 <= 226:
            print("a female, mood of speech: Reading, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif 226 < z4 <= 245:
            print("a female, mood of speech: speaking passionately, p-value/sample size= :%.2f" % (mmm), (nnn))
        else:
            print("Voice not recognized")
    except Exception:
        print("Try again the sound of the audio was not clear")
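The final if/elif chain in the function above is effectively a lookup from a pitch value to a gender and mood label. As an illustration only, the same bands can be written as a table. This sketch assumes the compared value is the mean f0 in Hz, and it leaves out the statistical p-value step entirely; `classify_f0` and `F0_BANDS` are hypothetical names, not part of the library.

```python
# (upper bound in Hz, gender label, mood label), copied from the if/elif chain above
F0_BANDS = [
    (114, "Male", "Showing no emotion, normal"),
    (135, "Male", "Reading"),
    (163, "Male", "speaking passionately"),
    (197, "female", "Showing no emotion, normal"),
    (226, "female", "Reading"),
    (245, "female", "speaking passionately"),
]
F0_LOWER_LIMIT_HZ = 97  # values at or below this are reported as not recognized


def classify_f0(f0_hz):
    """Return (gender, mood) for a pitch value, or None if it is out of range."""
    if not f0_hz > F0_LOWER_LIMIT_HZ:
        return None
    for upper, gender, mood in F0_BANDS:
        if f0_hz <= upper:
            return gender, mood
    return None


print(classify_f0(212.45))  # ('female', 'Reading')
```

The value 212.45 falls in the band above 197 and up to 226, which is the band labeled "female, Reading" in the function.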

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspgend(p, c)

[out] a female, mood of speech: Reading, p-value/sample size= :0.00 5
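The import line in this example only works if a module named `myspsolution` is importable in your environment. Because the pip package name contains hyphens, an ordinary `import` statement cannot reference it directly. The sketch below checks which of several candidate module names can be found. Both candidate names are assumptions that should be verified against your installation, and `find_importable` is a hypothetical helper.

```python
import importlib.util

# Assumption: candidate module names to try; check them against your installation.
CANDIDATE_MODULES = ("myspsolution", "my-voice-analysis")


def find_importable(candidates):
    """Return the first module name in `candidates` that can be located, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # Raised for malformed names or modules without a usable spec.
            continue
    return None
```

If a name is found, it can be loaded with `importlib.import_module(name)`, which also accepts names that are not valid Python identifiers.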

2. Pronunciation posterior probability score percentage: function mysppron(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.mysppron(p, c)

[out] Pronunciation_posteriori_probability_score_percentage= :85.00

3. Detect and count the number of syllables: function myspsyl(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspsyl(p, c)

[out] number_ of_syllables= 154

4. Detect and count the number of filler words and pauses: function mysppaus(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.mysppaus(p, c)

[out] number_of_pauses= 22

5. Measure the rate of speech (speed): function myspsr(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspsr(p, c)

[out] rate_of_speech= 3 # syllables/sec original duration

6. Measure the articulation rate (speed): function myspatc(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspatc(p, c)

[out] articulation_rate= 5 # syllables/sec speaking duration

7. Measure the speaking time (excluding filler words and pauses): function myspst(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspst(p, c)

[out] speaking_duration= 31.6 # sec only speaking duration without pauses

8. Measure the total speaking duration (including filler words and pauses): function myspod(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspod(p, c)

[out] original_duration= 49.2 # sec total speaking duration with pauses

9. Measure the ratio between speaking duration and total speaking duration: function myspbala(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspbala(p, c)

[out] balance= 0.6 # ratio (speaking duration)/(original duration)
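The rate and ratio figures in the examples so far are consistent with one another. Taking the sample outputs (154 syllables, 31.6 s speaking duration, 49.2 s original duration), the short sketch below recomputes the rate of speech, articulation rate, and balance from the definitions given in the output comments. The rounding shown is an assumption chosen to match the printed values, not a confirmed detail of the library, and `speech_rates` is a hypothetical name.

```python
def speech_rates(syllables, speaking_s, original_s):
    """Derive rate metrics from a syllable count and two durations in seconds."""
    return {
        # syllables per second over the whole recording, pauses included
        "rate_of_speech": round(syllables / original_s),
        # syllables per second over speaking time only
        "articulation_rate": round(syllables / speaking_s),
        # share of the recording spent speaking
        "balance": round(speaking_s / original_s, 1),
    }


print(speech_rates(154, 31.6, 49.2))
# {'rate_of_speech': 3, 'articulation_rate': 5, 'balance': 0.6}
```

The articulation rate is always at least as high as the rate of speech, because it divides the same syllable count by a duration that excludes pauses.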

10. Measure the mean of the fundamental frequency distribution: function myspf0mean(p,c)

[in]  import myspsolution as mysp

      p = "Walkers"  # Audio file title
      c = r"C:\Users\Shahab\Desktop\Mysp"  # Path to the audio file directory (Python 3.7)
      mysp.myspf0mean(p, c)

[out] f0_mean= 212.45 # Hz global mean of fundamental frequency distribution
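Judging by the [out] lines, these functions appear to print their results rather than return them, which is inconvenient when the numbers are needed for further processing. One possible workaround is to capture standard output and parse the `name= value` line. This is a sketch under that assumption; `capture_metric` is a hypothetical helper, and `fake_f0mean` is a stand-in that only imitates the printed format shown above.

```python
import contextlib
import io
import re

# Matches lines such as "number_of_pauses= 22" or "f0_mean= 212.45 # Hz ..."
_RESULT_LINE = re.compile(r"^\s*([A-Za-z_][\w ]*?)\s*=\s*:?\s*(-?\d+(?:\.\d+)?)")


def capture_metric(func, *args):
    """Call func(*args), capture what it prints, and return (name, value) or None."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    for line in buffer.getvalue().splitlines():
        match = _RESULT_LINE.match(line)
        if match:
            return match.group(1).strip(), float(match.group(2))
    return None


def fake_f0mean(p, c):
    # Stand-in for a library call; prints in the same format as the [out] line above.
    print("f0_mean= 212.45 # Hz global mean of fundamental frequency distribution")


print(capture_metric(fake_f0mean, "Walkers", "some/dir"))  # ('f0_mean', 212.45)
```

The same helper would be called with a real library function in place of `fake_f0mean`, provided that function writes its result through Python's `print`.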

III. Development

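My-Voice-Analysis was developed by Sab-AI Lab in Japan (formerly known as Mysolution). It is part of the lab's project to develop acoustic models for linguistics. The plan is to enrich My-Voice-Analysis with more advanced functions and with language models.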

See also Myprosody (https://github.com/Shahabks/myprosody) and Speech-Rater (https://shahabks.github.io/Speech-Rater/).

Original article: https://blog.csdn.net/weixin_51293984/article/details/126695297