时长比较短的音频:https://huggingface.co/datasets/PolyAI/minds14/viewer/en-US
时长比较长的音频:https://huggingface.co/datasets/librispeech_asr?row=8
此次测试过程暂时只使用比较短的音频
下载安装,参考官方网站即可
报错提示:
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!
解决办法:
找到有libcudnn_ops_infer.so.8 的路径,在我的电脑中,改文件所在的路径为
在终端导入 export LD_LIBRARY_PATH=/opt/audio/venv/lib/python3.10/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH
test_fast_whisper.py
-
- import subprocess
- import os
- import time
- import unittest
- import openpyxl
- from pydub import AudioSegment
- from datasets import load_dataset
-
- from faster_whisper import WhisperModel
-
- class TestFastWhisper(unittest.TestCase):
-
- def setUp(self):
- pass
- def test_fastwhisper(self):
- # 替换为您的脚本路径
-
- # 设置HTTP代理
- os.environ["http_proxy"] = "http://10.10.10.178:7890"
- os.environ["HTTP_PROXY"] = "http://10.10.10.178:7890"
- # 不知道此处为什么不能生效,必须要在终端中手动导入
- os.environ["LD_LIBRARY_PATH"] = "/opt/audio/venv/lib/python3.10/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH"
-
- # 设置HTTPS代理
- os.environ["https_proxy"] = "http://10.10.10.178:7890"
- os.environ["HTTPS_PROXY"] = "http://10.10.10.178:7890"
- print("load whisper")
- # 使用fast_whisper
- model_size = "large-v2"
-
- # Run on GPU with FP16
- fast_whisper_model = WhisperModel(model_size, device="cuda", compute_type="float16")
- minds_14 = load_dataset("PolyAI/minds14", "en-US", split="train") # for en-US
-
- workbook = openpyxl.Workbook()
- # 创建一个工作表
- worksheet = workbook.active
- # 设置表头
- worksheet["A1"] = "Audio Path"
- worksheet["B1"] = "Audio Duration (seconds)"
- worksheet["C1"] = "Audio Size (MB)"
- worksheet["D1"] = "Correct Text"
- worksheet["E1"] = "Transcribed Text"
- worksheet["F1"] = "Cost Time (seconds)"
- for index, each in enumerate(minds_14, start=2):
- audioPath = each["path"]
- print(audioPath)
- # audioArray = each["audio"]
- audioDuration = len(AudioSegment.from_file(audioPath))/1000
- audioSize = os.path.getsize(audioPath)/ (1024 * 1024)
- CorrectText = each["transcription"]
- tran_start_time = time.time()
- segments, info = fast_whisper_model.transcribe(audioPath, beam_size=5)
- segments = list(segments) # The transcription will actually run here.
- print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
- text = ""
- for segment in segments:
- text += segment.text
- cost_time = time.time() - tran_start_time
- print("Audio Path:", audioPath)
- print("Audio Duration (seconds):", audioDuration)
- print("Audio Size (MB):", audioSize)
- print("Correct Text:", CorrectText)
- print("Transcription Time (seconds):", cost_time)
- print("Transcribed Text:", text)
-
- worksheet[f"A{index}"] = audioPath
- worksheet[f"B{index}"] = audioDuration
- worksheet[f"C{index}"] = audioSize
- worksheet[f"D{index}"] = CorrectText
- worksheet[f"E{index}"] = text
- worksheet[f"F{index}"] = cost_time
- # break
- workbook.save("fast_whisper_output_data.xlsx")
- print("数据已保存到 fast_whisper_output_data.xlsx 文件")
-
-
- if __name__ == '__main__':
- unittest.main()
下载安装,参考官方网站即可,代码与上面代码类似
不太熟悉用numbers,凑合着看一下就行
很明显,fast_whisper速度要更快一些