• Timing whisper and faster-whisper (called fast_whisper below) on data downloaded from Hugging Face


    Shorter audio clips: https://huggingface.co/datasets/PolyAI/minds14/viewer/en-US

    Longer audio clips: https://huggingface.co/datasets/librispeech_asr?row=8

    For now, this test uses only the shorter clips.

    Testing with faster-whisper

    To download and install it, just follow the official website.

    Error message:

    Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
    Please make sure libcudnn_ops_infer.so.8 is in your library path!

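    Solution:

    Find the directory that contains libcudnn_ops_infer.so.8 and add it to LD_LIBRARY_PATH in the terminal before running the test. On my machine the command is:

    export LD_LIBRARY_PATH=/opt/audio/venv/lib/python3.10/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH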
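If the library's location is not known in advance, it can be searched for. The following is a minimal sketch, not part of the original test: the helper names `find_library_dirs` and `export_line` are hypothetical, and it assumes the cuDNN library was installed somewhere under the directory being searched (for example by a pip package inside the virtual environment).

```python
import os


def find_library_dirs(root, filename="libcudnn_ops_infer.so.8"):
    """Return the sorted directories under `root` that contain `filename`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(dirpath)
    return sorted(matches)


def export_line(directory):
    """Build the shell command that prepends `directory` to LD_LIBRARY_PATH."""
    return f"export LD_LIBRARY_PATH={directory}:$LD_LIBRARY_PATH"
```

Calling `find_library_dirs(sys.prefix)` from inside the virtual environment (after `import sys`) searches that environment only; each returned directory can be passed to `export_line` to get a command to paste into the terminal.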

    test_fast_whisper.py

    import os
    import time
    import unittest

    import openpyxl
    from datasets import load_dataset
    from faster_whisper import WhisperModel
    from pydub import AudioSegment

    PROXY = "http://10.10.10.178:7890"
    OUTPUT_FILE = "fast_whisper_output_data.xlsx"


    class TestFastWhisper(unittest.TestCase):
        def test_fastwhisper(self):
            # Set the HTTP and HTTPS proxies.
            for name in ("http_proxy", "HTTP_PROXY", "https_proxy", "HTTPS_PROXY"):
                os.environ[name] = PROXY
            # Assigning os.environ["LD_LIBRARY_PATH"] here does not take effect:
            # the dynamic loader reads that variable when the process starts, and
            # Python does not expand "$LD_LIBRARY_PATH" inside a string. Export it
            # in the terminal before running this script instead.

            print("load whisper")
            # Use faster-whisper; run on GPU with FP16.
            model_size = "large-v2"
            fast_whisper_model = WhisperModel(model_size, device="cuda", compute_type="float16")
            minds_14 = load_dataset("PolyAI/minds14", "en-US", split="train")  # for en-US

            # Create a workbook and write the header row to the active sheet.
            workbook = openpyxl.Workbook()
            worksheet = workbook.active
            worksheet.append([
                "Audio Path", "Audio Duration (seconds)", "Audio Size (MB)",
                "Correct Text", "Transcribed Text", "Cost Time (seconds)",
            ])

            for each in minds_14:
                audio_path = each["path"]
                # pydub reports length in milliseconds.
                audio_duration = len(AudioSegment.from_file(audio_path)) / 1000
                audio_size = os.path.getsize(audio_path) / (1024 * 1024)
                correct_text = each["transcription"]

                start_time = time.time()
                segments, info = fast_whisper_model.transcribe(audio_path, beam_size=5)
                # segments is a generator; the transcription actually runs here.
                text = "".join(segment.text for segment in segments)
                cost_time = time.time() - start_time

                print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
                print("Audio Path:", audio_path)
                print("Audio Duration (seconds):", audio_duration)
                print("Audio Size (MB):", audio_size)
                print("Correct Text:", correct_text)
                print("Transcription Time (seconds):", cost_time)
                print("Transcribed Text:", text)

                worksheet.append([audio_path, audio_duration, audio_size, correct_text, text, cost_time])
                # break  # uncomment to test a single clip

            workbook.save(OUTPUT_FILE)
            print("Data saved to", OUTPUT_FILE)


    if __name__ == '__main__':
        unittest.main()
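The script cannot fix LD_LIBRARY_PATH for itself, because the dynamic loader reads the variable when the process starts. One possible workaround, sketched here under that assumption, is to launch the test in a child process whose environment already contains the path. The helper names `prepend_library_path` and `run_with_library_path` are hypothetical and not part of the original test.

```python
import os
import subprocess
import sys


def prepend_library_path(directory, env=None):
    """Return a copy of `env` with `directory` prepended to LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Avoid a trailing ":" when the variable was previously unset or empty.
    new_env["LD_LIBRARY_PATH"] = f"{directory}:{current}" if current else directory
    return new_env


def run_with_library_path(args, directory):
    """Run `args` in a child process that starts with the extended path."""
    return subprocess.run(
        args,
        env=prepend_library_path(directory),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, `run_with_library_path([sys.executable, "test_fast_whisper.py"], "/opt/audio/venv/lib/python3.10/site-packages/nvidia/cudnn/lib")` would start the test with the same path that the manual export sets.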

    Testing with whisper

    To download and install it, just follow the official website. The code is similar to the script above.
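The whisper version of the script is not shown in the original. As a rough sketch of the part that differs, the timed call might look like the following, assuming the openai-whisper package (`import whisper`) is installed. The helper names `timed_transcribe` and `make_whisper_transcriber` are hypothetical.

```python
import time


def timed_transcribe(transcribe_fn, audio_path):
    """Call `transcribe_fn(audio_path)` and return (text, elapsed_seconds)."""
    start = time.perf_counter()
    text = transcribe_fn(audio_path)
    return text, time.perf_counter() - start


def make_whisper_transcriber(model_size="large-v2"):
    """Build a transcriber backed by the openai-whisper package."""
    import whisper  # imported lazily so the timing helper works without it

    model = whisper.load_model(model_size)

    def transcribe(audio_path):
        # model.transcribe returns a dict; the full transcript is under "text".
        return model.transcribe(audio_path)["text"]

    return transcribe
```

Inside the loop, `text, cost_time = timed_transcribe(transcriber, audio_path)` would then take the place of the faster-whisper call, with `transcriber = make_whisper_transcriber()` created once before the loop so that model loading is not counted in the per-clip time.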

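    Visualizing the test results

    I am not very familiar with Numbers, so the chart is only a rough one.

    In this test on the short minds14 clips, faster-whisper was clearly faster than whisper.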
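Besides a chart, the comparison can be reduced to a few numbers. The sketch below is one possible way to do that and is not taken from the original post: it assumes each run is available as a list of (duration, cost time) pairs, which correspond to columns B and F of the spreadsheets written by the test scripts. The function names are hypothetical.

```python
from statistics import mean


def real_time_factor(cost_seconds, duration_seconds):
    """Processing time divided by audio length; below 1 is faster than real time."""
    return cost_seconds / duration_seconds


def summarize(rows):
    """Summarize one run; `rows` is a list of (duration_seconds, cost_seconds)."""
    return {
        "mean_cost": mean(cost for _, cost in rows),
        "mean_rtf": mean(real_time_factor(cost, dur) for dur, cost in rows),
    }


def speedup(baseline_rows, candidate_rows):
    """Total baseline time divided by total candidate time over the same clips."""
    baseline_total = sum(cost for _, cost in baseline_rows)
    candidate_total = sum(cost for _, cost in candidate_rows)
    return baseline_total / candidate_total
```

With the whisper rows as the baseline and the faster-whisper rows as the candidate, a `speedup` value above 1 means faster-whisper needed less total time on the same clips.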

  • Original article: https://blog.csdn.net/sunriseYJP/article/details/134232056