Three Approaches to Speech Synthesis (TTS) in Applications

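Preface

Speech synthesis, also called text-to-speech (TTS), converts text into spoken audio for playback. This article introduces three ways to implement it.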

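1. Web implementation

This approach uses the Web Speech API that browsers expose to HTML5 pages. It relies on speechSynthesis (the controller that reports which synthesis voices are available on the device and handles commands such as play, pause, and cancel) and SpeechSynthesisUtterance (an instance that holds the text to speak together with its voice properties).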


export default class Speaker {
  speaker: SpeechSynthesisUtterance;

  constructor(txt = '', volume = 1, lang = 'zh-CN') {
    this.speaker = new window.SpeechSynthesisUtterance(txt);
    this.speaker.volume = volume;
    this.speaker.lang = lang;
  }

  /* Set the text to speak */
  setText(txt: string) {
    this.speaker.text = txt;
  }

  /**
   * Set the volume
   * @param volume in the range [0, 1]
   */
  setVolume(volume: number) {
    this.speaker.volume = volume;
  }

  /* Start playback */
  speak() {
    window.speechSynthesis.speak(this.speaker);
  }

  /* Stop playback (cancel() also clears any queued utterances) */
  stop = () => {
    window.speechSynthesis.cancel();
  };
}

Note: if speaker.lang is not set to a Chinese language tag such as zh-CN, Chinese text may fail to play in a non-Chinese environment.

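SpeechSynthesisUtterance has the following properties:

SpeechSynthesisUtterance.lang: the language of the utterance; defaults to the lang value of the html element

SpeechSynthesisUtterance.pitch: the pitch; defaults to 1, range [0, 2]

SpeechSynthesisUtterance.rate: the speaking rate; defaults to 1, range [0.1, 10]

SpeechSynthesisUtterance.text: the text to speak

SpeechSynthesisUtterance.voice: the voice object, chosen from the values returned by SpeechSynthesis.getVoices(); defaults to the voice that best matches lang

SpeechSynthesisUtterance.volume: the volume; defaults to 1, range [0, 1]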
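
The ranges listed above can be enforced before the values reach the browser. The following is a minimal sketch and not part of the original Speaker class: clamp, normalizeOptions, and pickVoice are hypothetical helper names, and VoiceLike is a stand-in for the few SpeechSynthesisVoice fields used here, so the code also runs outside a browser.

```typescript
// Hypothetical helpers (assumptions, not from the original article).

/** Minimal shape of a SpeechSynthesisVoice; only the fields used here. */
interface VoiceLike {
  name: string;
  lang: string;
  default: boolean;
}

interface UtteranceOptions {
  pitch?: number;  // [0, 2], default 1
  rate?: number;   // [0.1, 10], default 1
  volume?: number; // [0, 1], default 1
}

/** Restrict value to [min, max]; use fallback when it is missing or not finite. */
function clamp(value: number | undefined, min: number, max: number, fallback: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

/** Fill in defaults and clamp each setting to its documented range. */
function normalizeOptions(opts: UtteranceOptions = {}): Required<UtteranceOptions> {
  return {
    pitch: clamp(opts.pitch, 0, 2, 1),
    rate: clamp(opts.rate, 0.1, 10, 1),
    volume: clamp(opts.volume, 0, 1, 1),
  };
}

/** Lowercase a language tag and tolerate underscore separators ("zh_CN"). */
function normalizeTag(tag: string): string {
  return tag.toLowerCase().replace(/_/g, '-');
}

/**
 * Pick a voice for a language tag: exact match first, then any voice that
 * shares the primary subtag (for example "zh"), otherwise undefined.
 */
function pickVoice<V extends VoiceLike>(voices: readonly V[], lang: string): V | undefined {
  const wanted = normalizeTag(lang);
  const primary = wanted.split('-')[0];
  return (
    voices.find((v) => normalizeTag(v.lang) === wanted) ??
    voices.find((v) => normalizeTag(v.lang).split('-')[0] === primary)
  );
}
```

In a browser, the result of pickVoice(window.speechSynthesis.getVoices(), 'zh-CN') can be assigned to speaker.voice (falling back to null when nothing matches), and the normalized values to pitch, rate, and volume. Some browsers load the voice list asynchronously, so getVoices() may return an empty array until the voiceschanged event fires.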

2. Node implementation

On Windows, PowerShell can call the system speech API:


Add-Type -AssemblyName System.speech;
$speak = New-Object System.Speech.Synthesis.SpeechSynthesizer;
$speak.Rate = 0; # speaking rate, range [-10, 10]
$speak.Speak('语音合成')

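Node only needs to execute this command. Because the PowerShell console defaults to GBK encoding on a Chinese-language Windows system, the text must be converted to GBK before it is passed in: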


const { exec } = require('child_process');
const iconv = require('iconv-lite');

// PowerShell reads one line from stdin and speaks it.
const script = 'Add-Type -AssemblyName System.speech; $speak = New-Object System.Speech.Synthesis.SpeechSynthesizer; $speak.Rate = 5; $speak.Speak([Console]::In.ReadLine()); exit';
// Pipe the text in as GBK so that it matches the console encoding.
exec(`powershell.exe ${script}`).stdin.end(iconv.encode('语音合成', 'gbk'));
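
The GBK conversion is needed because the text travels through the console's code page. One possible alternative, which is not the method used above, is PowerShell's -EncodedCommand parameter: it accepts the whole script as Base64-encoded UTF-16LE, so the text never depends on the console encoding. The sketch below assumes that approach; quoteForPowerShell, buildSpeakScript, encodeCommand, and speak are hypothetical helper names.

```typescript
import { Buffer } from 'node:buffer';
import { spawn } from 'node:child_process';

/**
 * Wrap text in a PowerShell single-quoted string. PowerShell treats the ASCII
 * apostrophe and several typographic quotes as single quotes, so each one is doubled.
 */
function quoteForPowerShell(text: string): string {
  return `'${text.replace(/['\u2018\u2019\u201A\u201B]/g, '$&$&')}'`;
}

/** Build the script; rate is rounded and clamped to the [-10, 10] range of Rate. */
function buildSpeakScript(text: string, rate = 0): string {
  const safeRate = Number.isFinite(rate) ? Math.max(-10, Math.min(10, Math.round(rate))) : 0;
  return [
    'Add-Type -AssemblyName System.Speech',
    '$speak = New-Object System.Speech.Synthesis.SpeechSynthesizer',
    `$speak.Rate = ${safeRate}`,
    `$speak.Speak(${quoteForPowerShell(text)})`,
  ].join('; ');
}

/** -EncodedCommand expects the script as Base64 of its UTF-16LE bytes. */
function encodeCommand(script: string): string {
  return Buffer.from(script, 'utf16le').toString('base64');
}

/** Speak text through powershell.exe (Windows only); resolves when the process exits. */
function speak(text: string, rate = 0): Promise<void> {
  return new Promise((resolve, reject) => {
    const args = ['-NoProfile', '-EncodedCommand', encodeCommand(buildSpeakScript(text, rate))];
    const child = spawn('powershell.exe', args);
    child.on('error', reject);
    child.on('exit', (code) =>
      code === 0 ? resolve() : reject(new Error(`powershell exited with code ${code}`)),
    );
  });
}
```

On Windows, speak('语音合成', 5) would then play the text at the same rate as the exec call above. Embedding the text in the script means it must be quoted correctly, which is why the quoting helper is kept separate and easy to test.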

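3. Calling a third-party SDK

Both methods above rely on speech support built into the system, and the resulting voice sounds noticeably robotic. For more lifelike, human-sounding speech, a third-party service is needed. BAT (Baidu, Alibaba, Tencent) and iFLYTEK all offer speech synthesis services, most of which are billed by call volume.

These services generally provide solutions for the web, Node, mobile apps, mini programs, and other platforms.

For implementation details, refer to the relevant official documentation, such as the API reference for Intelligent Speech Interaction on Alibaba Cloud and the streaming speech synthesis WebAPI documentation in the iFLYTEK Open Platform documentation center.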
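
Since these services are typically billed by call volume, one common way to limit cost is to cache audio for text that repeats. The sketch below is an illustration under assumptions, not any provider's API: Synthesize is a hypothetical function type standing in for whichever SDK call returns audio bytes, and withCache is a hypothetical wrapper name.

```typescript
/** Hypothetical provider call: takes text, returns synthesized audio bytes. */
type Synthesize = (text: string) => Promise<Uint8Array>;

/**
 * Wrap a synthesize function with an in-memory cache keyed by the exact text.
 * Concurrent requests for the same text share one in-flight call, and the
 * least recently used entry is evicted once maxEntries is exceeded.
 */
function withCache(synthesize: Synthesize, maxEntries = 100): Synthesize {
  const cache = new Map<string, Promise<Uint8Array>>();
  return (text) => {
    const hit = cache.get(text);
    if (hit) {
      // A Map keeps insertion order, so re-inserting marks the entry as recent.
      cache.delete(text);
      cache.set(text, hit);
      return hit;
    }
    const pending: Promise<Uint8Array> = synthesize(text).catch((err) => {
      // Do not keep failed requests; a later call should retry.
      if (cache.get(text) === pending) cache.delete(text);
      throw err;
    });
    cache.set(text, pending);
    if (cache.size > maxEntries) {
      // The first key in insertion order is the least recently used one.
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    return pending;
  };
}
```

A real integration would pass the provider's own synthesis call as the synthesize argument. Whether caching is appropriate depends on the provider's terms and on how often the same text recurs.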


Original article: https://blog.csdn.net/cscj2010/article/details/126766102