python-3.x - Why is Azure's Speech To Text so slow?

Question

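I am using the Azure Speech To Text API to recognize short voice recordings, from 10 seconds to 1 minute long. Each recognition takes about 5 seconds to complete, which seems like a lot!

Here is how I do it: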

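import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region, language and filepath are defined elsewhere
speech_config = speechsdk.SpeechConfig(subscription=speech_key,
                                       region=service_region,
                                       speech_recognition_language=language)
audio_config = speechsdk.audio.AudioConfig(filename=filepath)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                               audio_config=audio_config)

# Blocks until a single utterance has been recognized
result = speech_recognizer.recognize_once()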

I tried to pin down the bottleneck using timeit:

print(timeit.timeit(lambda : speechsdk.SpeechConfig(subscription=speech_key, 
                                     region=service_region, 
                                     speech_recognition_language=language), 
                    number=100))
>>> 0.004
print(timeit.timeit(lambda : speechsdk.audio.AudioConfig(filename=filepath), 
                    number=100))
>>> 0.003
print(timeit.timeit(lambda : speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config),
                   number=100))
>>> 0.118

print(timeit.timeit(lambda : print(speech_recognizer.recognize_once()),
                   number=5)) # Only doing this 5 times because it's very slow
>>> 35.01

I am actually using a wrapper function that reinitializes speech_recognizer, because calling recognize_once() on it leaves it unusable.

In this experiment, transcribing an 11-second recording takes about 7 seconds.
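The 7-second figure follows from the timeit output above, since timeit.timeit returns the total time for all runs rather than the time per run. A minimal sketch of that arithmetic, where average_seconds is a hypothetical helper name and not part of the original code:

```python
import timeit


def average_seconds(func, number):
    """Return the mean wall-clock time of one call to func.

    Hypothetical helper: timeit.timeit returns the TOTAL time for
    `number` calls, so it is divided by `number` here.
    """
    total = timeit.timeit(func, number=number)
    return total / number


# Figures reported in the question: 5 calls took 35.01 s in total.
reported_total = 35.01
reported_runs = 5
print(round(reported_total / reported_runs, 2))  # 7.0
```

With the real recognizer, the wrapper function mentioned above could be passed as func instead of reusing the reported totals.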

I am transcribing the audio files in French, using service_region = "westeurope".

score 1 · Accepted Answer

If the audio is 10 s long and recognition takes 5 s, that still seems reasonable: the real-time factor (RTF) is 5/10 = 0.5.

Speech recognition is a heavy process, and the algorithms and models need time to run.
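For illustration, the RTF is processing time divided by audio duration, so values below 1 mean the recognizer runs faster than real time. A small sketch, where real_time_factor is a hypothetical helper name and the inputs are the figures quoted in this thread:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (RTF).

    Hypothetical helper for illustration; an RTF below 1.0 means
    recognition finishes faster than the audio would take to play.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Example from the answer: 10 s of audio recognized in 5 s.
print(real_time_factor(5, 10))            # 0.5
# Figures from the question: 11 s of audio recognized in about 7 s.
print(round(real_time_factor(7, 11), 2))  # 0.64
```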
