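I am using the Azure Speech to Text API to recognize short voice recordings, between 10 seconds and 1 minute long. Each recognition takes roughly 5 seconds to complete, which seems like too much!
Here is how I do it: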
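import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region, language and filepath are defined elsewhere
speech_config = speechsdk.SpeechConfig(subscription=speech_key,
                                       region=service_region,
                                       speech_recognition_language=language)
audio_config = speechsdk.audio.AudioConfig(filename=filepath)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                               audio_config=audio_config)
# Blocks until a single utterance has been recognized
result = speech_recognizer.recognize_once()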
I tried to identify the bottleneck using timeit:
import timeit

# timeit.timeit returns the total time in seconds for all `number` runs
print(timeit.timeit(lambda: speechsdk.SpeechConfig(subscription=speech_key,
                                                   region=service_region,
                                                   speech_recognition_language=language),
                    number=100))
>>> 0.004
print(timeit.timeit(lambda: speechsdk.audio.AudioConfig(filename=filepath),
                    number=100))
>>> 0.003
print(timeit.timeit(lambda: speechsdk.SpeechRecognizer(speech_config=speech_config,
                                                       audio_config=audio_config),
                    number=100))
>>> 0.118
print(timeit.timeit(lambda: print(speech_recognizer.recognize_once()),
                    number=5))  # Only doing this 5 times because it's very slow
>>> 35.01
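In practice I used a wrapper function that re-initializes speech_recognizer before each run, because calling recognize_once() on it makes it unusable afterwards.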
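Such a wrapper could look roughly like the sketch below. It is only an illustration, not the exact code used for the measurements above: `time_recognition` and `make_recognizer` are hypothetical names, and the only assumption about the recognizer is that it exposes `recognize_once()`. A fresh recognizer is built for every run, and only the `recognize_once()` call itself is timed.

```python
import time


def time_recognition(make_recognizer, runs=5):
    """Time recognize_once() on a fresh recognizer for each run.

    make_recognizer is a hypothetical zero-argument factory that returns
    an object with a recognize_once() method.
    Returns a list with one duration in seconds per run.
    """
    durations = []
    for _ in range(runs):
        # Build a new recognizer each time; construction is not timed
        recognizer = make_recognizer()
        start = time.perf_counter()
        recognizer.recognize_once()
        durations.append(time.perf_counter() - start)
    return durations
```

With the objects from the snippet above, it could be called as time_recognition(lambda: speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)), which gives one duration per run instead of a single total.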
In this experiment, transcribing an 11-second recording took about 7 seconds per call (35.01 seconds in total over 5 runs).
I am transcribing French audio files, with service_region = "westeurope".