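I have a use case where I want to upload an audio file (.WAV) to blob storage, which triggers a function that extracts the text from the audio. So far, the only approach that seems to work is saving the audio file locally first, because AudioConfig does not accept the URI of an audio file. This is the code I am using: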

import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "sub-key", "westeurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_input = speechsdk.AudioConfig(filename="**BLOB URI**")

speech_recognizer = speechsdk.SpeechRecognizer(speech_config, audio_input)

result = speech_recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

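From what I have found, a URI cannot be passed as the filename (the bold part of the code). Workarounds such as downloading the file locally first are not an option for me.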

I tried reading the audio as a stream, but could not find a way to convert it into an AudioInputStream.

Any help would be great. Thanks.

1 Answer

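You can use the batch transcription REST API to transcribe large amounts of audio held in storage. You point to the audio files with a regular URI or a shared access signature (SAS) URI and receive the transcription results asynchronously. With the v3.0 API, you can transcribe one or more audio files, or process a whole storage container.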
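As a rough illustration of what creating such a job could look like, here is a minimal sketch that only uses the Python standard library. It assumes the v3.0 endpoint path `/speechtotext/v3.0/transcriptions` on the regional host and a JSON body with `contentUrls`, `locale` and `displayName`; the helper names `build_transcription_request` and `create_transcription`, the default display name, and the default locale are hypothetical choices made for this example, so check them against the current API reference before relying on them.

```python
import json
import urllib.request


def build_transcription_request(region, subscription_key, content_urls,
                                locale="en-US",
                                display_name="blob-transcription"):
    """Build (without sending) the POST request that creates a batch job.

    content_urls is a list of blob URIs, typically SAS URIs so that the
    service is allowed to read the audio.
    """
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.0/transcriptions")
    body = {
        "contentUrls": list(content_urls),  # one entry per audio file
        "locale": locale,                   # assumed default, adjust as needed
        "displayName": display_name,        # hypothetical job name
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_transcription(request):
    """Send the request and return the parsed JSON describing the new job."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Inside the blob-triggered function you would call `build_transcription_request("westeurope", "sub-key", [sas_uri])` and pass the result to `create_transcription`. The job runs asynchronously, so the transcript is not in that response: the returned JSON should describe the job, which you then poll until it has finished before fetching the result files.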

See the following:

https://medium.com/@abhishekcskumar/logic-apps-large-audio-speech-to-text-batch-transcription-d71e93bbaeec

https://github.com/PanosPeriorellis/Speech_Service-BatchTranscriptionAPI/blob/master/CrisClient/Program.cs

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#sample-code
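If the clip is short and you would rather stay with the Speech SDK, the stream route mentioned in the question may also work without touching the local disk. The sketch below assumes the blob content is already available as bytes (for example from the blob trigger's input stream) and that the file is uncompressed PCM WAV. It reads the format from the WAV header, pushes the raw samples into a `PushAudioInputStream`, and passes that stream to `AudioConfig`. The helper names `read_wav_pcm` and `recognize_wav_bytes` are made up for this example.

```python
import io
import wave


def read_wav_pcm(wav_bytes):
    """Return (pcm_frames, sample_rate, bits_per_sample, channels)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def recognize_wav_bytes(wav_bytes, speech_key, service_region):
    """Recognize one utterance from in-memory WAV bytes via a push stream."""
    # Imported here so the WAV helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    pcm, rate, bits, channels = read_wav_pcm(wav_bytes)

    # Describe the raw PCM data so the service interprets the samples correctly.
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=rate, bits_per_sample=bits, channels=channels)
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm)  # header already stripped by the wave module
    push_stream.close()     # signals end of audio

    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)
    return recognizer.recognize_once()
```

The result can be inspected with the same `result.reason` checks as in the question. Note that `recognize_once()` returns after the first utterance, so for longer recordings continuous recognition or the batch API above is probably the better fit.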

Answered 2021-08-13T11:33:02.053