The experience of using Google Cloud’s Text-to-Speech AI
source link: https://donghao.org/2023/08/11/the-experience-of-using-google-clouds-text-to-speech-ai/
I used the Python API of Text-to-Speech AI to convert a PDF file to MP3 audio, as in this example:
from google.cloud import texttospeech
from PyPDF2 import PdfReader

client = texttospeech.TextToSpeechClient()
reader = PdfReader("xxx.pdf")

# Mandarin WaveNet voice
voice = texttospeech.VoiceSelectionParams(
    language_code="cmn-CN",
    name="cmn-CN-Wavenet-B",
    ssml_gender=texttospeech.SsmlVoiceGender.MALE,
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=0.8,  # slightly slower than the default
)

# Try the first 10 pages
text = ""
for page in reader.pages[:10]:
    text += page.extract_text()
print(len(text))  # number of characters, not bytes

synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
print("Written")
Very simple, right? But it reported an error:
google.api_core.exceptions.InvalidArgument: 400 Either `input.text` or `input.ssml` is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.
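Note that the script prints `len(text)`, which counts characters, while the error talks about bytes. Assuming the limit is measured on the UTF-8 encoding of the text (the error message does not say so explicitly), Chinese text uses about 3 bytes per character, so the limit would be hit after far fewer characters than 5000. A small check with a made-up sample string (not taken from the PDF):

```python
# Compare character count with UTF-8 byte count for a short Chinese sample.
sample = "你好，世界"  # hypothetical sample text

char_count = len(sample)                  # what print(len(text)) reports
byte_count = len(sample.encode("utf-8"))  # what a byte limit would count

print(char_count, byte_count)
```

So a character count well under 5000 can still be over the byte limit.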
It seems the request is too long. Let’s use the “Long Audio API”:
from google.cloud import texttospeech
from PyPDF2 import PdfReader

client = texttospeech.TextToSpeechLongAudioSynthesizeClient()
reader = PdfReader("xxx.pdf")

voice = texttospeech.VoiceSelectionParams(
    language_code="cmn-CN",
    name="cmn-CN-Wavenet-B",
    ssml_gender=texttospeech.SsmlVoiceGender.MALE,
)
audio_config = texttospeech.AudioConfig(
    # LINEAR16 is uncompressed 16-bit PCM, not MP3
    audio_encoding=texttospeech.AudioEncoding.LINEAR16,
    speaking_rate=0.8,
)

# Try the first 10 pages
text = ""
for page in reader.pages[:10]:
    text += page.extract_text()
print(len(text))

synthesis_input = texttospeech.SynthesisInput(text=text)
request = texttospeech.SynthesizeLongAudioRequest(
    parent="projects/robin-00000/locations/us",
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config,
    output_gcs_uri="gs://robin_tts/xxx.mp3",  # result is written to this bucket
)
operation = client.synthesize_long_audio(request=request)
result = operation.result(timeout=300)  # wait up to 5 minutes
print(result)
It still didn’t work:
google.api_core.exceptions.InvalidArgument: 400 The long audio API does not support the language zh. Supported languages: en, es.
Okay, it doesn’t support Chinese. Then what should I do if I want to convert a Chinese PDF to MP3? Convert it page by page into 500 MP3 files? That would be terrible. And even the short MP3 it did generate clearly sounds like a machine, not a human.
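One possible workaround, which I did not try, is to cut the text into pieces that each stay under the 5000-byte limit and synthesize them one by one. The sketch below only does the splitting. The names `split_by_bytes` and `_hard_split` are made up for illustration, and it assumes the limit applies to the UTF-8 encoding of the text:

```python
import re

MAX_BYTES = 5000  # limit quoted in the error message


def _hard_split(piece, max_bytes):
    """Yield slices of piece that fit in max_bytes, cutting between characters."""
    part, size = "", 0
    for ch in piece:
        ch_size = len(ch.encode("utf-8"))
        if part and size + ch_size > max_bytes:
            yield part
            part, size = "", 0
        part += ch
        size += ch_size
    if part:
        yield part


def split_by_bytes(text, max_bytes=MAX_BYTES):
    """Split text into chunks of at most max_bytes UTF-8 bytes each.

    Cuts after sentence-ending punctuation or a newline where possible,
    and falls back to cutting between characters for overlong sentences.
    """
    pieces = re.split(r"(?<=[。！？.!?\n])", text)
    chunks, current, current_size = [], "", 0
    for piece in pieces:
        for part in _hard_split(piece, max_bytes):
            part_size = len(part.encode("utf-8"))
            if current and current_size + part_size > max_bytes:
                chunks.append(current)
                current, current_size = "", 0
            current += part
            current_size += part_size
    if current:
        chunks.append(current)
    return chunks


# Demo with made-up text: 1000 copies of a 7-character sentence.
demo = "今天天气很好。" * 1000
chunks = split_by_bytes(demo)
print(len(chunks), max(len(c.encode("utf-8")) for c in chunks))
```

Each chunk could then go through the same `client.synthesize_speech(...)` call as in the first script, with every `response.audio_content` appended to a single output file. Whether MP3 segments joined this way play back cleanly in every player is an assumption worth checking, and it does nothing about the robotic voice.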
Google has state-of-the-art deep learning technology, but some of its cloud products are ridiculously hard to use (such as Vertex AI and this Text-to-Speech).
After some searching (at least Google Search is as good as ever), I found NaturalReader. Surprisingly, it supports Chinese, and the voice sounds as good as a real human. The only problem is that it is very expensive for individual users.
August 11, 2023 - 0:33
RobinDong
industry
python