Hacking Google’s Text To Speech “API”

April 26, 2013 | 3 Minute Read

When I was at my previous job, one of my tasks was localizing a large set of phrases into multiple languages, as both text and audio files. I did this using the awesome Google Translate API.

The Google Translate website can translate text and play audio of it in the translated language. There’s no official API for getting the audio, though. Luckily, I’ve never let the lack of an official API stop me before. I had read a few old blog posts about how Google’s undocumented TTS API could be used, albeit with a 100-character limit: going over 100 characters results in a truncated audio file, and some of the text I needed to turn into audio was longer than that. It turns out that with a little help from Chrome’s web inspector, I could replicate what the Google Translate site does.

The first thing to check out is the URL scheme of the audio files, which looks like this:

http://translate.google.com/translate_tts?ie=UTF-8&q=hello%20world&tl=en&total=1&idx=0&textlen=11&prev=input

Breaking down the parameters, “ie” is the text’s encoding, “q” is the text to convert to audio, “tl” is the text language, “total” is the total number of chunks (more on that later), “idx” is which chunk we’re on, “textlen” is the length of the text in that chunk and “prev” is not really important.
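To make the parameter breakdown concrete, here’s a minimal sketch in Python 3 that assembles the same URL from its parts. The function name `build_tts_url` and its default values are made up for illustration, and this isn’t necessarily how the original script builds its requests.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the example URL above.
BASE_URL = "http://translate.google.com/translate_tts"


def build_tts_url(text, lang="en", idx=0, total=1):
    """Build a request URL for one chunk of text (hypothetical helper)."""
    params = [
        ("ie", "UTF-8"),         # encoding of the text
        ("q", text),             # the text to convert to audio
        ("tl", lang),            # language of the text
        ("total", total),        # total number of chunks
        ("idx", idx),            # index of this chunk
        ("textlen", len(text)),  # length of the text in this chunk
        ("prev", "input"),       # not really important
    ]
    # quote_via=quote encodes spaces as %20, matching the example URL.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_tts_url("hello world"))
```

For “hello world” this should reproduce the example URL above, character for character.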

The Google Translate site itself gets around its own character limit by breaking big blocks of text into “chunks”. It seems to try to break along punctuation, but for super long sentences it will also break in the middle of a sentence, which ends up sounding pretty weird. With the Gettysburg Address, for example, Google ends up making a separate request for just the chunk “civil war”.
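Here’s a rough sketch of that kind of chunking in Python 3. It is only a guess at the behavior described above, not Google’s actual algorithm: it prefers to break after punctuation, falls back to breaking between words when a single phrase is too long, and packs the pieces into chunks of at most 100 characters. The name `split_into_chunks` and the exact punctuation set are assumptions.

```python
import re

MAX_CHARS = 100  # the character limit mentioned earlier in the post


def split_into_chunks(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars characters."""
    pieces = []
    # First pass: break after punctuation, keeping it attached to its phrase.
    for piece in re.split(r"(?<=[.,;:!?])\s+", text.strip()):
        # A phrase that is still too long gets broken between words,
        # or cut mid-word if it contains no spaces at all.
        while len(piece) > max_chars:
            cut = piece.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            pieces.append(piece[:cut].strip())
            piece = piece[cut:].strip()
        if piece:
            pieces.append(piece)

    # Second pass: greedily pack neighboring pieces into one chunk.
    chunks = []
    current = ""
    for piece in pieces:
        candidate = current + " " + piece if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


speech = ("Four score and seven years ago our fathers brought forth on this "
          "continent, a new nation, conceived in Liberty, and dedicated to "
          "the proposition that all men are created equal.")
for chunk in split_into_chunks(speech):
    print(len(chunk), chunk)
```

Packing several short phrases into one chunk keeps the number of requests down, at the cost of occasionally splitting in a spot where a speaker wouldn’t pause.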

In order to download audio for longer blocks of text, I wrote a Python script that broke the text into chunks and made a separate request to Google for each one. The script appended the audio from every response to a single output file and, somehow, it worked! Just to be safe, I also set my script up to send Google’s Flash player as the referer (sic) and a version of Firefox as the user agent.
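The download step might look something like the sketch below. It is written for Python 3 (`urllib.request`) rather than the Python 2.7 the original script targets, and it is not the original code. The `Referer` and `User-Agent` values are placeholders, not the exact strings the script sends, and the function names are hypothetical. Since the endpoint is undocumented, there’s no guarantee it responds this way.

```python
import urllib.request
from urllib.parse import urlencode, quote

TTS_URL = "http://translate.google.com/translate_tts"

# Placeholder header values (assumptions): the post only says the referer
# pointed at Google's Flash player and the user agent was some Firefox.
HEADERS = {
    "Referer": "http://translate.google.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:20.0) "
                  "Gecko/20100101 Firefox/20.0",
}


def make_request(chunk, idx, total, lang="en"):
    """Build the HTTP request for one chunk of text."""
    params = [("ie", "UTF-8"), ("q", chunk), ("tl", lang),
              ("total", total), ("idx", idx),
              ("textlen", len(chunk)), ("prev", "input")]
    url = TTS_URL + "?" + urlencode(params, quote_via=quote)
    return urllib.request.Request(url, headers=HEADERS)


def download_audio(chunks, out_path, lang="en", fetch=None):
    """Fetch audio for each chunk and append it all to one file.

    `fetch` takes a Request and returns bytes; it defaults to a real
    HTTP call but can be swapped out for testing.
    """
    if fetch is None:
        def fetch(request):
            with urllib.request.urlopen(request) as response:
                return response.read()
    with open(out_path, "wb") as out:
        for idx, chunk in enumerate(chunks):
            out.write(fetch(make_request(chunk, idx, len(chunks), lang)))
```

Calling `download_audio(["Four score and seven years ago", "our fathers brought forth"], "speech.mp3")` would make two requests and write both responses into `speech.mp3`. Assuming the responses are MP3 data (the output files mentioned in the comments use an .mp3 extension), gluing them together most likely works because MP3 audio is a sequence of self-contained frames, so players tend to read straight through the joins.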

At the time, I didn’t want to release the code as it was being used for some uber top secret stuff. But since I’m not working on that project anymore, I refactored the original code into a command line Python script. Along the way I had to learn how to use Python’s argparse, which is a pretty neat way of parsing command line arguments.
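For anyone who hasn’t used argparse, here’s a minimal sketch of a command line interface along these lines. The short flags `-f`, `-o` and `-l` match the commands readers quote in the comments; the long option names, defaults and help text are assumptions and may differ from the real script.

```python
import argparse


def build_parser():
    """Create a parser for a text-to-speech command line tool."""
    parser = argparse.ArgumentParser(
        description="Convert a text file to speech audio.")
    # Long names and defaults below are hypothetical.
    parser.add_argument("-f", "--file", required=True,
                        help="text file to read")
    parser.add_argument("-o", "--output", default="out.mp3",
                        help="audio file to write")
    parser.add_argument("-l", "--language", default="en",
                        help="language code of the text")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-f", "englishpoem.txt", "-o", "englishpoem.mp3"])
print(args.file, args.output, args.language)
```

Besides parsing, argparse generates the `-h` help output and the error message for a missing required option for free.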

The project is available on GitHub right now, so go grab it and try it out. If you’re curious what the output sounds like, here’s a recording of female Abraham Lincoln reciting the Gettysburg Address (yes, she mispronounces some words). One fun thing to try is using clashing input and output languages. Here’s Female Japanese Abraham Lincoln reciting the same speech (she just seems to be spelling words, slacker).

If you enjoyed this hack, let me know and I could post some other ones I’ve been working on. And if you find a way to improve the code (probably not difficult at all), go ahead and submit a pull request on GitHub. And if you’re from Google, please don’t shut down my Gmail and AdSense accounts.

Comments

Tom Cox May 15, 2013 at 01:44 AM

This will be enormously helpful for people who want to talk instead of type, for example children and senior citizens who find pecking keys frustrating. I have several ideas to help non-writers write, which could be the next big thing in publishing: people telling/writing stories instead of sending text messages.
zaggynl May 29, 2013 at 09:04 AM

It works but the sound produced is..off. Command used: c:\Python27\python.exe GoogleTTS.py -f englishpoem.txt -o englishpoem.mp3 Text taken from: http://www.i18nguy.com/chaos.html Result: https://mega.co.nz/#!fR9VAIbD!X2NAI1daKrkpLVAGW86gpGAmHiYKd94zdkAQ6gVcsGI
umer June 20, 2013 at 03:38 PM

Can you choose gender?
Hung June 26, 2013 at 04:07 PM

I’m getting the following error in Python 3.3 on Windows: “Traceback (most recent call last): File "GoogleTTSv2.py", line 6, in <module> import urllib, urllib2 ImportError: No module named 'urllib2'”. I cannot find urllib2 for Python 3.3 anywhere.
- Hung October 02, 2013 at 06:11 PM
  
  Oops. I think I replied via email but I’ll also reply here: I wrote this for Python 2.7, so you’ll have to change things around for 3.3. I think you’ll need to use urllib in 3.3 instead of urllib2. The methods may also have changed.
Mike July 14, 2013 at 07:01 PM

This is really helpful! I’m also interested in choosing gender too, though. You had a female voice reading the Gettysburg Address, but the script produces male output only. Can we change this?
- Leon July 24, 2013 at 08:23 AM
  
  Try this for female: GoogleTTS.py -l en_us -f text.txt
Atul July 18, 2013 at 03:39 PM

Hey, is it working right now? I am unable to download. Has Google stopped it from downloading?
Dmitri DB August 02, 2013 at 04:57 PM

Werd, you slapped something together, glad to see some other python hacker wanted to get something like this going! I was messing around on something similar that uses the Natural Language Toolkit ( http://nltk.org ) to do fully proper multilingual sentence tokenization, but the project stalled cause I got more important work to do. Will let you know if I ever get back to it.
Joe Suber September 05, 2013 at 10:14 PM

Thanks to you, Python, and OpenCV, I am able to help a blind guy see his Magic cards without revealing them to his opponent (for the first time in years). I am also setting my project up in my store so people can identify cards for sale. I’ve spent a lot of time on the CV part, and thanks to you I have avoided a lengthy delve into the very API you used for voice! I have a working project now, soon to be up on GitHub once I get the knack of that. (I’m not a professional software guy, but I’m getting up to speed in a few areas.) I also considered ‘speeks’ for doing the voice synthesis locally, but this web-based approach actually ends up with less overhead, and it sounds better too. I didn’t expect latency to be so low. Open source FTW!
- Hung September 19, 2013 at 04:23 PM
  
  Whoa, that sounds really cool. I’m glad my code could help you build your app!
  - joesuber September 27, 2013 at 01:38 AM
    
    I still don’t have the github repo up, but I did manage to get a rigged up installer made along with some other improvements. I am afraid it will seem amateurish to any pros, but it works pretty well once you get it installed. Check README for directions. Since your text-to-speech thing is such a cool addition to my project I’ll put the dropbox folder link here for you or whoever delves this deeply into the comments of your blog. https://www.dropbox.com/sh/lxl3a3rlaaxl5mo/Xn1-jDYJUa You don’t have to download all 3.5 gigs of card images unless you want the complete Magic: The Gathering catalog. Just copy the whole HydraMatic directory (minus card sets you don’t want in ‘pics’ folder) and from there run ‘python crabby.py’
    - joesuber September 27, 2013 at 01:47 AM
      
      and it would help to have some cards to identify…
  - joesuber September 29, 2013 at 02:08 AM
    
    https://github.com/JoeSuber/Google-Text-To-Speech Got git-hub fork of your thing put up. Git is like learning a new OS.


Hung Truong: The Blog!
