(If you would prefer to listen to this article, click this link to hear it using Amazon Polly. It will also be available in iTunes: search for LabR Learning Resources.)
Since I started writing on the Medium platform, I have wanted to offer my articles in an audio format as well. However, time to sit down, record and edit the audio is in short supply, so I decided to turn to Polly from Amazon Web Services.
Polly is a text-to-speech platform capable of handling both plain text and Speech Synthesis Markup Language (SSML). This article isn’t going to cover SSML. Polly creates very lifelike audio from the supplied text in a variety of voices and dialects. When using plain text with Polly, the punctuation in the text is used as cues for pauses and breaks. Much more granular control over breaks, intonation and speed is available using SSML.
Once the text has been converted, the audio can be used over and over again in applications, or served from a storage location like Simple Storage Service (S3). Polly also supports real-time audio streaming, so the audio for the text sent using the API is returned immediately for use in your application.
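As a rough illustration of the real-time case, the sketch below calls boto3's `synthesize_speech` and saves the returned audio stream to a file. The function name `speak_to_file`, the default voice and the output path are illustrative placeholders, and the Polly client is passed in as an argument so the function can be tried without AWS credentials.

```python
def speak_to_file(polly, text, path, voice="Joanna"):
    """Synthesize a short text synchronously and save the MP3 audio.

    `polly` is expected to be a boto3 Polly client, e.g. boto3.client("polly").
    """
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice,
    )
    # AudioStream is a file-like streaming body holding the audio bytes
    audio = response["AudioStream"].read()
    with open(path, "wb") as handle:
        handle.write(audio)
    # Return the number of bytes written (hypothetical convenience)
    return len(audio)
```

The synchronous call suits short passages. For a full article, the asynchronous synthesis task described in the rest of this article is the better fit.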
The Lambda Script
In my workflow, I write my article text on my iPad, and save the file to iCloud. I can then upload the file to an S3 bucket using the AWS web console. The text-to-speech script is written in Python and executed by Lambda. An S3 event is sent when the file is created, which triggers the Lambda script.
The S3 bucket name and object key are passed to the Lambda script in the event. The script retrieves the object, reads the text from the file and submits a speech synthesis task to Polly using the API. The code uses the file extension to determine whether the file is plain text (.txt) or SSML (.ssml), and adjusts the API call accordingly.
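import boto3
from botocore.exceptions import BotoCoreError, ClientError

# Create the client to interact with the Polly API
polly = boto3.client("polly")
# SSML input needs TextType "ssml"; anything else is sent as plain text
filetype = "ssml" if extension == "ssml" else "text"
try:
    response = polly.start_speech_synthesis_task(
        OutputFormat="mp3", VoiceId="Joanna",  # fixed in the script
        Text=text, TextType=filetype,  # "text" is an assumed variable name
        OutputS3BucketName=output_bucket,  # assumed variable name
    )
except (BotoCoreError, ClientError) as error:
    # The service returned an error; log it so the function can exit gracefully
    print(error)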
The code snippet above creates the connection to the Polly service, and submits the synthesis request using the provided configuration in the API call. Note that the output format and voice are fixed in the script, while the other parameters have been provided through the S3 notification event, or derived in the script processing.
There really isn’t anything else involved with using the API call. The remainder of the script deals with reading the file contents, as Polly requires the text itself to be passed rather than a file name, and with defining the source and destination buckets. There are print statements in various parts of the script so I can see what happened if there is an error by looking at the CloudWatch logs for the Lambda function.
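The file handling described above can be sketched roughly as follows. This is not the original script: the helper names `parse_s3_event` and `read_text` are hypothetical, UTF-8 encoding is assumed, and the S3 client is passed in so the helpers can be exercised without AWS access.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return (bucket, key, text_type) for the first record of an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 notifications (spaces become "+")
    key = unquote_plus(record["object"]["key"])
    extension = key.rsplit(".", 1)[-1].lower() if "." in key else ""
    # Only .ssml files are sent as SSML; everything else is plain text
    text_type = "ssml" if extension == "ssml" else "text"
    return bucket, key, text_type


def read_text(s3, bucket, key):
    """Fetch the object and decode it, since Polly takes text, not a file name."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    return body.read().decode("utf-8")
```

Inside a Lambda handler, these would typically be called with the incoming `event` and a client created with `boto3.client("s3")`.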
Once the speech synthesis task is completed, the MP3 audio is written to a different S3 bucket, and a notification is sent to me using Simple Notification Service, so I can update the article with the URL to the audio.
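One possible way to wire up the completion notification is the optional `SnsTopicArn` parameter of `start_speech_synthesis_task`, which asks Polly to publish a status message to an SNS topic. Whether the script described here does it this way is an assumption. The function name `start_task` and its arguments are hypothetical.

```python
def start_task(polly, text, text_type, output_bucket, sns_topic_arn=None):
    """Submit an asynchronous synthesis task and return (task_id, output_uri).

    `polly` is expected to be a boto3 Polly client.
    """
    request = {
        "OutputFormat": "mp3",
        "VoiceId": "Joanna",
        "Text": text,
        "TextType": text_type,
        "OutputS3BucketName": output_bucket,
    }
    if sns_topic_arn:
        # Assumption: let Polly itself publish the task status to this topic
        request["SnsTopicArn"] = sns_topic_arn
    task = polly.start_speech_synthesis_task(**request)["SynthesisTask"]
    # OutputUri is where the MP3 will be written once the task completes
    return task["TaskId"], task["OutputUri"]
```

The returned output URI is known as soon as the task is accepted, although the audio only exists at that location once the task has finished.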
There is a Little More to It
Writing text to read is not quite the same as writing text for audio conversion. I write my articles using Markdown, and then submit them to the Medium platform for publishing. However, simply sending the Markdown to Polly would result in the audio output saying “hash tag hash tag there is a Little More to it”, which just isn’t right. Once I am satisfied with the article, I make a copy and edit it to remove the Markdown elements and headings, and adjust the text to make it suitable for Polly to read, and easier for you to listen to.
After the Polly conversion process is complete and I receive the notification with the URL to the audio, I update the original article text and submit it to Medium. I can then do a final review of the draft on the Medium platform, including a check of the audio, and then publish the story.
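The cleanup described here is a manual edit, but part of it could be automated. The sketch below is one hypothetical approach using regular expressions. It only handles headings, links, images, emphasis and inline code, so the result would still need a read-through before being sent to Polly.

```python
import re


def strip_markdown(text):
    """Remove common Markdown markup so Polly does not read it aloud."""
    cleaned = []
    for line in text.splitlines():
        # Drop heading markers such as "## " at the start of a line
        line = re.sub(r"^\s{0,3}#{1,6}\s+", "", line)
        # Replace [label](url) links and ![alt](url) images with the label
        line = re.sub(r"!?\[([^\]]+)\]\([^)]*\)", r"\1", line)
        # Remove emphasis and inline-code markers
        line = re.sub(r"(\*\*|__|\*|`)", "", line)
        cleaned.append(line)
    return "\n".join(cleaned)
```

For example, `strip_markdown("## There is a Little More to It")` returns the heading text without the leading hash characters.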
Currently my script only supports the “Joanna” Polly voice, and only in English. The next version will allow the use of a configuration file to select alternative voices and languages when the file is submitted for processing.
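What such a configuration file might look like is not decided here. Purely as a hypothetical sketch, a small JSON document could override the defaults for the `VoiceId` and `LanguageCode` parameters that Polly accepts. The function name `voice_settings` and the config layout are assumptions.

```python
import json

# Defaults matching the current behaviour: the Joanna voice in US English
DEFAULTS = {"VoiceId": "Joanna", "LanguageCode": "en-US"}


def voice_settings(config_text=None):
    """Merge an optional JSON config over the default voice settings."""
    settings = dict(DEFAULTS)
    if config_text:
        overrides = json.loads(config_text)
        # Only accept the keys this sketch knows how to pass on to Polly
        settings.update({k: v for k, v in overrides.items() if k in DEFAULTS})
    return settings
```

The resulting dictionary could then be unpacked into the synthesis call alongside the text and output settings.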
Copyright 2019, Chris Hare