Amazon Polly: Bringing Audio to my Medium Articles

(If you would prefer to listen to this article, click this link to hear it using Amazon Polly. It will also be available in iTunes: search for LabR Learning Resources.)

Since I started writing on the Medium platform, I have wanted to provide an audio version of my articles as well. However, time to sit down, record and edit that audio is in short supply, so I decided to turn to Polly from Amazon Web Services.

Polly is a text-to-speech service that accepts both plain text and Speech Synthesis Markup Language (SSML). This article isn’t going to cover SSML. Polly creates very lifelike audio from the supplied text in a variety of voices and dialects. When using plain text with Polly, the punctuation in the text is used as cues for pauses and breaks. Much more granular control over breaks, intonation and speed is available using SSML.

Once the text has been converted, the audio can be used over and over again in applications, or served from a storage location like Simple Storage Service (S3). Polly also supports real-time audio streaming, so the audio for text sent through the API is returned immediately for use in your application.
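As a rough illustration of the real-time path, the sketch below calls `synthesize_speech`, which returns the audio in the response instead of writing it to S3. The helper name `speak_to_file` and the output path are illustrative assumptions, not part of my script, and the client is passed in as an argument so the function can be tried without AWS credentials.

```python
def speak_to_file(polly, text, path, voice_id="Joanna"):
    """Synthesize `text` synchronously and write the MP3 bytes to `path`.

    `polly` is expected to be a boto3 Polly client, e.g. boto3.client("polly").
    Returns the number of audio bytes written.
    """
    response = polly.synthesize_speech(
        Text=text,
        TextType="text",
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read() returns the raw MP3 bytes.
    audio = response["AudioStream"].read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With real credentials this would be called as `speak_to_file(boto3.client("polly"), "Hello from Polly", "hello.mp3")`. The synchronous call suits short passages; for a full-length article, the asynchronous synthesis task is the better fit.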

In my workflow, I write my article text on my iPad and save the file to iCloud. I can then upload the file to an S3 bucket using the AWS web console. The text-to-speech script is written in Python and executed by Lambda. S3 sends an event when the file is created, which triggers the Lambda function.

The bucket name and object key are passed to the Lambda function in the event. The script retrieves the object, reads the text from the file and submits a speech synthesis task to Polly using the API. The code uses the file extension to determine whether the file is plain text (.txt) or SSML (.ssml), and adjusts the API call accordingly.

import sys

import boto3
from botocore.exceptions import BotoCoreError, ClientError

#
# Create the client to interact with the Polly API
#
polly = boto3.client("polly")

# sourceText, extension, bucket, prefix and topicid are set earlier in the script
try:
    # Tell Polly whether the input is SSML or plain text
    if extension == "ssml":
        filetype = "ssml"
    else:
        filetype = "text"

    response = polly.start_speech_synthesis_task(
        Text=sourceText,
        TextType=filetype,
        OutputFormat="mp3",
        VoiceId="Joanna",
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=prefix,
        SnsTopicArn=topicid)
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

The code snippet above creates the connection to the Polly service, and submits the synthesis request using the provided configuration in the API call. Note that the output format and voice are fixed in the script, while the other parameters have been provided through the S3 notification event, or derived in the script processing.
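To show where those event-derived values could come from, here is a minimal sketch of pulling the bucket and key out of an S3 notification event and choosing the Polly text type from the file extension. The function names `parse_s3_event` and `text_type_for_key` are hypothetical, and the sketch assumes only the first record in the event matters.

```python
import os
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return (bucket, key) for the first record of an S3 notification event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 events (a space becomes '+').
    key = unquote_plus(record["object"]["key"])
    return bucket, key


def text_type_for_key(key):
    """Map the file extension to the value Polly expects for TextType."""
    extension = os.path.splitext(key)[1].lower().lstrip(".")
    return "ssml" if extension == "ssml" else "text"
```

Anything that is not `.ssml` is treated as plain text here, which mirrors the if/else in the snippet above.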

There really isn’t anything else involved in making the API call. The remainder of the script deals with reading the file contents, because Polly requires the text itself to be passed rather than a file name, and with defining the source and destination buckets. There are print statements in various parts of the script so I can see what happened if there is an error by looking at the CloudWatch logs for the Lambda function.
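Reading the file contents might look something like the sketch below. The helper name `read_source_text` and the UTF-8 default are assumptions for illustration; the S3 client is passed in so the function can be exercised with a stand-in object.

```python
def read_source_text(s3, bucket, key, encoding="utf-8"):
    """Fetch an S3 object and return its contents as a string.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    response = s3.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode(encoding)
    # Printed output from a Lambda function ends up in its CloudWatch logs.
    print(f"Read {len(text)} characters from s3://{bucket}/{key}")
    return text
```

The returned string is what would be handed to Polly as the `Text` parameter.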

Once the speech synthesis task is completed, the MP3 audio is written to a different S3 bucket, and a notification is sent to me using Simple Notification Service, so I can update the article with the URL to the audio.
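My workflow relies on the SNS notification, but the same information can also be fetched on demand. The sketch below is that alternative, not what my script does: it polls the task with `get_speech_synthesis_task`, using the task ID found in `response["SynthesisTask"]["TaskId"]` from the original request. The function name `audio_location` is a hypothetical helper.

```python
def audio_location(polly, task_id):
    """Return the S3 URI of the finished audio, or None if not yet complete.

    `polly` is expected to be a boto3 Polly client.
    """
    task = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
    status = task["TaskStatus"]  # scheduled, inProgress, completed or failed
    if status == "failed":
        raise RuntimeError(task.get("TaskStatusReason", "synthesis failed"))
    if status == "completed":
        return task["OutputUri"]
    return None
```

A `None` result simply means the task is still scheduled or in progress, so the caller can try again later.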

Writing text to be read is not quite the same as writing text for audio conversion. I write my articles using Markdown, and then submit them to the Medium platform for publishing. However, simply sending the Markdown to Polly would result in the audio output saying “hash tag hash tag there is a Little More to it”, which just isn’t right. Once I am satisfied with the article, I make a copy and edit it to remove the Markdown elements and headings, and adjust the text to make it suitable for Polly to read and easier for you to listen to.
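I do this cleanup by hand, but part of it could be automated. The following is only a rough sketch under that assumption: a hypothetical `markdown_to_plain` function that strips the most common Markdown markers with regular expressions. It keeps the heading text and removes only the `#` markers, and it is not a full Markdown parser.

```python
import re


def markdown_to_plain(markdown):
    """Remove common Markdown markup so the text reads naturally aloud."""
    # Drop fenced code blocks entirely; code rarely reads well as speech.
    text = re.sub(r"```.*?```", "", markdown, flags=re.DOTALL)
    # Drop images, then reduce links to their visible label.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Strip heading markers and list bullets at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)
    # Remove emphasis asterisks and backticks.
    text = re.sub(r"\*+|`+", "", text)
    # Remove underscores used for emphasis, but keep those inside words.
    text = re.sub(r"(?<!\w)_+|_+(?!\w)", "", text)
    return text.strip()
```

The result would still need a read-through, since wording that works on the page does not always work when spoken.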

After the Polly conversion process is complete and I receive the notification with the URL to the audio, I update the original article text and submit it to Medium. I can then do a final review of the draft on the Medium platform, including a check of the audio, and then publish the story.

Currently my script only supports the “Joanna” Polly voice, and only in English. The next version will allow the use of a configuration file to select alternative voices and languages when the file is submitted for processing.
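That version doesn't exist yet, so the following is only a guess at one possible shape: a small JSON configuration merged over defaults. The `DEFAULTS` dictionary, the JSON format and the `load_voice_settings` name are all hypothetical; the keys match parameter names accepted by `start_speech_synthesis_task`.

```python
import json

# Current behaviour of the script, used when no configuration is supplied.
DEFAULTS = {"VoiceId": "Joanna", "LanguageCode": "en-US", "OutputFormat": "mp3"}


def load_voice_settings(config_text=None):
    """Merge an optional JSON configuration over the default settings."""
    settings = dict(DEFAULTS)
    if config_text:
        overrides = json.loads(config_text)
        # Reject keys this sketch does not know how to pass on to Polly.
        unknown = set(overrides) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"Unsupported settings: {sorted(unknown)}")
        settings.update(overrides)
    return settings
```

The resulting dictionary could then be unpacked into the API call with `**settings` in place of the fixed voice and output format.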


Copyright 2019, Chris Hare

Written by Chris Hare

Chris is the co-author of seven books and author of more than 70 articles and book chapters in technical, management, and information security publications.
