.NET Core + AWS Polly
Hi folks,
As usual, thank you for taking the time to read this article, in which I explain how to integrate my favorite programming language with this awesome text-to-speech API.

First of all, let's talk about AWS Polly. According to Amazon's documentation, this cloud service is responsible for converting text into lifelike speech, which helps us deliver accessible applications to our users. It lets us adjust the language, the gender of the voice, and even the accent, such as British or American.
The service is split into two models:
- NTTS (Neural TTS)
It's only available in a few languages, but the point here is to provide an even more human-like voice.
- Standard TTS
This one covers many more languages and works really well.
The prerequisites are:
- .NET Core 3.1 (at the time of writing, .NET 5 is not yet available for the AWS SDK)
- AWS SDK Tool Kit
- Visual Studio or Visual Studio Code (in this example I chose the first one)
- AWS account (mine is a dev type one)
Now let's create a .NET Core console application; you can choose any name you want:

After the project is created, we have a single class, Program.cs, which is responsible for running the application.
Let's import a package through NuGet:
Install-Package AWSSDK.Polly -Version 3.5.1.29
I already did that, so just make sure your installation looks like this:

With the package configured, the Amazon documentation strongly recommends setting up a separate user instead of using your root user, so I did that in the AWS Management Console:


As you can see, after selecting the IAM option you are redirected to the Identity and Access Management dashboard. From there you just need to click through a few “next, next, next” buttons and you will have created a user.
In another article I will discuss how to separate users correctly and how to grant them the right access.
Now let's get back to our code. After you create a user, you are given an access key ID and a secret access key, which are needed to continue.
(Don't worry about the full code; it is shown at the end of the article, along with the GitHub repository.)
First I store the credentials by initialising a BasicAWSCredentials, and then we must pass the credentials to the AWS client, like below:
PS: If you read the documentation, the best practice is to store them in an AWS config file, but for the purposes of this article, whatever lol
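A minimal sketch of that step, assuming the AWSSDK.Polly package is installed. The PollyClientFactory class, its Create method, and the us-east-1 region are my own placeholders for illustration, not necessarily what the repository uses:

```csharp
using Amazon;
using Amazon.Polly;
using Amazon.Runtime;

public static class PollyClientFactory
{
    // Hypothetical helper: builds a Polly client from an explicit key pair.
    public static AmazonPollyClient Create(string accessKeyId, string secretAccessKey)
    {
        // The keys come from the IAM user created earlier; never commit real ones.
        var credentials = new BasicAWSCredentials(accessKeyId, secretAccessKey);

        // The region is an assumption; pick the one your account actually uses.
        return new AmazonPollyClient(credentials, RegionEndpoint.USEast1);
    }
}
```

Hard-coding the key pair keeps the demo short, but outside of a quick experiment the shared AWS credentials file is the safer choice.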
And I did this lame thing to grab some text from the console hahaha. You can pass the text as arguments if you like; I just thought this would be quicker.
However, I ran into some issues while searching the AWS documentation: it's kinda outdated, you know?! And there aren't as many examples as in the Azure documentation, but the methods you need to implement are pretty intuitive and follow a clear logic.
Just like these:
Every method on the Polly client expects a request object; in this case, a SynthesizeSpeechRequest goes in and a SynthesizeSpeechResponse comes back.
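Here is a rough sketch of that request/response pair. The SpeechSynthesizer class name, the Joanna voice, and the standard engine are assumptions picked for illustration:

```csharp
using System.Threading.Tasks;
using Amazon.Polly;
using Amazon.Polly.Model;

public static class SpeechSynthesizer
{
    // Hypothetical helper: sends the text to Polly and returns the raw response.
    public static async Task<SynthesizeSpeechResponse> SynthesizeAsync(IAmazonPolly client, string text)
    {
        var request = new SynthesizeSpeechRequest
        {
            Text = text,
            OutputFormat = OutputFormat.Mp3, // Polly returns MP3 audio directly
            VoiceId = VoiceId.Joanna,        // assumption: any supported voice works here
            Engine = Engine.Standard         // switch to Engine.Neural for a neural voice
            // LexiconNames can also be set here to apply lexicons already stored in your account.
        };

        return await client.SynthesizeSpeechAsync(request);
    }
}
```

With the console approach, the text argument could simply be the result of Console.ReadLine().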
The big bonus of using this API is that you don't have to worry about converting a .wav file into .mp3, which saves effort and avoids pulling in third-party libraries like NAudio. You just set the OutputFormat and voilà.
Another thing worth mentioning is how incredibly fast the AWS API responds, with low latency.
Now that we have the response, I wrote a class responsible for saving the content to a file:
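Something along these lines would do the job. The AudioFileWriter name is hypothetical; the only thing it relies on is that the response exposes the audio as a Stream (the AudioStream property):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AudioFileWriter
{
    // Hypothetical helper: copies the audio stream returned by Polly into a file.
    public static async Task SaveAsync(Stream audioStream, string path)
    {
        // File.Create overwrites the file if it already exists.
        using (var file = File.Create(path))
        {
            await audioStream.CopyToAsync(file);
        }
    }
}
```

A call could look like await AudioFileWriter.SaveAsync(response.AudioStream, "speech.mp3"), where the file name is just an example.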
I don't know if you guys noticed, but a bit further up, when the request is sent, one of its attributes was something called “Lexicons”.
A short briefing about this fellow: you can create a lexicon with any alias you want and store it in an XML document, for example:
And upload it to your AWS account with this method:
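To put both steps in one place, here is a sketch that keeps a small lexicon as a string and uploads it with PutLexiconAsync. The lexicon content follows the W3C example from the AWS documentation; the LexiconUploader class is a name I made up, and the lexicon name is chosen by the caller:

```csharp
using System.Threading.Tasks;
using Amazon.Polly;
using Amazon.Polly.Model;

public static class LexiconUploader
{
    // A PLS (Pronunciation Lexicon Specification) document with a single alias.
    private const string LexiconXml = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<lexicon version=""1.0""
      xmlns=""http://www.w3.org/2005/01/pronunciation-lexicon""
      alphabet=""ipa""
      xml:lang=""en-US"">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>";

    // Hypothetical helper: stores the lexicon in the account under the given name.
    public static Task<PutLexiconResponse> UploadAsync(IAmazonPolly client, string name)
    {
        var request = new PutLexiconRequest
        {
            Name = name,         // lexicon names are short and alphanumeric
            Content = LexiconXml
        };

        return client.PutLexiconAsync(request);
    }
}
```

In a real project the XML would more likely live in its own file and be loaded with File.ReadAllText.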
And to get it back from the API:
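Reading a stored lexicon back is the mirror image of the upload. A minimal sketch, assuming a lexicon with that name already exists in the account (LexiconReader is a hypothetical name):

```csharp
using System.Threading.Tasks;
using Amazon.Polly;
using Amazon.Polly.Model;

public static class LexiconReader
{
    // Hypothetical helper: fetches a stored lexicon and returns its XML content.
    public static async Task<string> DownloadAsync(IAmazonPolly client, string name)
    {
        var response = await client.GetLexiconAsync(new GetLexiconRequest { Name = name });

        // Lexicon.Content holds the PLS document that was uploaded earlier.
        return response.Lexicon.Content;
    }
}
```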
Quoting the AWS documentation:
Common words are sometimes stylized with numbers taking the place of letters, as with “g3t sm4rt” (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS) engine reads the text literally, pronouncing the name exactly as it is spelled. This is where you can leverage lexicons to customize the synthesized speech by using Amazon Polly. In this example, you can specify an alias (get smart) for the word “g3t sm4rt” in the lexicon.
Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the word W3C so that it is read in the full, expanded form (World Wide Web Consortium).
I hope you all enjoyed this post!
On my GitHub you can find the complete code for what we did here: