Automatically creating low-cost audio
Hypothesis
We believe that …
Using deep learning text-to-speech services, we can create low-cost audio versions of written resources, which would increase accessibility and engagement. It would also enable us to create podcasts of our content and may offer a viable alternative to expensive video creation.
How critical is this hypothesis?
Low
Aim
We aim to use this research to …
Create a proof of concept that embeds AWS Polly text-to-speech services in our standard WordPress offer, enabling the automatic conversion of text to audio files. This would make the technology available to anyone who can edit our websites.
Test
To verify that, we will…
- Embed AWS Polly in a WordPress website
- Create example audio files based on existing content
- Demonstrate the use of several available voices
- Survey some test users to see if they would listen to the generated audio
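The audio-generation steps above could be sketched outside WordPress roughly as follows. This is a minimal Python sketch, not the WordPress plugin's own code: it assumes the `boto3` library and AWS credentials are available, that Polly's `synthesize_speech` call accepts up to 3000 billed characters per request, and that MP3 chunks can simply be written one after another into a single file. The voice list and the `split_text` and `synthesise_post` helpers are hypothetical names chosen for illustration.

```python
import re

# Assumption: Polly's SynthesizeSpeech accepts up to 3000 billed characters
# per request, so longer posts are split before being sent.
MAX_CHARS = 3000

# Assumption: three of Polly's British English voices, picked for illustration.
VOICES = ["Amy", "Brian", "Emma"]


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking between sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesise_post(text, voice_id, out_path):
    """Write an MP3 of text to out_path using one Polly voice."""
    import boto3  # imported lazily so split_text works without AWS set up

    polly = boto3.client("polly")
    with open(out_path, "wb") as audio_file:
        for chunk in split_text(text):
            response = polly.synthesize_speech(
                Text=chunk, OutputFormat="mp3", VoiceId=voice_id
            )
            audio_file.write(response["AudioStream"].read())
```

Calling `synthesise_post(post_text, voice, f"post-{voice.lower()}.mp3")` for each entry in `VOICES` would give one file per voice for test users to compare.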
Test cost:
Low – AWS have donated $250 of proof of concept credits to our account to run this experiment.
Test reliability:
Medium
Metric
And measure…
- Audio files can be automatically created
- The majority of test users agree that they would listen to and use audio outputs
Time requirement:
Low
Criteria
We are right if…
We can generate low-cost audio files from text that users are happy to listen to.
Experiment #004
Subject:
Automatic Audio
Scientists:
Chris Witham, Ian Randall, Tony Blacker
Experiment start:
03/08/2020
Experiment end:
28/08/2020
Analysis start:
30/08/2020
Duration:
4 weeks
Findings
- (Not yet complete)
Based on the Strategizer lean test card
Prior experiment with Polly.
Findings:
1. Audio is sometimes used by those without a visual impairment to listen to a report, e.g. while on a commute, but MP3s on demand can already be generated for that using online services like RoboBraille. Pre-prepared Polly MP3s would save that task.
2. Audio generated by Polly is fixed speed. Those with a visual impairment like to speed it up – they “haven’t got all day” to listen to everything on the web, so speeds up to 1.5x are often used. Formats that feed into familiar text-to-speech tools are controllable, e.g. Kindle on an iPhone.
3. Converting docs to audio doesn't solve accessibility if the writing doesn't also consider what it might sound like to listen to for a while.
Thanks Terry, some great thoughts. I’m definitely someone who would rather listen than read, and I definitely enjoy speeding things up. In our first proof of concept, we’ve got a syndicated feed set up, which you will be able to subscribe to in your podcast player of choice and use the audio speed controls there.
Definitely agree that the writing style will need to consider the audio format. “Download here”, for example, lacks meaning in audio. Contextual links seem to be read well, though.
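On the fixed-speed point, one possible complement to player-side speed controls would be to pre-render a faster version using Polly's SSML `<prosody>` tag. The sketch below only builds the SSML string. The `build_ssml` helper is a hypothetical name, and the 20 to 200 percent range is our understanding of Polly's documented limits rather than something tested in this experiment.

```python
from xml.sax.saxutils import escape


def build_ssml(text, rate_percent=150):
    """Wrap plain text in SSML that asks for a different speaking rate."""
    # Assumption: Polly accepts percentage rates between 20% and 200%.
    if not 20 <= rate_percent <= 200:
        raise ValueError("rate_percent should be between 20 and 200")
    # Escape &, < and > so the post text cannot break the SSML markup.
    return f'<speak><prosody rate="{rate_percent}%">{escape(text)}</prosody></speak>'
```

The resulting string would be sent as `Text` with `TextType="ssml"` in the `synthesize_speech` request. Listeners who want finer control would still be better served by the podcast feed and their own player.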
Sample posts:
https://playground.nhsx.uk/2020/08/03/a-free-online-programme-to-develop-aspiring-leaders-yes-please/
https://playground.nhsx.uk/2020/08/03/when-bevan-finished-i-entered-a-period-of-grieving/
We’re also trying out the translation services.
Podcast feed: https://playground.nhsx.uk/feed/amazon-pollycast
This has now been used to automatically generate an audio version of the People Plan.
https://digital.leadershipacademy.nhs.uk/people-plan-audio/
This took under 10 minutes to generate and cost $1.52.
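As a rough sense of scale, the arithmetic below works out how many documents of a similar size the $250 of proof of concept credits would cover. It assumes every document costs about the same as the People Plan, which will not hold in practice because the cost depends on length. The `runs_covered` helper is a hypothetical name.

```python
from decimal import Decimal

CREDITS_USD = Decimal("250")        # proof of concept credits from the test card
COST_PER_RUN_USD = Decimal("1.52")  # reported cost of the People Plan audio


def runs_covered(credits, cost_per_run):
    """Return (whole runs the credits cover, credits left over)."""
    runs = int(credits // cost_per_run)
    return runs, credits - runs * cost_per_run


runs, left_over = runs_covered(CREDITS_USD, COST_PER_RUN_USD)
print(f"{runs} documents, ${left_over} left over")  # 164 documents, $0.72 left over
```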