What it Does
Based on some amazing work by my friend Weston Ruter, I’ve put together a little library that mashes together
- some text (usually some HTML)
- an audio source reading that text (usually an mp3)
- a timing file (in this case, generated by CMU Sphinx)
The result is that when you press “play” the words are highlighted as they are read, and you can click on words to navigate through the audio. The magic comes from data produced by the CMU Sphinx library (based on Weston’s work) which creates the word timing information.
I put together two demo versions, one of Martin Luther King, Jr.’s I Have A Dream speech and another of the English Bible using the English Standard Version, which has a great API. Unfortunately, the MLK speech didn’t align very well, so that demo is mostly useful as an example of how dependent the process is on a good alignment.
(note: right now it’s Chrome/Safari/IE9 only since it requires MP3 playback)
How it Works
Although I wanted to use a “standard” format like WebVTT, I also wanted the filesize to be compact since my intended project involved large datasets of 48 hours or more of audio (i.e. the Bible). So here’s the basic JSON format:
{"words":[ ["in",0.03,0.18], ["the",0.18,0.28], ["beginning",0.28,0.88], ["god",0.88,1.35], ["created",1.35,1.93] ]}
Basically, it’s just an array of words, each with a start and end time in seconds. The array-of-arrays format is quite a bit smaller than using JSON objects with named keys, and it doesn’t require any parsing step like WebVTT would (although that might change later). It would take quite a bit of time to produce something like this by hand, but Weston used the CMU Sphinx library to generate this data, and it’s probably been about 90% accurate for the entire ESV Bible.
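To show how this format can drive the highlighting, here is a minimal sketch of looking up the word being spoken at a given playback time. The function name findWordIndex is hypothetical and not taken from the library; it only assumes the words array is sorted by start time, as in the sample above.

```javascript
// Timing data in the format shown above: [word, startSeconds, endSeconds].
const timing = {"words":[
  ["in",0.03,0.18], ["the",0.18,0.28], ["beginning",0.28,0.88],
  ["god",0.88,1.35], ["created",1.35,1.93]
]};

// Hypothetical helper (not part of the library): binary search for the
// word whose [start, end) interval contains `time`.
// Returns the index into `words`, or -1 if no word is spoken at that time.
function findWordIndex(words, time) {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [, start, end] = words[mid];
    if (time < start) {
      hi = mid - 1;       // the word we want is earlier
    } else if (time >= end) {
      lo = mid + 1;       // the word we want is later
    } else {
      return mid;         // start <= time < end
    }
  }
  return -1;
}

// Example: half a second into the audio, "beginning" is being read.
const index = findWordIndex(timing.words, 0.5);
console.log(index >= 0 ? timing.words[index][0] : "(silence)");
```

In a browser, something like this could be called from the audio element’s timeupdate event with audio.currentTime. A binary search keeps each lookup cheap even for a very long recording.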
Once all the data is loaded, the AudioAligner class searches through a DOM node for the words in the array, skipping over classes or tags you define, and then links those words to the audio player.
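The matching step can be sketched roughly like this. This is not the actual AudioAligner code: it works on a plain string rather than a DOM node, and the names normalize and alignWords are made up for illustration. It assumes the timing words are lowercase with punctuation removed, as in the sample data.

```javascript
// Hypothetical helper: lowercase a token and strip punctuation so that
// "Beginning," in the text can match "beginning" in the timing data.
function normalize(token) {
  return token.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Hypothetical sketch of the matching step. Walks the text token by token
// and pairs each token with the next unused timing entry when they match.
// Tokens with no match (verse numbers, for example) get null timings
// and are simply skipped by the highlighter.
function alignWords(text, words) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const result = [];
  let next = 0; // index of the next timing entry to consume
  for (const token of tokens) {
    const key = normalize(token);
    if (next < words.length && key !== "" && key === words[next][0]) {
      const [, start, end] = words[next];
      result.push({ text: token, start, end });
      next++;
    } else {
      result.push({ text: token, start: null, end: null });
    }
  }
  return result;
}

const words = [
  ["in",0.03,0.18], ["the",0.18,0.28], ["beginning",0.28,0.88],
  ["god",0.88,1.35], ["created",1.35,1.93]
];

// The leading "1" stands in for a verse number that has no timing entry.
console.log(alignWords("1 In the beginning, God created", words));
```

In the real thing, each matched token would be wrapped in an element that stores its start time, so clicking it can set the audio player’s currentTime. A sketch this simple stops matching after the first word that the timing data misses, so anything dealing with a 90% accurate alignment would need some way to resynchronize.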
Demo
Again, the demo I put together uses the API provided by the creators of the English Standard Version (ESV) of the Bible. The API allows developers to request the text and the MP3, which are then mashed up with the timing files generated with CMU Sphinx.
HTML5 Karaoke Demo
If anyone’s interested in the library, please let me know in the comments and I’ll post it to Github.
I’m definitely interested and I’m looking to participate to help on some Bible web/app projects
btw, this is awesome 🙂
Very cool! Yesterday I was reading up on web audio and ran across an experiment by the author of jPlayer that had some similarities, but it was doing manual audio syncing. I can’t speak to the underlying code, but the demo was fun to fiddle with, particularly using the text to navigate or as a soundboard, and the visualization bit was also nice.
http://happyworm.com/blog/2010/12/05/drumbeat-demo-html5-audio-text-sync/
Yes! Please post the code to Github. I can see this being very useful for playing hymns + words – a hymn karaoke, sort of. Do you have any idea if CMU Sphinx works with other languages?
Thanks.
Hi:
Very cool. Would you mind sharing on github?
Thanks
Alan
Am I interested? AM I? This is amazing – I’m all over it. In fact I wanted to do something myself using CMU Sphinx. Please do put it on github – great work and thanks!
I totally hear you on the timing side of things.. phew, we blew a massive amount of money last year on R&D to build this very tool in Flex.. we basically tried to use Flex to analyse the audio graph and “cleverly” plot the words as it heard it in a fashion where you could then “make minor adjustments” to the plotted words on the audio graph.. needless to say, it is a VERY hard thing to get right and we eventually canned it after trying out existing hardware accelerated timing apps. We did end up using it for some client work, but it was so frustrating to work with. You can see it in action here: http://www.readright.co.za/stories/2009/11/jasper-an-outing-to-the-aquarium-read-along/
So, note to all, this line is gospel: The magic comes from data produced by the CMU Sphinx library (based on Weston’s work) which creates the word timing information.
Would love to play with this! Did you get a chance to post it to github?
Thanks!
I am very much interested in library.
I am very interested in your library and would love to work with the source code for a project of mine. Will you be making the source code available to the public? I look forward to hearing from you and learning more about the possibilities of this tool. Thank you. Sincerely, Gary.
Hello, I’m very interested in your project. Can you make it available to me? I would like to use it in a project of mine.
I am very interested in your library, how do i get it from github?
Amen brother, please send me the link on github with the library…thanks a bunch, Vick
Please brother send me the github link its incredible.
This is very helpful for one of the projects I am working on. Can you please provide me with the link?
Here’s the Github link – https://github.com/johndyer/audiosync
I would like to develop a website where users can upload their audio prayers to be listened to by later users as an online group prayer session. The text and audio of the prayers scroll across the screen as multiple international users enunciate the words of the prayer at the same time, based on the cadence provided by the karaoke API. I need to develop an algorithm that will automatically determine whether an upload is non-malicious, accurate, and safe for a holy website. What are your thoughts?
Greg
Hi, I’m interested in your solution. I teach music and I’m looking for tools that help me out with getting better results.
Could you share your code please?
Thanks!