Audio To Text

Over the last few years, I’ve tried to look at lots of different ways to turn audio into text – ideally cheaply or even freely.
Working for a radio station, even one that largely plays out music, being able to search audio to find when a presenter said or mentioned something would be incredibly useful.
I’ve seen a very expensive product tested a couple of times, and it has only worked fitfully – even though one radio group did buy it at one stage, it’s never been fully utilised. There are also services like Blinkx that seem to do a pretty good job in a controlled environment.
A regular demonstration seems to be based around the BBC News channel. The transcription of what presenters are saying is remarkably accurate – even for far-flung place names. But then systems can be pointed towards the BBC News website for appropriate names and words to help with those trickier phrases.
So it was interesting to see two different takes on this problem in the last week or so. First of all Andy Baio has published details of how he went about getting an interview he’d conducted transcribed so that he could place it on his website. Essentially he chopped the interview into small nuggets, and then used Amazon’s Mechanical Turk to get transcriptions of what was said. He’s very happy with the outcome.
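To give a feel for the "small nuggets" step, here is a rough sketch of how you might work out where to cut a recording. The 30-second chunk length and 2-second overlap are my own illustrative guesses rather than the figures Andy used, and the function name is made up for this example.

```python
def chunk_boundaries(total_seconds, chunk_seconds=30.0, overlap_seconds=2.0):
    """Return (start, end) pairs in seconds that cover the whole recording.

    Each chunk overlaps the previous one slightly, so a word that is cut
    off at the end of one chunk is heard in full at the start of the next.
    The default lengths are assumptions, not values from any real workflow.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be longer than overlap_seconds")

    boundaries = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        boundaries.append((start, end))
        if end >= total_seconds:
            break  # the last chunk reached the end of the recording
        start = end - overlap_seconds
    return boundaries


if __name__ == "__main__":
    for start, end in chunk_boundaries(70):
        print(f"{start:6.1f}s -> {end:6.1f}s")
```

Each pair could then be handed to whatever audio tool you use to cut the file, with one transcription task posted per chunk.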
A couple of years ago, Virgin Radio tried something very similar with our “Snoop Log.” Every time a DJ opens a fader, we record what he or she is saying. That’s put into a database alongside details of track listings and adverts so that we have a record of what was played out. If we could also get hold of a transcription of what was said, we’d have a fully indexable database of our output.
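As a sketch of what "fully indexable" could look like, here is a toy version using Python's built-in sqlite3 module. The table layout, column names and sample data are all invented for illustration; the real Snoop Log schema isn't described here and will certainly differ.

```python
import sqlite3


def make_log():
    """Create an in-memory database with one row per DJ link (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE links (
            id INTEGER PRIMARY KEY,
            aired_at TEXT,     -- ISO timestamp of when the fader was opened
            presenter TEXT,
            audio_file TEXT,   -- where the recorded clip lives
            transcript TEXT    -- what was said, once transcribed
        )
        """
    )
    return conn


def add_link(conn, aired_at, presenter, audio_file, transcript):
    """Store one recorded link together with its transcription."""
    conn.execute(
        "INSERT INTO links (aired_at, presenter, audio_file, transcript) "
        "VALUES (?, ?, ?, ?)",
        (aired_at, presenter, audio_file, transcript),
    )


def search_links(conn, phrase):
    """Find links whose transcript contains the phrase.

    SQLite's LIKE ignores case for plain ASCII letters by default.
    Note that % and _ inside the phrase are treated as wildcards.
    """
    rows = conn.execute(
        "SELECT aired_at, presenter, audio_file FROM links "
        "WHERE transcript LIKE ? ORDER BY aired_at",
        (f"%{phrase}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    log = make_log()
    add_link(log, "2008-09-01T08:15:00", "Presenter A", "clip_0001.mp3",
             "That was Kings of Leon, and the travel news is next.")
    add_link(log, "2008-09-01T08:32:00", "Presenter A", "clip_0002.mp3",
             "Coming up, your chance to win tickets.")
    print(search_links(log, "kings of leon"))
```

A simple substring match like this is only as good as the spelling in the transcript, which is exactly where the song titles and artist names become a problem.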
DJ links tend to be pretty short – often well under a minute. So the individual “chunks” are ready-made. It’s easy to transcode them to MP3 or whatever would be appropriate. The difficulty comes from the song titles and artists. If you live in India and English is perhaps a second language, then the exact spelling of “The Kings of Leon” might be tricky for you.
The test wasn’t a success. Now it might be that we didn’t offer enough cash to get better quality transcribers, or perhaps if we’d embedded the audio in a Flash player, that might have helped. One way or another – we didn’t take it forward.
There are other transcription services that ride on the back of Amazon’s Mechanical Turk and cost more than the DIY option. But then they offer higher quality output. It’s a question of cost for a commercial radio station versus value of the output. It’s certainly something to revisit.
The other fascinating development has come from Google and its “Gaudi” service which has just launched. Initially concentrating on political speeches, the service allows you to search for words within those speeches and jump to the correct part of the video.
Now obviously from a radio perspective, this could be done just as easily with an audio-only stream.
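The "search for a word and jump to it" idea can be sketched in a few lines, assuming you already have a list of words with the time each one was spoken. That timed word list is the hard part and is entirely assumed here; this is not a description of how Google's service works, and the function names are made up.

```python
from collections import defaultdict


def build_index(timed_words):
    """Map each word to the list of times (in seconds) at which it was spoken.

    timed_words is an iterable of (word, seconds) pairs, the sort of output
    a speech recogniser or a timed transcript might provide (an assumption).
    """
    index = defaultdict(list)
    for word, seconds in timed_words:
        # Lower-case and trim common punctuation so "Economy." matches "economy".
        key = word.lower().strip(".,!?\"'")
        if key:
            index[key].append(seconds)
    return dict(index)


def find_word(index, word):
    """Return every time offset at which the word occurs, or an empty list."""
    return index.get(word.lower(), [])


if __name__ == "__main__":
    speech = [("The", 0.0), ("economy", 0.4), ("is", 1.0),
              ("the", 1.2), ("Economy.", 1.5)]
    index = build_index(speech)
    print(find_word(index, "economy"))
```

A player would then seek to any of the returned offsets, and nothing in the lookup cares whether the stream behind it is video or audio.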
But I’d still love to know to what extent the service is only using audio. It’s quite clear that pretty much every political speech is captured in text form in one place or another. That’s what allows talented souls to put together videos of politicians “singing along” to songs like “Never Gonna Give You Up.” So is Google using text alongside video/audio to pattern match?
Anyway, it’s promising, and surely in time, we’ll truly be able to search audio.

