Convert recorded audio to text

One way to make your podcast, screencast, or online video more accessible is to convert it to text. For authors, this means providing a transcript, subtitles, or closed captions so that hearing-impaired viewers get the same content as the audio. The challenge is that most recorded content is not planned, organized, or rehearsed. It gets trickier because most speech-to-text programs expect you to speak into a microphone, and there is no obvious way to route speech from a recorded file to the program that converts it.

Combining the Enhanced Dictation feature in OS X 10.9 (Mavericks) with Audacity and Soundflower solves this. At a high level, Audacity plays the recording into Soundflower, and Enhanced Dictation listens to Soundflower as its input.

  • Audacity (output) -> Soundflower -> Dictation (input)

Let's see how it is done.

Download and install Soundflower

Soundflower is an OS X system extension that allows applications to pass audio to other applications. It presents itself as an audio device, so any audio application can send and receive audio with no other support needed. Soundflower is free, open-source, and runs on Mac Intel and PPC computers. Download Soundflower

[Image: Soundflower download]
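Once the installer finishes, it can be handy to confirm that Soundflower actually shows up as an audio device before changing any settings. The sketch below is one possible check, not part of the original workflow: it assumes Python 3.7+ is available on the Mac and that the `system_profiler SPAudioDataType` report includes the Soundflower device by name. The function names are hypothetical.

```python
import subprocess


def audio_devices_report():
    """Return macOS's plain-text report of audio devices (macOS only)."""
    result = subprocess.run(
        ["system_profiler", "SPAudioDataType"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def mentions_device(report, device_name="Soundflower"):
    """Case-insensitive check for a device name anywhere in the report."""
    return device_name.lower() in report.lower()


# On the Mac itself you would run:
#   print(mentions_device(audio_devices_report()))
```

If this prints `False`, a restart after installing may be needed before the device appears; that is a guess based on Soundflower being a system extension, so treat it as something to try rather than a rule.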

Modify dictation settings

Dictation, similar to Siri, lets you enter text by voice instead of typing. It was introduced in OS X Mountain Lion. You will want to change three settings to direct recorded audio to Dictation.

Navigate to the Dictation preferences in System Preferences

  1. Choose Apple () > System Preferences.
  2. From the View menu, choose Dictation & Speech.
  3. Click "On" to turn Dictation on.
  4. Check 'Enhanced Dictation'. Enhanced Dictation allows you to dictate without an active Internet connection.

    a. The first time you turn on Enhanced Dictation, OS X downloads additional content that allows Dictation to work offline.

    [Image: Download Enhanced Dictation]

  5. Change the input device to Soundflower (2ch).

[Image: Dictation speech to text settings]

Prepare Audacity

Audacity is an open-source audio editor for recording, slicing, and mixing audio. Here you select Soundflower as its output device.

  1. Download and install Audacity.
  2. Import the recorded audio into Audacity.
  3. Change the output device to Soundflower (2ch).

[Image: Audacity settings]

Bringing it together

Next, open your favorite word editing program (TextEdit, Pages, or Microsoft Word).

  1. Open Audacity and the word editing program side by side.
  2. Start playing the audio file in Audacity.
  3. Quickly flip over to your word editing program and turn on dictation by pressing the fn (Function) key twice, or choose Edit > Start Dictation.

You won't hear any audio, but you will see activity in the Dictation microphone and text appearing in the word editing program. It should look something like this:

[Image: Audacity, Soundflower, and Dictation]

Screencast

Raw audio to text conversion

As mentioned in the video, here is the raw text that this process produced:

Is Justin from level of lunch were just getting ready to make a blog post on how to convert recorded audio to text on the screen cast as not to focus on downloading and installing each one of the the different components are the pieces that you need to do this it's really going to focus on the settings and then kind of the putting it all together and making it work let's see how we can do that verse forgot to go up to Apple system preferences and open up dictation and speech and there's three different pieces we need to make sure that we selected first we need to turn dictation on next we need to look at in and have selected the use the enhanced dictation in the third we need to select sound flower as an input device can't next were going to close this down and then open up the audacity and audacity has one setting that we need to change and that's the output device to preselect sound flower to channel it'll actually type the sound from the recording to dictation and then I will go through the process of doing we Vaara have a a prior screen cast the running important to show how that works so I will import that into audacity to import that into audacity that'll take just a second and then will jump over to our text added and then turn on dictation wallet place this is work it's a little trickier you got to move a little bit fasters once you play the recording you need to turn on dictation as soon as possible to pick up all of the other audio that's getting played order to make this jump. 
We'll press play jump over to TextEdit hitter function key twice and then it'll begin to actually transcribe on that's one of the Porky things you got to make that transition very that you can see it's going over the audio and producing the text from that audio and as if you can read through it and see that it's not always perfect don't get pretty good but then you'll have to go in and make edits on two paragraphs and commas and punctuation and all that good stuff the one thing to note 2 is once you move off of the text editor your word editing program it will automatically stop dictation so it's continuing to working here picking up the audio from audacity but once I go over and say I click on another window it's Ardi completed so it'll stop that dictation I'll it's going to go look at that so it it stopped it once I clicked on another window so just be aware that as you're you're doing a conversion or or you have other things that you're working on during that conversion process when thing I'll do is for the screen cast I will convert it in raw format so you can listen to the screen cast as well as see what it dictation produced through this process extra joining

Wrap up

As with any speech-to-text software, it isn't perfect. Expect to spend time editing the text produced to get it just how you like it. By default it doesn't provide punctuation, line breaks, or paragraph breaks, and it won't insert the example code from our screencasts :). Dictation does understand basic text-related commands such as “all caps,” “new paragraph,” and “new line,” and punctuation such as “period,” “comma,” “question mark,” or “exclamation point,” but most likely these commands will not be spoken in your recording.

A couple of other things to be aware of:

  • When starting the audio file in Audacity, you have to quickly jump over to the word editing program and start dictation.
  • During the conversion, keep the word editing program in focus. Dictation stops listening as soon as you switch to another window.
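Part of the editing pass can be scripted. The raw text above arrives as one unbroken block and contains immediately repeated words ("the the", "it it", "you're you're"). The sketch below is a first cleanup pass in Python, not a replacement for proofreading: it collapses back-to-back repeats and wraps the text to a fixed width so it is easier to edit. The function names and the 80-column default are our own choices for illustration.

```python
import re
import textwrap


def collapse_repeats(text):
    """Collapse immediately repeated words, e.g. 'the the' -> 'the'."""
    # (\w[\w']*) captures a word (apostrophes allowed, as in "you're");
    # the second group matches one or more exact repeats of that word.
    return re.sub(r"\b(\w[\w']*)(?:\s+\1(?![\w']))+", r"\1", text)


def tidy_transcript(raw, width=80):
    """Normalize whitespace, drop repeated words, and wrap to `width` columns."""
    text = " ".join(raw.split())
    text = collapse_repeats(text)
    return textwrap.fill(text, width=width)
```

For example, `tidy_transcript(open("raw.txt").read())` would return the wrapped text for a hypothetical `raw.txt` holding the dictation output. Note that legitimate doubles such as "that that" are collapsed as well, so skim the result before trusting it.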