Speech Recognition Software

Most people think of speech recognition software as a space-age typing aid. That is true, but it can also control the computer, making the mouse and keyboard almost redundant. Read on…

The two main software titles that have been around for some years are Nuance’s Dragon Naturally Speaking and IBM’s ViaVoice. Nowadays, Dragon Naturally Speaking (DNS) has taken hold of the market to become the predominant player. At the time of writing, prices start at around £50 and rise to around £150 for the Preferred version, which has some extra bells and whistles, though the basic workings and accuracy are the same throughout the range.

Free Version

Since the introduction of the Windows Vista operating system, Microsoft have included their own speech recognition within Windows: Windows Speech Recognition (WSR). This means that if you have a PC running either Windows Vista or Windows 7, it is hiding away on your computer, ready to be used. I say hiding because Microsoft haven’t made a song and dance about its existence, so most people aren’t aware that they have it. To find it, follow this path:

Start Menu – All Programs – Accessories – Ease of Access – Windows Speech Recognition.

What can speech recognition do?

For many, speech recognition has been seen as a way of inputting text into applications such as word processors to write letters, reports or essays, or of dictating text into emails and instant messaging programs such as Skype and MSN. This is indeed true. However, many do not realise that speech recognition software can also be used to control the computer, taking over tasks from the mouse and keyboard.

This last point is of particular interest if a person struggles to use a mouse and keyboard for whatever reason, whether through problems with coordination, tremors or weakness in the muscles used to control a mouse (or other pointing device) and keyboard.

The Training

The ability to use speech recognition relies heavily on a person’s quality of speech. Stammering, very low volume and other speech impairments can result in a lack of accuracy and therefore an increase in frustration at the results. It helps to always use the same microphone and to position it in exactly the same place each time you use the software.

To get the best accuracy out of speech recognition, it is recommended that each user of the software undergo training. This is a way for the speech recognition software to learn how you speak and therefore make adjustments for such things as regional accents. The software can also scan your text documents and emails to learn your use of English, which helps it work out the most likely words it hears in the context of the words around them.

Training in Dragon Naturally Speaking (DNS) takes 10 to 20 minutes for most people. It starts with a volume and quality check to make sure the microphone is set up correctly and the software can hear you properly. Next, you are asked to dictate around 12 paragraphs from the screen, ‘parrot-fashion’, whilst a pointer indicates how far through the text it has understood you. You can pause the training at any time to take a rest. Pausing also helps people with reading difficulties: a second person can read out a section of text while the training is paused, and the user then repeats the words into the microphone once the training is resumed.

Training in Windows Speech Recognition takes rather longer. There is certainly no more text to dictate into the microphone, though there are many pages, each text heavy, that describe what speech recognition is whilst giving some practical training advice. At the end of each page you are asked to dictate a short sentence before the page automatically advances. This process tends to take around 30 minutes for most.

How does it work?

Both DNS and WSR have dozens of commands embedded in the software to carry out automated tasks, such as opening a program, pressing buttons on the screen, scrolling through windows and even moving the cursor on the screen.

When dictating into the microphone, you will notice a pause of a couple of seconds after you finish speaking before anything happens. This is the program taking time to examine what it has heard. First, it needs to listen to the sounds and match them to words. Next, it looks at the words you have spoken and sees whether they match a voice command. If they do, the program carries out the command. If it doesn’t recognise the words as a command, it types them into the program that you have open (this works for any program that you can type into).
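The command-or-dictation decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how DNS or WSR are actually built: the `Document` class, the `COMMANDS` table and the `handle_phrase` function are all made-up names, and the sketch assumes the sounds have already been turned into words.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Document:
    """A toy stand-in for whatever program currently has focus."""
    lines: List[str] = field(default_factory=list)
    selection: Optional[str] = None


def select_last_line(doc: Document) -> None:
    """Toy action: mark the most recent line as selected."""
    doc.selection = doc.lines[-1] if doc.lines else None


# Hypothetical command table: recognised phrase -> action to run.
COMMANDS: Dict[str, Callable[[Document], None]] = {
    "select last line": select_last_line,
}


def handle_phrase(phrase: str, doc: Document) -> str:
    """Run the phrase as a command if it matches one, otherwise type it."""
    key = " ".join(phrase.lower().split())  # ignore case and extra spaces
    action = COMMANDS.get(key)
    if action is not None:
        action(doc)
        return "command"
    doc.lines.append(phrase)
    return "dictation"


doc = Document(lines=["today I went for a walk in the park"])
first = handle_phrase("Select last line", doc)   # matches a command
second = handle_phrase("select past mine", doc)  # misheard, so it is typed
print(first, second, doc.lines[-1])
```

The point to notice is the fallback: anything that does not match the command table exactly ends up as typed text, which is why a misheard command appears in your document instead of being carried out.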

Dragon Naturally Speaking (DNS) and Windows Speech Recognition (WSR) share many common commands. However, there are a few differences, and WSR has a very useful command called ‘Show Numbers’, which I will explain later.

Let’s take an example. There is a command in both programs called ‘select last line’. This command highlights the last line of text in a document in relation to where the text cursor is. If a person speaks these words into the microphone, then after a second or two the last line of text should be selected. However, if the software mishears you and thinks you have said ‘select past mine’, those words will be typed into your document. This is less likely to happen with commands and more likely to happen when just dictating text. For example, ‘today I went for a walk in the park’ may be misinterpreted as ‘today I went for a talk in the park’. Speech recognition software will never make a spelling mistake, just misinterpretations that often replace what you said with something that rhymes.

If either program struggles to recognise a spoken word (people’s names and place names are good examples), then using a command called ‘Spell It’ in WSR or just ‘Spell’ in DNS will bring up a box so that you can spell out the word, letter by letter using your voice. Saying ‘OK’ to press the Spell window’s OK button will close the window, insert the word onto the page and train the software to that word at the same time.

Speech recognition speeds are up to 180 words per minute. That’s way faster than a professional touch-typist can manage.

Surfing the Internet couldn’t be easier using speech recognition. If you see a link, read it out loud into the microphone and the software will click on that link for you. If there is more than one link on the page with the same name, the software will put a number by each, and you then say ‘Choose (number)’.
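As a rough illustration of how duplicate link names might be handled, here is a short Python sketch. The function name `number_matches`, the page contents and the example.com addresses are all invented for the example, and the real programs may well work differently.

```python
from typing import Dict, List, Tuple


def number_matches(spoken: str, links: List[Tuple[str, str]]) -> Dict[int, str]:
    """Number (from 1) the URLs of all links whose text matches the spoken phrase."""
    wanted = spoken.strip().lower()
    urls = [url for text, url in links if text.strip().lower() == wanted]
    return dict(enumerate(urls, start=1))


# Made-up page content: (link text, URL) pairs.
PAGE_LINKS = [
    ("Home", "https://example.com/"),
    ("Read more", "https://example.com/story-1"),
    ("Read more", "https://example.com/story-2"),
]

home = number_matches("home", PAGE_LINKS)
read_more = number_matches("Read more", PAGE_LINKS)

# One match: follow it straight away. Several: wait for 'Choose (number)'.
print(home[1] if len(home) == 1 else home)
print(read_more[2])  # the link picked by saying 'Choose 2'
```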

Just how good is it these days?

Many people who tried voice recognition in the past, say 4 years ago or more, will have been put off using it for life. This is because speech recognition in its earlier incarnations was very inaccurate and therefore very frustrating, to the point that the time spent correcting the mistakes far outweighed the time saved by not typing (even for very slow typists).

I would say to these people, ‘try again, you will be surprised’. With a good quality of speech, you can expect accuracy of around 95% or more, though some people don’t help themselves when using the software. Talking to a computer can feel very strange. Let’s face it, you’re talking to plastic and metal, not a flesh-and-blood human being. All too many times I have seen people read out the training text in their usual reading-aloud voice and then, when it comes to dictating text and commands, suddenly take on a theatrical, over-acting style of speech. These people seem surprised when the output doesn’t match what they have said. Is that surprising, when the computer goes by how you spoke in training and now you’re putting on a false voice? To these people I say: read from a book into the microphone and see if the results improve, as people tend to revert to the voice they used in training. Try it.

The fact is, if you have a reasonable quality of speech and you don’t deviate from the manner of speech you used in training, you can ditch the mouse and keyboard if you wish (or need) to.

So What’s Best? Dragon Naturally Speaking or Windows Speech Recognition?

Let us forget that DNS costs money and WSR doesn’t (if you use an Apple Mac, WSR isn’t an option, sorry).

I have used both regularly and I must say, I can’t tell them apart as far as accuracy is concerned. I do prefer the training within DNS, but that’s down to my impatient nature, as the WSR training takes 10 minutes or so longer to complete. For many, though, training is a one-off exercise (you can repeat it to improve accuracy).

Whenever I demonstrate either package, people gasp in awe of its ability. I call it ‘the Star-Trek effect’ where science fiction meets reality. Both packages perform very well (with the odd slip-up, as computers can be spiteful sometimes and act up when there is an audience).

Personally, I give WSR the gold medal. Why? WSR pips DNS to the post thanks to one simple command: ‘Show Numbers’. I will try to explain why, taking as an example the manipulation of text in a document, say in Microsoft Word. You can use the same commands in both packages to select (highlight) a paragraph, and then you may want to alter the text. To make the text bold, you say the command ‘Bold That’; to italicise it, ‘Italicise That’; to underline it, ‘Underline That’. Get the picture?

With so many commands available, it is easy to forget some, or not to have the patience to learn them in the first place (this is natural, I have an allergic reaction to reading manuals). When you use the ‘Show Numbers’ command, the software looks at every area on the screen that you might want to select with your mouse (buttons, text boxes, etc.) and assigns a number to each, fading highlighted numbers in and out over them. When you say the relevant number, its highlight changes to a highlighted ‘OK’. Saying ‘OK’ then selects the area as if you had clicked on it with your mouse.
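For the curious, the two-step ‘say a number, then say OK’ behaviour can be modelled as a tiny state machine. The Python below is a minimal sketch under my own assumptions: the class name `ShowNumbers`, its `hear` method and the button labels are all hypothetical and are not taken from WSR itself.

```python
from typing import Dict, List, Optional


class ShowNumbers:
    """Toy model: number the on-screen targets, then click on number + 'OK'."""

    def __init__(self, targets: List[str]) -> None:
        # Give every clickable target a number, starting from 1.
        self.numbered: Dict[int, str] = dict(enumerate(targets, start=1))
        self.pending: Optional[int] = None  # number awaiting confirmation
        self.clicked: List[str] = []

    def hear(self, phrase: str) -> str:
        word = phrase.strip().lower()
        if word.isdigit() and int(word) in self.numbered:
            # Step 1: a valid number is highlighted and waits for 'OK'.
            self.pending = int(word)
            return "highlighted " + self.numbered[self.pending]
        if word == "ok" and self.pending is not None:
            # Step 2: 'OK' confirms the choice and performs the click.
            target = self.numbered[self.pending]
            self.clicked.append(target)
            self.pending = None
            return "clicked " + target
        return "ignored"


# Hypothetical toolbar buttons on the screen.
overlay = ShowNumbers(["Bold", "Italic", "Align Right"])
print(overlay.hear("3"))
print(overlay.hear("OK"))
```

The confirmation step is the useful design choice here: if the software mishears the number, nothing has been clicked yet, so you can simply say a different number before saying ‘OK’.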

So what does this mean? Say I can’t remember the command to right-justify text (move the text over to the right-hand side of the page). In Dragon, I have to look up the command in the on-screen help, whilst in WSR I know which button does the task, so I just say ‘Show Numbers’, look for the number that appears over the button, say the number followed by ‘OK’, and it’s done.

The Verdict

Both Dragon Naturally Speaking and Windows Speech Recognition work very well for most people and seem as accurate as each other. The technology has improved immensely over the last 3 to 4 years. For those sceptics who were put off speech recognition by earlier versions: give it a go, you will be surprised. If you have a PC running either Windows Vista or Windows 7, then all you need is a headset.

With WSR, you don’t need to remember half as many commands due to the ‘Show Numbers’ command facility.

