Giving Technology A Voice (part 3) – The Power of Speech
Speech recognition was initially designed for people who needed to do lots of typing but couldn’t touch-type, otherwise known as the one-finger typing brigade or ‘hen-peckers’. However, it soon became apparent that there were disabled people out there, people who couldn’t type by hand or who struggled with literacy, who had suddenly been thrown a computer lifeline.
Nowadays, to help disabled users, we tend to suggest speech recognition software for 2 main reasons:
- Physical disability – where a physical disability prevents a person from working a keyboard and mouse effectively. In this scenario, a person will need speech recognition not only to type for them, but also to control the computer, covering both mouse and keyboard functionality.
- Learning disability – where a person has difficulties with literacy. This may be down to dyslexia, a reduced cognitive level or a difficulty in comprehending how to work a computer.
The above 2 disability types have a wide range of severities and effects, so for some people, speech recognition may not be the answer, or may be just part of an overall solution.
After just saying how complicated and intricate disabilities and their effects can be, I’m going to scoop them up into 2 big camps and generalise the situation! Only by generalising can I offer an opinion on the products out there and whether their suitability leans towards one camp, the other or indeed neither.
For a more in-depth discussion on how speech recognition works in general, on the various platforms, take a look at my previous post – Giving Technology A Voice (part 2).
In this post I will be comparing the platforms (operating systems) against both physical disabilities – those who struggle to access a computer due to strength or control loss in the upper limbs (fine and/or gross motor impairments) – and cognitive disabilities – people who have difficulty deciphering the information involved in computer tasks (such as reading, writing and/or comprehension).
Moving the goal posts
Operating systems are constantly updated, and with updates often come changes. Some changes are minor and some are major. This can affect speech support too, which is why I’ve stated the version of each operating system, so that the information is correct as of that version.
General speech access
Android can be found on various devices, such as the Nexus, Samsung Galaxy and Hudl. Some of these devices will openly support updating the system and some will not (as an update may break some of their custom-made add-ons). Speech recognition on Android 4.4.4 is probably the best of the bunch when it comes to website integration. This is undoubtedly due to Android being written by Google, so it works brilliantly with Chrome, Google’s web browser. They’ve even written their own App to help with the integration of speech, simply called ‘Google’.
There is mention in the settings of ‘Offline speech recognition’, though this doesn’t work. As I mentioned in my last post, if you’re offline (no Wi-Fi or alternative data connection), then speech recognition just won’t work.
Sending an email is very effective. Just say “send email to” and then the person’s name. It will then prompt you for the message, and then ask you if you want to send it. Saying “yes” will do this; “no” will cancel the process.
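This flow behaves like a small prompt-and-confirm dialog. Here is a minimal Python sketch of that behaviour; the function names and callbacks are mine for illustration, not Google’s actual API:

```python
def voice_email_dialog(recognise, speak, send_email):
    """Model of the 'send email to ...' voice flow described above.

    `recognise` returns the user's next utterance as text,
    `speak` voices a prompt, and `send_email` performs the send.
    All three are stand-ins for the real platform services.
    """
    speak("What's the message?")
    message = recognise()          # one dictation pass for the body
    speak("Do you want to send it?")
    if recognise().strip().lower() == "yes":
        send_email(message)
        return "sent"
    return "cancelled"             # "no" (or anything else) cancels
```

The key point the sketch captures is that the whole exchange is hands-free once started: the prompts drive the user from recipient to body to confirmation without any touch input.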
The issue here is that you have to activate the speech recognition by pressing the microphone symbol. That’s kind of difficult if you can’t physically press the screen, and those who can need good precision, as the microphone icon is small.
There is an option to ‘wake up’ the speech recognition, though this is only available when you’re already in the Google App, which doesn’t automatically run at startup; as soon as you open another App, you lose that function.
There are currently inactive options to allow speech recognition from any screen; the fact that they are mentioned at all would suggest that the next update (Lollipop) will allow this. Google are in fact going to activate this in their new Google Nexus 6 and 9 models. The reason this is currently unsupported is that having speech recognition always active is a real battery-killer. The new Nexus models (out this month) will have a dedicated, low-power processor just for speech recognition.
You can ask it things like the weather, how old certain celebrities are, what the population of a country is, and so on. The answers are sourced from Wikipedia, the free online encyclopaedia, and are spoken using Google’s Text-To-Speech voice. However, if you ask a relatively obscure question, then web search results are given instead. That’s fine, apart from the inability to follow a link by voice.
The speech recognition could be an invaluable tool here, as long as the person understands what to press and what sort of things to say.
If the TalkBack screen reader App is activated, then those who have poor or no literacy skills could use this built-in App, designed for blind users, to have words on the screen read out loud.
TalkBack allows you to touch the screen and, instead of acting on the press as usual, it will read out the text underneath your touch. The problem comes when the text underneath is a link: in that case, if you want to activate the link, you need to double-tap instead. If you want to scroll up or down, you need to drag two fingers on the screen instead of the usual one.
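TalkBack’s reinterpretation of the standard touch gestures amounts to a simple remapping table. A rough Python sketch of the idea follows; the action names are mine for illustration, not Android’s real identifiers:

```python
# How TalkBack reinterprets the standard touch gestures, per the
# description above. Action names are illustrative, not Android's.
TALKBACK_GESTURES = {
    "single_tap": "read_text_under_finger",   # normally: activate item
    "double_tap": "activate_focused_item",    # e.g. follow a link
    "two_finger_drag": "scroll",              # normally a one-finger drag
}

def handle_gesture(gesture):
    # Gestures outside the table are ignored rather than passed through.
    return TALKBACK_GESTURES.get(gesture, "ignored")
```

The design choice this models is why screen readers feel different to sighted users: every gesture gains an extra step (tap becomes read, double-tap becomes activate) so that exploration by touch is safe.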
The solution I’ve just mentioned would be fine for some with a relatively mild learning difficulty – an option that suits some, but not all. This technique of screen reading is adopted by all of the operating systems’ screen readers, in order to get around the touch-screen interface.
For those who need a reminder for an event (maybe to take medication, or someone coming round to visit), you’ll be able to set reminders by voice once Android 5, Lollipop, comes out.
iOS’s speech recognition is called Siri. Siri came along back in iOS 5 on the iPhone 4S, and is supported by the iPhone 4S or higher and iPad 3 and higher models.
I’ve found Siri to be currently less accurate than Android’s speech recognition.
Once again, a physical action is needed before Siri is activated: either pressing and holding the Home button for a couple of seconds, or pressing the microphone icon on the on-screen keyboard.
As with Android, you can search the Internet, call people on your phone and perform a handful of other tasks.
Again, we have the options of both speech recognition and screen reading (Siri and VoiceOver). Siri, as mentioned above, is only OK in my view. It can do various tasks, but is lacking in some areas. For example, if you want to send an email to someone, you can say “send an email to ….”, but when it prompts you for the message, you get only one attempt. You can’t add to the message and, as with all but the Windows Pro option, you can’t edit the message in any way once it’s been written.
iOS 8 now has the ability to wake up and respond to a voice command, but only when the device is plugged in. This is because of the drain on the battery, as the device constantly needs to listen for the wake-up command.
One of the least used operating systems, with only 2–4% of phones running it, is Windows Phone. The most important thing here is not to confuse Windows Phone with Windows Pro, or indeed Windows RT. Windows Phone is found on phones and plays by different rules than its Pro big brother.
Meet Cortana. She’s the speech recognition interface on Windows Phone. Microsoft have kept the same name for the built-in screen reader as on Windows Pro/RT: Narrator.
Cortana pretty well keeps up with her competitors. I would say the accuracy of the speech recognition lies between the other mobile operating systems: better than Siri on iOS, yet not quite as good as Android’s speech recognition on the Nexus.
Yet again, we have no means of activating speech recognition without pressing something. Yes, the microphone icon!
Many common tasks can be initiated and handled by Cortana, such as messaging, emails and web searches. You don’t tend to get the depth of spoken search results that you do with Google’s Android, but you can add to messages in texts and emails, giving you the peace of mind that you don’t have to dictate the message in one take.
Cortana is pretty easy to work with. No easier than Android’s speech recognition, but given iOS’s hit-and-miss results, it’s less confusing than Siri and therefore more usable for those with learning difficulties. Narrator is OK, nothing special, and works in pretty much the same way as the other mobile operating systems’ screen readers.
This is the leader of them all when versatility is the measuring stick, and it really should be, as it is the full version of Windows. No chopped-down version here, with well over 20 years of history behind it. Because of this, people have the opportunity to use all manner of assistive technology plugged into it. However, we are comparing speech technology here, so we will just target this area.
The built-in Windows Speech Recognition (WSR) has a number of advantages over its counterparts (iOS, Android and Windows Phone). These are:
- The speech files are all stored on the device, so no Wi-Fi or other data connection is needed for it to work.
- It is currently the only speech engine that you can train to your voice, so linguistic differences such as regional accents can be catered for, adding to the overall accuracy of the speech recognition.
- WSR lets you control almost all computer tasks and navigation.
There is, however, one rather big disadvantage with WSR, and that is the quality of the built-in microphones in many Windows tablets. They tend to be awful. Either that, or the speech recognition developed for the other operating systems is far superior when it comes to speech detection. Either way, the result is that people need to wear headsets to get a decent level of accuracy from WSR.
We have 3 tablets at work and a few laptops. All but one of the tablets’ built-in microphones, and all of the laptops’, are pathetically bad.
WSR offers almost complete autonomy over the computer. Since you can set WSR to run when the computer starts, the only issue is first turning the machine on (so maybe just don’t turn it off), but being tethered to a headset because the microphone is awful can be restricting.
Another issue to bear in mind is that because WSR is so ‘global’ (it does everything), it has more chance of getting commands wrong. I would say that if you want a mobile solution, you’ll get frustrated with WSR. Leave it for the home environment, where you may want to use it for lengthy emails or more complicated tasks. Unless it’s a cold day, in which case the ear pieces from a headset can at least keep your ears nice and toasty warm.
I must say that the added complexity of WSR takes it to the bottom of the pile for many with learning difficulties. Narrator works just as well as the other operating systems’ screen readers, but writing and launching tasks with WSR can be more of a challenge. I’d place it 4th here for helping those with reading difficulties, especially given the lack of spoken answers to web searches and the instability of the Windows operating system compared to its mobile counterparts. You’ll always need to turn to Narrator for help there.
If I were able to activate the speech recognition by hand, then I would say the Google Nexus would win by a country mile. It’s just sleek, responsive, scarily accurate and answers so many questions with its Text-To-Speech voice. That often makes it a nice tool for those who struggle with reading. Android 5 (Lollipop) could improve its performance even more…
If, however, I couldn’t activate the speech recognition by hand, then Windows Speech Recognition is the only option. It is good for severe physical disabilities, but you do need to wear a headset to get it working well. You may want to road-test a range of tablets to find one with a decent built-in microphone to get around this problem. The added bonus here, though, is that you can do so much with the speech recognition. Apart from arcade games, you can pretty much do anything.
So, here’s how I’ve placed our contestants:
- Android (I’d use the Google Nexus for best results, and waiting for the new Nexus 6 and 9 may allow you to take advantage of increased speech recognition power)
- Windows Speech Recognition (use a headset for best results; you really need to)
- Windows Phone (it’s getting better, and you can add to your messages, rather than having just one shot as with Android and iOS).
- iOS (Siri is seriously glitchy, as smooth as muesli, and doesn’t live up to iOS’s other accessibility options, such as its awesome switch support).