Issue: Europe I 2009
Topic: Mobile phones – bridging the divide between text and speech
Paul Ayres is the CEO of Textic Limited. He has worked in the technology sector for the past twenty years, specialising in start-up operations, and his background is in sales and marketing. Prior to Textic, Mr Ayres served as President and CEO of Authoriszor Inc, an Internet security vendor. In addition, Mr Ayres ran a dyslexia-focused start-up, BrightStar, in the UK. Previously, Mr Ayres worked to establish Real Networks in Europe and, earlier, served as General Manager and Managing Director of Netscape Communications in Northern Europe.
New technologies are emerging that efficiently, accurately and naturally convert speech messages into text and vice versa. Mobile phones will soon play back an SMS as voice or display a voice message as text, and messages in one language will be retrievable in another. These features will allow safe voice playback of email while in a car and silent messages while in a meeting, and will facilitate access for those with speech, sight and hearing disabilities and for illiterate populations – including small children.
The evolution of mobile technology is something we often take for granted. Think back to the glorified bricks that constituted the first ‘mobile’ phones: they were bulky, slow and limited to the most basic of functions, and might well have been more useful on building sites. Mobiles now slip into pockets practically unnoticed, with vast memory, incredibly fast processors, and more capacity than most desktop computers had a few years ago.
Telephone lines, originally only for calling, have been progressively exploited for transmitting textual information via telex and fax, and now provide access to the world’s greatest network, the Internet, until recently a predominantly text-based medium. Mobile phones and 3G networks are now so advanced that they are capable of delivering much more. They send and receive data constantly over high-speed connections, and allow users to download a myriad of applications and updates, as with desktop or laptop computers. We are now seeing the emergence of virtual network-delivered services on these personal communication devices. Still, how often do we think about what our mobiles can do compared to what they should be able to do?
Consider the spoken and the written word: our two principal forms of communication have developed alongside one another across societies, each relevant for different situations. Speech and text communications have progressed in parallel in the technology world, through phones and computers and, more recently, as functions on a single handset. Take the phone call, SMS and email functions, which are included together on many mobiles. We use these features independently of each other: you choose either to call or to type a message, and your choice is imposed on the person at the receiving end. What if the mobile could integrate these features by exploiting speech recognition and text-to-speech technologies?
The underlying technologies in question are advanced enough to use, but their implementation so far has been tentative. With ongoing improvements, their application will become increasingly widespread and will significantly augment the functionality of the mobile phone. Specifically, applications will appear that bring the two communication mechanisms together, giving us more freedom in the way that we interact with our mobile phones and bridging the barriers that currently separate text and speech. Futuristic visions of a mobile phone capable of interacting with people by understanding, converting and using both text and speech are already becoming a reality.
Today’s speech recognition technology is commonly used for customer service helplines and is capable of responding to simple verbal commands from callers. Some companies have created mobile applications that allow us to operate the phone by voice, for search functions and for other features such as calling and sending messages. The big players in this field are constantly driving the technology forward, so we can expect to see rapid development in the mobile sphere in the near future.
Text-to-speech technology, i.e. the conversion of text into speech, has also advanced rapidly since the days of robotic computerized voices, and the quality of speech produced today is astounding. A human voice can be recorded, broken down into countless parts, and re-organized using complex algorithms to read any passage of text on the fly. The result is a clear and lifelike voice, which takes into account grammar, punctuation, and context to create speech that sounds remarkably natural.
The mobile phone, with a variety of speech and text functions (calling, voicemail, SMS, email), is an ideal place for these two technologies to work together and provide the type of features that will greatly increase its benefits.
Some examples:
Hands-free – In certain situations, speech recognition and text-to-speech technology will give users the freedom to choose the format in which they receive information, which is currently dictated by the person sending it. This will be far more convenient, safer and legal if, for example, you are driving a vehicle and want to hear a text message or email when it arrives without taking your attention from the road.
Keeping quiet – In a meeting, when it can be inappropriate to take a call or even listen to a voicemail, speech recognition technology allows voicemail to be displayed as text.
Personal choice – Some people will choose to read their voicemails as text for speed’s sake. Those who find small mobile phone keypads difficult will prefer the speed and ease with which they can call someone or send a message by voice.
Accessibility – Speech recognition and text-to-speech technology provide much wider access to mobile technology for visually impaired or motor-impaired users. Text-to-speech services also help people with dyslexia or low literacy levels: by using network-delivered technology to turn text into voice, they will no longer have to struggle if they find text hard to read.
Today’s speech technology, like any other, is not without its hiccups. Voice recognition is clearly difficult, since our voices vary greatly in tone and accent, for example. Voice recognition must also deal with occasional interference from background noise or other voices. In some cases, applications can be ‘trained’ to recognize a specific voice and improve the accuracy of recognition. Because of these difficulties, however, voice recognition applications can build in safeguards against making mistakes; for example, the application might ask, “Did you say…?” to confirm commands. An awareness of the technology’s limitations means that good service can be built in. Occasionally, text-to-speech technology does not pronounce words perfectly.
This is more a demonstration of the illogical development of language than of the technology’s own failings, and almost never means the word was not understood. Most mispronunciations can be corrected easily, but some words present more hurdles than others. A word such as ‘read’ can be pronounced either like ‘red’ or like ‘reed’, requiring the application to make an educated evaluation of the context. Text-to-speech applications are constantly evolving to give problematic words and phrases greater contextual accuracy.
To get some idea of just how good mobile speech functions can be, go online and visit a speech-enabled website, or call an automated hotline that uses speech recognition. You will certainly be able to envisage their practical applications and imagine the technology’s potential. With the recent development of spoken online translation, where servers can take web text written in English and instantly convert it into speech in a foreign language, it is not unrealistic to imagine that the mobile of the near future could play back an English text as spoken French.
Where does all this leave us now? Limitations are being addressed and the technology is getting better. Mobile phone applications have stretched far beyond their initial stages and are becoming really ‘smart’, increasingly ‘intuitive’, increasingly inclusive and – most importantly – increasingly useful. While some of the technology described above is already available, the next generation of speech technology is now being tested. We will soon see extensive usage of these speech and text conversion technologies by a larger number of carriers, using a wide variety of handsets, to deliver information via the latest networks and servers. This time next year we expect to see these services much improved, and in action on a much wider basis.