
DAWN - the Internet Edition



Science.com

June 21, 2003



Talking computers: what makes them tick?



By Ambreen Ahmed


Imagine how easy life would become if you could talk to your computer and it talked back! Thanks to the latest speech recognition technology, this may soon be more than a dream.

For this to work, two things must happen: first, the computer has to capture and understand the words being spoken; second, it has to generate an appropriate response. Both require fast and effective technology.

The first phase is called “intelligent speech recognition” and the second “intelligent speech synthesis”. Of the two, intelligent speech recognition is the real revolution, one that will change how people interact with their computers.

What we need to achieve this goal is an automatic speech recognition (ASR) system that not only captures spoken words, but also distinguishes between loose word groupings and proper sentences. Such a system consists of several components that work together: an input device (a microphone), intelligent software to distinguish words, and an adaptive database of words against which your speech is matched. The ASR system works through the following steps:

1. Feature analysis: Feature analysis captures every spoken word, filters out background noise, and finally converts the digitized signal of your speech into phonemes. A phoneme is the smallest unit of sound that distinguishes one word from another. It is often confused with a syllable, but it is smaller: the word “tonight” has two syllables but five phonemes (t, uh, n, eye, t). Once the speech has been broken up in this way, the phonemes are passed on to the next phase.

2. Pattern classification: In this phase, the ASR system attempts to recognize the phonemes by locating a matching phoneme sequence among the words stored in an acoustic model database. The acoustic model database is essentially the ASR system’s effective vocabulary. By consulting this database, the system determines whether it recognizes the spoken words. The closest matches to your words are then sent to the next step.

3. Language processing: Here the ASR system attempts to make sense of the spoken words by comparing the candidate words (generated in step 2) with a language model database. The language model database includes grammatical rules, task-specific words, and phrases and sentences you might frequently use. If a match is found, what you said is finally stored in digital form.

Language processing is the most complicated step, because this is where the ASR system must determine your exact words. In doing so, it has to perform a number of tasks, including evaluating the inflections of your voice.
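Steps 2 and 3 above can be sketched in a few lines of Python. This is only a toy illustration, not how a real ASR product works: the phoneme spellings, the tiny `ACOUSTIC_DB` vocabulary, the `BIGRAM_COUNTS` table and all function names are assumptions made up for this example. Pattern classification is approximated by finding the stored phoneme sequences closest to what was heard (using edit distance), and language processing by preferring word pairs that commonly occur together.

```python
import itertools

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # drop a heard phoneme
                            curr[j - 1] + 1,      # insert a missing phoneme
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# Hypothetical "acoustic model database": word -> phoneme sequence.
ACOUSTIC_DB = {
    "tonight": ("t", "ah", "n", "ay", "t"),
    "night": ("n", "ay", "t"),
    "to": ("t", "uw"),
    "two": ("t", "uw"),
    "call": ("k", "ao", "l"),
    "me": ("m", "iy"),
}

# Hypothetical "language model database": how often word pairs occur.
BIGRAM_COUNTS = {("to", "call"): 8, ("call", "me"): 9, ("me", "tonight"): 7}

def classify(phonemes, max_candidates=2):
    """Step 2: return the closest-matching words for one phoneme group."""
    ranked = sorted(
        ACOUSTIC_DB,
        key=lambda w: (edit_distance(phonemes, ACOUSTIC_DB[w]), w),
    )
    return ranked[:max_candidates]

def choose_sentence(candidate_lists):
    """Step 3: pick the word sequence whose word pairs are most common."""
    best, best_score = None, -1
    for seq in itertools.product(*candidate_lists):
        score = sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(seq, seq[1:]))
        if score > best_score:
            best, best_score = seq, score
    return list(best)

def recognize(heard):
    """Run step 2 on each phoneme group, then step 3 on the candidates."""
    return choose_sentence([classify(p) for p in heard])

heard = [("t", "uw"), ("k", "ao", "l"), ("m", "iy")]
print(recognize(heard))  # ['to', 'call', 'me']
```

Note that “to” and “two” sound identical, so step 2 alone cannot tell them apart; only the language model in step 3 settles on “to call me”.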

Types of ASR systems

There are four different types of ASR systems available today.

1. Discrete ASR systems require you to pause after each spoken word. This may seem a bit cumbersome, but even with all these pauses, speaking to the computer is still faster than typing. Fortunately, most people seem to adapt to this method rather quickly.

2. Continuous ASR systems can process continuous streams of words, that is, normal speech patterns. For now, however, discrete systems are the more prevalent of the two: continuous ASR systems have a long way to go before they can effectively distinguish individual words in rapid, continuous speech.

3. Speaker-independent ASR systems can be used by anyone, but their vocabularies are often limited; some even lack expansion capabilities. For example, a number of speaker-independent ASR systems work in conjunction with personal productivity software such as word processing applications. These allow you to speak certain commands (such as file, save, print and so on) rather than type them or point at them. However, you can’t use these systems to dictate the text itself.

4. Finally, there is the speaker-dependent ASR system, which you train to recognize your voice. You train these systems by reading a lengthy text, such as a Mark Twain novel, into a microphone. As you read, the system begins to recognize your voice and build its vocabulary. However, a speaker-dependent system recognizes only the speech of the person who trained it.
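To see why the pauses in a discrete system (type 1 above) make the computer's job easier, consider the following minimal Python sketch. It assumes the microphone signal has already been reduced to one loudness value per short time frame; the function name `split_on_pauses`, the threshold and the minimum pause length are all illustrative assumptions, not values from any real product.

```python
def split_on_pauses(energies, threshold=0.1, min_pause_frames=3):
    """Return (start, end) frame ranges, end exclusive, one per spoken word.

    A word is considered finished once at least `min_pause_frames`
    consecutive frames are quieter than `threshold`.
    """
    words = []
    start = None  # first frame of the word currently being spoken
    quiet = 0     # consecutive quiet frames seen inside the current word
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                words.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # The recording ended while a word was still open.
        words.append((start, len(energies) - quiet))
    return words

# Two words separated by a clear pause; the short dip at frame 11 is
# too brief to count as a word boundary.
frames = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0, 0.9, 0]
print(split_on_pauses(frames))  # [(2, 5), (9, 13)]
```

With deliberate pauses, finding word boundaries is this simple. In continuous speech the pauses shrink or vanish, and a rule like this one merges several words into a single range.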

Ultimately, everyone would like ASR systems to be perfected. The ideal ASR system would allow normal speech (as continuous systems do), expand its vocabulary (as speaker-dependent systems do), and allow multiple users (as speaker-independent systems do). Such a system just might see the light of day in the near future.

Benefits

ASR systems have the potential to become standard technology on home computers within the next few years. You may soon be sitting in front of your term paper and dictating it rather than typing it. But that is only a small glimpse of the real potential of ASR systems. Imagine driving your car and adjusting the temperature of your AC by simply saying, “make it hotter,” or watching TV and saying “ESPN” to switch the channel. This could become a reality rather soon.

Not to be outdone, businesses are seeking innovative ASR implementations to gain an advantage in the marketplace. Telephone service providers are already offering voice dialing to their customers: by simply saying “dad” or “home”, you automatically dial the matching number from a list of predefined numbers. Voice-controlled refrigerators, ovens, dishwashers, washing machines and dryers will become commonplace. For a voice-controlled oven, for example, all you would have to say is “prime rib, 8 pounds,” and the oven will automatically set the temperature and notify you when dinner is ready.
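The voice dialing idea can be sketched as a lookup against a small list of predefined names. In the Python sketch below, the contact names and phone numbers are invented, and the recognizer's output is assumed to arrive as a text string that may be slightly misheard. Matching on spelling with the standard library's `difflib` is only a stand-in for the acoustic matching a real system would perform.

```python
import difflib

# Hypothetical list of predefined numbers.
CONTACTS = {"dad": "555-0101", "home": "555-0102", "office": "555-0103"}

def voice_dial(heard_word, contacts=CONTACTS, cutoff=0.6):
    """Return the number of the contact closest to heard_word, or None."""
    matches = difflib.get_close_matches(
        heard_word.lower(), list(contacts), n=1, cutoff=cutoff
    )
    return contacts[matches[0]] if matches else None

print(voice_dial("Dad"))    # exact match
print(voice_dial("hom"))    # slightly misheard, still close to "home"
print(voice_dial("pizza"))  # nothing close enough, so no call is placed
```

A small, fixed vocabulary like this is what lets such services work for any caller: the system only has to decide which of a handful of names was spoken, or that none was.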

The future

ASR is still an emerging technology: it has a long way to go before it becomes a foolproof, standard business application. To reach that stage, the following conditions have to be met:

Greater storage volume: Digitized sound requires far more storage space than the same words in text format. An ASR system with a large vocabulary therefore needs correspondingly more storage.

Better feature analysis to support continuous speech: The most notable drawback of continuous ASR systems is their limited ability to distinguish between quickly spoken words. One of the problems is that we tend to drop consonants when we speak, making it difficult for the ASR system to determine where one word ends and another begins. This task is handled by the feature analysis phase (step 1), which must become more intelligent, because natural speech does not contain artificial pauses between words.

Better dynamic language models to support speech understanding: Speech recognition is great, but true speech understanding would be better. For this to happen, language models must become more dynamic, interpreting your words not only within the context of a sentence, but also within the context of paragraphs and whole conversations.

Better, more flexible pattern classification to support multiple users: For ASR to become truly viable in the workplace, a given system must be usable by anyone and everyone. With the exception of speaker-independent systems, most ASR systems lack this quality. The proliferation of ASR systems that can interpret the speech of anyone, even those suffering from a cold or speaking in a dialect, will define the true success of automatic speech recognition in business.
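The word-boundary problem raised under continuous speech can be illustrated with a short Python sketch. Here a string of letters without spaces stands in for an unbroken stream of sounds, and the small vocabulary is made up for the example; real systems work on audio features, not letters, so this is only an analogy.

```python
def segmentations(stream, vocabulary):
    """Return every way to split `stream` into words from `vocabulary`."""
    if not stream:
        return [[]]  # one way to split nothing: the empty sentence
    results = []
    for end in range(1, len(stream) + 1):
        head = stream[:end]
        if head in vocabulary:
            # Keep this word and try to split whatever remains.
            for rest in segmentations(stream[end:], vocabulary):
                results.append([head] + rest)
    return results

vocabulary = {"no", "now", "here", "where", "nowhere"}
for words in segmentations("nowhere", vocabulary):
    print(" ".join(words))
# no where
# now here
# nowhere
```

Even this seven-letter stream has three valid readings. Without pauses, the system needs help from context, which is exactly what the language models described above are meant to supply.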

The writer regularly contributes IT-based articles to Sciencedotcom



© DAWN Group of Newspapers, 2005