It's been a long time coming, but now your computer can actually take a letter from you
by Alan Zisman (c) 1995
First published in Business in Vancouver, Issue #295, June 20, 1995, High Tech Office column

Do you remember HAL, the computer which turned homicidal in the film
2001: A Space Odyssey?
You didn't see the astronauts in that film typing commands to HAL.
No mouse, either. Instead, they talked, and HAL listened and (for
much of the film) obeyed.

But that's just science fiction, right? Not necessarily. Computers have had the potential
to listen and recognize voices for some time now. Over a decade ago,
the puny personal computers of the day could be harnessed to respond
to verbal commands, and connected to robotic devices to carry out
simple tasks. And as computers have become more powerful and sound
capabilities more common, voice recognition has become more widespread,
though by no means commonplace. SoundBlaster cards, for example,
include a utility to allow users to talk back to their computer, and multimedia
Macs include software allowing voice commands.

(A particularly nice implementation is LISTEN for Windows, from Verbex
Voice Systems Inc., 1090 King Georges Post Rd., Bldg. 107, Edison, NJ
08837; tel. 800-275-8729. This $139 program allows the user to start
programs by voice, and comes complete with voice commands that work in
a wide range of popular Windows software.)

But when users hear about voice recognition, their eyes light up with visions of telling
their computer to "take a letter." And that's simply not possible
with any of these programs.

Enter IBM. It has been quietly working on voice recognition for a number of years
now, and late in 1994, released the fruits of its labour. Originally
released for OS/2 only as the IBM Personal Dictation System, it is
now available for either OS/2 or Windows as VoiceType Dictation.

Amazingly, it works pretty much as advertised. It comes with a 22,000-word vocabulary
in standard (U.S.) English (UK English and several other languages
are already available for OS/2 and will soon be made available for
Windows). Users can add up to 2,000 customized words, and specialized
dictionaries are available for law, emergency medicine, radiology,
and journalism (complete with umpteen American politicians' names).

It claims to be able to keep up with dictation at a pace of 70 words per minute, somewhat
faster than most typists. And while it may make more mistakes than
a good typist, it generally provides the correct version of homonyms
like there/their/they're. (It can do this because it doesn't look at
single words, but rather at patterns of three words at a time.) And it
learns as it goes along, so with time it tends to make fewer mistakes.
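
The three-word idea can be illustrated with a toy sketch in Python. This is not IBM's method, which the article does not describe in detail; it only shows how counting word triples could favour one homonym over another. The sample sentences, the names CORPUS, HOMOPHONES and pick_homophone, and the choice of using the word before and the word after as context are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical training text; a real system would learn from far more material.
CORPUS = (
    "they put their coats over there because they're cold . "
    "their dog is over there . they're going to their house . "
    "i think they're happy with their car ."
).split()

# The sound-alike words the recognizer has to choose between.
HOMOPHONES = ("their", "there", "they're")

# Count every run of three consecutive words in the training text.
TRIGRAMS = Counter(zip(CORPUS, CORPUS[1:], CORPUS[2:]))


def pick_homophone(previous, following):
    """Return the homophone seen most often between the two context words.

    Returns None when the training text offers no evidence either way.
    """
    scores = {w: TRIGRAMS[(previous, w, following)] for w in HOMOPHONES}
    best = max(HOMOPHONES, key=lambda w: scores[w])
    return best if scores[best] > 0 else None


print(pick_homophone("put", "coats"))
print(pick_homophone("over", "because"))
print(pick_homophone("think", "happy"))
```

With so little training text most contexts return None; the point is only that the neighbouring words, not the sound of the word itself, settle the spelling.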

Of course, it's not quite that simple: for starters, this is one huge program--16 floppy
disks--and IBM recommends you have at least 62 megs of hard-drive
space free before you begin installation. And sorry, you can't use
a compressed drive for that.

You'll need at least 12 megs of RAM, and 16 are recommended. You'll also have to increase
your Windows swap file (virtual memory) to at least 14 megs, using
up still more hard-drive space. You can reclaim some of that drive
space after you've successfully installed the program and used its
enrolment feature--training it to recognize your way of speaking.

And it's not just software. You have to crack open your computer's case and install a card. This
is available in an ISA version for standard PCs, or a Micro-Channel
version for IBM PS/2s, or as a PC Card (PCMCIA card) for portables.
As with most such add-in cards for PCs, you may have to fiddle with
technical obscurities such as IRQ and I/O addresses (especially if
you're using the ISA version with Windows: OS/2, Micro-Channel, and
Win95 all help protect the user from this sort of thing).

Once you've installed the card and the 16 disks, be prepared to spend a lot of time with
it. It's not particularly complex software to use (despite the 400-page
manual), but you need to train it to recognize your voice, and it
needs to train you to speak in a manner that it understands. It doesn't
recognize a continuous flow of words--normal conversational speech--because
it can't tell where one word stops and another starts. Instead,
you need to learn to speak accurately, and with tiny pauses between
each word, sort ... of ... like ... this ... period. (Yes, you do need
to tell it about punctuation, unusual capitalization, and so forth.)
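
To make the word-by-word style concrete, here is a minimal Python sketch of how a list of separately spoken words, including spoken punctuation, might be turned into text. The command words in PUNCTUATION and the function name render are hypothetical; the article does not list VoiceType Dictation's actual command vocabulary.

```python
# Hypothetical spoken commands mapped to the marks they produce.
PUNCTUATION = {"period": ".", "comma": ",", "question-mark": "?"}

# Marks that end a sentence, so the next word starts with a capital.
SENTENCE_ENDERS = {".", "?"}


def render(tokens):
    """Join separately recognized words, applying spoken punctuation."""
    out = []
    capitalize_next = True  # the first word of the dictation is capitalized
    for tok in tokens:
        if tok in PUNCTUATION:
            mark = PUNCTUATION[tok]
            if out:
                out[-1] += mark  # attach the mark to the preceding word
            else:
                out.append(mark)
            capitalize_next = mark in SENTENCE_ENDERS
        else:
            word = tok[:1].upper() + tok[1:] if capitalize_next else tok
            out.append(word)
            capitalize_next = False
    return " ".join(out)


print(render(["sort", "of", "like", "this", "period"]))
```

A sketch this simple cannot tell the command "period" from the ordinary word "period"; a real product needs some way to say which one is meant.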

When you dictate, your text appears in a window. To get it into the
word processor of your choice, you need to copy it to the clipboard
and paste it into the other program.

(An add-on from a local company, however, promises to make this simpler: Digital Dictate,
from Vancouver's Barr Business Systems, works together with
VoiceType Dictation, allowing users to dictate directly into Microsoft
Word. WordPerfect will soon be supported as well. More information
on this US$295 product is available from Barr Business Systems, 291
East 2nd Avenue, Vancouver V5T 1B8, tel. 872-2277.)

Plan on taking at least two hours to work through the training scripts for VoiceType Dictation.
After that, the machine needs to be left running on its own for at
least two hours to process the results of the training. Oh, one more
thing: the package--card, software, and microphone--costs about $1,500.

So this is not a package to pick up on a whim. But it promises some
real benefits for people in a wide variety of areas, including users
who simply can't or won't type, and people who haven't been able to
use computers because they need their hands for something else.
While it demands a lot of computer (and financial) resources, VoiceType
Dictation makes good on its promises to a surprising degree.