Paperless Office? Not Quite: Wordscan 3.0

by Alan Zisman (c) 1995. First published in Our Computer Player, May 1995
Calera Recognition Systems
475 Potrero Avenue
Sunnyvale, CA 94086 USA
1-800-422-5372, 408-720-8300
408-720-1330 (fax)
list $595 (US)

Promises, promises.

One of the many promises made to us by the proponents of the brave
new world of digital technology has been 'the paperless
office'. E-mail replaces memos. CD-ROMs replace reference
books. Hypertext publishing on the Internet's World Wide Web
is the next frontier.

More efficient, and environmentally friendly. Why, think of
all the trees that won't need to be cut down for paper.

But what about all the existing paper documents? Can they be
made digital, too? How can they be integrated into the future
electronic office?

Well, add just one more gadget to your computer, a digital
scanner, and you're partway to an answer.

When you scan a page, a picture of it appears on screen.
Surely that's pretty useful.

Well, if I'm trying to get a graphic, a scanned page can be
pretty good. I can import it into a paint program or a photo
enhancement program, clean it up, and have it ready for
publication.

But if the image contains text, it's less useful. Sure, I can
store a picture of the text. But it may be hard to read, and
it's impossible to edit in my word processor. I can dump it
into a page-layout program as a graphic, but I can't change
the fonts or styles, and can't quote just part of it very
efficiently.

And the picture of a page takes up a lot of room when I save
it as a file.

Here's where Optical Character Recognition tries to come to
the rescue. OCR, for short, looks at your scanned image
and tries to read the text on it. It then lets you proofread
the page, and save the result in text format.

As text, the document can be opened in the word processor or
page layout program of your choice, and manipulated as words,
rather than as a picture of words. And stored as a file, a
page of text will only take up a few kilobytes of space.
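To put rough numbers on that difference, here is a back-of-the-envelope sketch. The page size, the 300 dpi black-and-white scan, and the 3,000 characters per page are all assumptions chosen for illustration, not measurements, and the image is treated as uncompressed.

```python
# Rough storage comparison for one scanned page versus the same page as text.
# All figures below are illustrative assumptions, not measurements.

PAGE_WIDTH_IN = 8.5      # assumed US letter page
PAGE_HEIGHT_IN = 11.0
DPI = 300                # assumed scan resolution
BITS_PER_PIXEL = 1       # assumed black-and-white (1-bit) scan
CHARS_PER_PAGE = 3000    # assumed page of text, one byte per character


def bitmap_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Size of an uncompressed bitmap in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def text_bytes(chars):
    """Size of plain text at one byte per character."""
    return chars


image_size = bitmap_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, DPI, BITS_PER_PIXEL)
text_size = text_bytes(CHARS_PER_PAGE)

print(f"Scanned image: {image_size / 1024:.0f} KB")
print(f"Plain text:    {text_size / 1024:.1f} KB")
print(f"Ratio:         about {image_size // text_size} to 1")
```

Under these assumptions the picture of the page comes to roughly a megabyte, while the text comes to about 3 KB. Compressed image formats would narrow the gap, but not close it.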

Calera's Word Scan Plus, now in version 3.0, is one of
several programs aimed at the top of the OCR market (a
version with fewer features is bundled with several brands of
scanners, or is available separately for $249 list).

It lets you choose your target word processor from the
Windows 'big 3', Word, WordPerfect, and Ami Pro, and will,
by default, save your text in the file format of your chosen
program. But it goes a few steps further. You can also have
Word Scan's toolbars mimic the style of your favorite word
processor. And you can even add it to your word
processor's menus, making Word Scan just a click away.

You can use it with scanned images, and it will automatically
start up and run your TWAIN-compliant scanner. Or you can use
it to work on previously stored bitmap graphic files, or with
faxes received with your fax-modem.

As it analyzes the page, it breaks the page up into zones,
identifying multiple columns, sidebars, headers and footers,
and captions. You can choose to ignore some zones-- maybe you
don't need to include "Page 47" in your text output.

Then it proceeds to try to read the text, employing
artificial intelligence procedures to try to minimize
errors. It shows what it considers questionable text, for
your correction. Finally, you're presented with the full page
of text, for manual proofreading. When you're satisfied, you
can save it in your word processor's native format, or as
plain text. When you save it, it tries to maintain the look
and feel of the original page's layout.

The program supports OLE 2, and with a program such as
Word 6.0, lets you drag the image from Word Scan's preview
window and drop it right into Word. If you do this, it will
automatically read the text, inserting it into your Word
document.

While Word Scan is fast, it's not always very accurate. And
its first editing window is less helpful than it could be,
because it shows doubtful text out of context. In many cases,
you need to go back to the original to make sense of what is
being shown.

Of course, accuracy is dependent on the quality of the
original scan. Scans made with the popular and inexpensive
hand scanners will be more difficult to make sense of than
those made on a pricier flatbed scanner. And newspaper
text often is somewhat blurry-- is that an "m" or an "rn"?
That's where the artificial intelligence algorithms can help.
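The review copy doesn't reveal how Word Scan actually makes that call, so what follows is only a minimal sketch of one common idea: try swapping look-alike letter groups and see whether exactly one result is a known word. The `CONFUSABLE` pairs, the tiny `WORDS` list, and the function names are all made up for the example.

```python
# Illustrative only: resolve look-alike letter groups against a word list.
# CONFUSABLE and WORDS are made-up sample data, not anything from Word Scan.

CONFUSABLE = [("rn", "m"), ("m", "rn"), ("cl", "d"), ("d", "cl")]
WORDS = {"modern", "corner", "burn", "summer", "clear", "dear"}


def candidates(word):
    """Yield the word itself plus every single-substitution variant."""
    yield word
    for old, new in CONFUSABLE:
        start = word.find(old)
        while start != -1:
            yield word[:start] + new + word[start + len(old):]
            start = word.find(old, start + 1)


def resolve(word, words=WORDS):
    """Return (best_guess, is_confident).

    If the raw reading is a known word, keep it. Otherwise accept a variant
    only when exactly one variant is a known word; anything else is left
    unchanged and flagged for the human proofreader.
    """
    if word in words:
        return word, True
    matches = sorted({c for c in candidates(word) if c in words})
    if len(matches) == 1:
        return matches[0], True
    return word, False


for raw in ["bum", "comer", "dear", "xyzzy"]:
    print(raw, "->", resolve(raw))
```

Note the limits of the trick. A misreading that happens to be a real word ("dear" for "clear") sails through unflagged, and a correct word missing from the list can be "fixed" into the wrong one. That is one reason the final manual proofread still matters.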

Even when the time spent correcting the scan is taken into
account, OCR ends up faster than retyping from scratch, but
not always by as much as you'd have imagined. And if you're
trying to digitize important information that you can't
spell-check (financial records, for example), you should be
prepared to proofread VERY carefully.

By the way, if you have a good fax-modem program, you may
already have all the OCR software you need... WinFax Pro, for
example, includes the OCR engine from Word Scan's biggest
competitor, Caere OmniPage. And while the WinFax manual
doesn't mention it, you can use it on saved pictures and
scanned pages, with good accuracy.

Or, if you have a fax modem but no scanner, you can fax
documents to yourself. Just be sure to set your sending
machine to 'fine' mode (200 x 200 dpi) for the best results.
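A little arithmetic suggests why the mode matters: the more pixel rows each line of type gets, the more the OCR software has to work with. In this sketch the 10-point type size, the roughly 100 dpi vertical resolution for standard fax mode, and the 300 dpi flatbed figure are assumptions added for comparison.

```python
# Illustrative arithmetic: how many pixel rows a line of type gets at
# different vertical resolutions. The 10-point type size, the ~100 dpi
# figure for standard fax mode, and the 300 dpi flatbed scan are
# assumptions for the example.

POINTS_PER_INCH = 72


def pixel_rows(point_size, vertical_dpi):
    """Pixel rows available for type of the given point size."""
    return point_size / POINTS_PER_INCH * vertical_dpi


for label, dpi in [("standard fax (~100 dpi vertical)", 100),
                   ("fine fax (200 dpi)", 200),
                   ("flatbed scan (300 dpi)", 300)]:
    print(f"{label}: about {pixel_rows(10, dpi):.0f} rows for 10-point type")
```

Going from standard to fine mode doubles the rows available for each character, which still leaves it short of what a typical flatbed scan would provide.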

Some big businesses, such as VanCity Credit Union, are
busy converting all their old records to digital form. And
the volunteers of Project Gutenberg are hard at work in
their spare time scanning classics of literature and
converting them to freely available digital text. Still,
we're producing as many paper documents as ever; don't sell
off your shares in forest-industry companies just yet.

Alan Zisman is a Vancouver educator, writer, and computer specialist.