
by C. Roy Payne
We were proud to kick off our tenth year in business in the January issue of The AccuView. In that issue, we informed you that AccuImage is turning ten this year, although our roots date back to 1974. We've come a long way since our early days of selling microfilm and microfiche products and services in the 1970s under the name American Micrographics.
This issue of The AccuView is focused on OCR (optical character recognition) technologies. Like AccuImage, OCR has also come a long way. Some of the earliest references to OCR can be found in 1950s research related to the development of a reading aid for the blind, a joint effort between the U.S. Veterans Administration and electronics company RCA.
Fast forward nearly forty years to the early 1990s. OCR had become a viable technology for businesses needing to extract machine-printed information from business documents and forms. But at processing speeds of 45 seconds per page, its efficiency wasn't up to par with that of data entry. Moreover, in its infancy, OCR was fraught with errors, causing many businesses to forsake automated data capture for tried-and-true manual keying.
Today, OCR is a formidable, extremely accurate technology. Even in the past three years, OCR speed and accuracy have improved to a point where they are far better than they have ever been. The technology can now process full OCR on a page in under half a second. Opportunities for leveraging OCR are prevalent in nearly every industry. Even so, only about 15 percent of businesses that could benefit from OCR are actually taking advantage of it.
There has never been a better time to implement a data capture solution. After digesting the information contained in this issue - how far OCR has come, how one AccuImage client is benefiting from this technology, and a spotlight on one of the OCR products we offer - we hope you'll call us to discuss how you can put OCR to work for you.
As always, I hope you enjoy this issue of The AccuView!
Warm regards,
Roy Payne
roy.payne@accuimagellc.com

The following is excerpted from an essay published in 1979 by Eugene Garfield, Ph.D., founding editor of The Scientist, member of the Board of Library Overseers at the University of Pennsylvania, and former researcher on staff at both Columbia and Johns Hopkins Universities. Even as early as the 1950s, Dr. Garfield was able to visualize the opportunities for OCR, based on the development, at the time, of reading machines for the blind. Both nostalgic and progressive, this excerpt provides a glimpse into the very early days of OCR.
When I was a graduate student in the early 1950s, I conceived of a device that would selectively copy text from books, journals or other printed materials. Like so many other researchers, I had spent many hours in libraries, making copious handwritten notes. Photostating was too expensive and the Xerox machine had not yet reached the market. Deciding that something had to be done to alleviate the drudgery of note-taking, I created an imaginary device called the Copywriter. It was a selective copying device, designed to let the user extract a particular line or word of text and have it reproduced instantly on a small "printer."
From 1951-53, I was working on the Johns Hopkins University indexing project. I decided to take an evening adult education course in electronics offered at a Baltimore high school so that I could figure out how to make this device a reality. In the fall of 1953, when I returned to my hometown of New York to study library science, I heard about a novel device built for the Veterans Administration by RCA. It was a reading aid for the blind. Its purpose was not to copy but, in fact, to "read" letters and translate them into bird-like sounds which blind people could learn and understand.
I borrowed one of these experimental readers and hooked it up to a modified Brush laboratory oscillographic recorder. I remember my frustrations in this very amateurish approach. I tried to control the output to the recorder in order to create a series of black and white spots on electrosensitive recording paper. It was difficult to cause the stylus on the recorder to move up and down fast enough to respond to the output from the reader. I had a lot to learn about frequency responses, resolution and dozens of other details about facsimile recording.
Years later, after a prototype had been around for a while, some people expressed interest in using the Copywriter not only to selectively copy information in facsimile form, but to feed that information into a computer. For this application, the device would not merely have to "copy" in "analog" form but also recognize each letter, that is, convert that letter into a "digital" signal. In 1970, I incorporated this optical character recognition (OCR) capability into one proposed version of the unit. It would allow the Copywriter to produce a machine-readable code for each letter, number or symbol scanned. In fact, this work led to a patent on a proofreading typewriter.
From my early contact with reading aids for the blind, I knew that many people were considering the use of OCR for this purpose. Over the years I had been kept informed of such efforts through Eugene F. Murphy, director of the Office of Technology Transfer of the Veterans Administration. While early researchers concerned with reading aids for the blind could never get adequate support for developing OCR techniques, OCR equipment for business and government applications became the "in" thing. Eugene Murphy points out that researchers into reading aids for the blind predicted the use of reading machines in business as early as 1949.
Yet OCR has had a very rocky history. I can well remember how OCR was supposed to solve many of the data entry problems of business data processing. When one considers how much costly manual labor goes into data entry work, it is natural to believe that reading machines have a vast potential. But like the problem of mechanical translation, there is much more to the problem of automatic data entry than meets the eye.
For OCR, many hurdles need to be jumped. Yet the advances made by the manufacturers of reading aids for the blind will have tremendous impact on business and government. We will continue to follow the developments in this exciting field to see if one day we can finally use a combination Copywriter and universal OCR in our own operation. Meanwhile, we salute all of those who worked so hard developing reading aids for the blind.
Source: "Has OCR Finally Arrived?" Eugene Garfield, May 7, 1979

LifeWay Christian Resources is one of the world's largest providers of Christian products and services, including Bibles, church literature, books, music, audio and video recordings, church supplies and Internet services. Established in Nashville, Tenn., in 1891, the company owns and operates 122 LifeWay Christian Stores throughout the United States, as well as two of the largest Christian conference centers in the country.
LifeWay's accounts payable (AP) department receives nearly 800,000 invoices annually from approximately 1,700 different vendors, primarily for the products sold by the 122 stores. Approximately 70 percent of invoices are received through EDI (electronic data interchange). The EDI program enables LifeWay's largest vendors to convert their paper invoices to electronic documents and electronically transmit them. The electronic documents are then imported into the host system for matching and check payment.
The remaining 30 percent of invoices arrive by regular mail and are processed through imaging and OCR technologies. A mail room employee opens and sorts the mail and loads the invoices into a Kodak scanner, which sends an electronic image of the documents to OCR for AnyDoc. OCR for AnyDoc searches for keywords and extracts invoice information - including invoice number, invoice date, store number, purchase order number and total amount due - and then populates a download area next to the electronic image of the invoice. When the software recognizes a vendor's PO number, it accesses LifeWay's PO database to populate the electronic invoice with vendor information, so that not all of the data has to be captured from scratch. A verifier proofs the electronic documents to determine whether any data was read incorrectly. Since the most important invoice data is next to the electronic image, a verifier can quickly proof for accuracy.
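To make the PO lookup step more concrete, here is a minimal sketch in Python. It is not AnyDoc's or LifeWay's actual implementation: the `PO_DATABASE` dictionary, the field names and the `enrich_invoice` function are all hypothetical, and a real system would query a database rather than an in-memory dictionary.

```python
# Hypothetical stand-in for a purchase order database, keyed by PO number.
PO_DATABASE = {
    "PO-1001": {"vendor_name": "Example Vendor A", "store_number": "0042"},
}


def enrich_invoice(extracted, po_database):
    """Return a copy of the captured fields, filled in from the matching PO record.

    Fields already captured from the page are kept; only missing ones are
    copied from the PO record. Invoices without a known PO are flagged so
    a verifier can look at them.
    """
    record = dict(extracted)  # leave the caller's dictionary untouched
    po_record = po_database.get(extracted.get("po_number"))
    if po_record is None:
        record["needs_review"] = True
        return record
    for key, value in po_record.items():
        record.setdefault(key, value)
    record["needs_review"] = False
    return record
```

In this sketch, an invoice whose PO number is found picks up the vendor name and store number automatically, while an unknown PO number simply routes the invoice to a person.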
Following the capture and verification processes, invoice data is populated into LifeWay's core accounting system for payment. Invoice images are archived in EMC Documentum ApplicationXtender, a program that enables document retrieval from any PC.

AnyDoc Software recently attended a conference of accounts payable professionals, where we sponsored a guest speaker. The speaker provided comment cards for feedback on his performance and content. On one card was a remark that floored me:
"OCR? That doesn't work."
I was stunned. Was this merely the perception of one individual, or did it reflect the views of a significant portion of the workforce? Without focus group data or other metrics, it's impossible to tell; however, it certainly was troubling. Troubling because OCR has not only been proven to work, it works quite well.
Perhaps the perception of that one individual helps illustrate why only a small portion (approximately 15 percent) of businesses that could benefit from a solid data and document capture solution has actually taken advantage of it. In order to bridge that gap, our industry needs to better educate the public about the strength and efficiency of OCR (and its companion, intelligent character recognition, or ICR) and how they can revolutionize data workflow. With any luck, people like our friend from the conference will begin to see OCR with different eyes, and those of us in the industry will have a renewed determination to inform businesses of all stripes of the benefits an OCR solution can provide.
Data Capture: Improving the Standard
It is true that OCR wasn't always as robust as it is today. In its infancy, OCR was fraught with errors, causing many businesses to forsake data capture for tried-and-true manual data entry. However, OCR/ICR has grown into a formidable, extremely accurate technology.
Data capture has always depended on several converging factors for its success. A good scanner, a reliable OCR/ICR engine and a high-end processor all work in concert to quickly deliver high-quality data capture from paper documents and forms. Each of these elements has improved within the past five years, thereby delivering far better data capture far more quickly than before.
Character Recognition: The Strength of OCR
Of course, a clean, straight image makes data capture much easier, but OCR/ICR processing has also made quite a bit of headway in the past three years or so. OCR and ICR speed and accuracy are far better than they have ever been. The technology can now process full OCR/ICR on a page in under half a second. To put that in perspective, OCR/ICR took about 45 seconds per page when we first began to develop data and document capture technology more than 15 years ago.
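For readers who like to see the arithmetic, the short Python sketch below turns those two per-page figures into pages per hour. It assumes a single engine running continuously with no scanning or verification time, which is a simplification of ours, not a measured result.

```python
SECONDS_PER_HOUR = 3600


def pages_per_hour(seconds_per_page):
    """Convert a per-page processing time into hourly throughput."""
    return SECONDS_PER_HOUR / seconds_per_page


early = pages_per_hour(45)    # roughly 15 years ago, per the article
today = pages_per_hour(0.5)   # "under half a second", so a lower bound
speedup = today / early

print(f"Early: {early:.0f} pages/hour")            # Early: 80 pages/hour
print(f"Today: at least {today:.0f} pages/hour")   # Today: at least 7200 pages/hour
print(f"Speedup: at least {speedup:.0f}x")         # Speedup: at least 90x
```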
The systems running the OCR and ICR continue to become faster, and that obviously affects the results. The speed is useless, however, if the end result is sloppy data. Fortunately, the OCR and ICR engines have also gotten hardier - and the better the character recognition, the better the output. We know of businesses that consistently experience near 100 percent data accuracy. With these kinds of results in just a few short years, I'm excited to think about the things technology will afford us in the near future!
In recent years, a stable OCR/ICR environment has provided unique opportunities to expand the way the technology benefits business. Previously, standard data capture was dependent upon a defined template that required data fields to remain in the same location from form to form - a technology known as structured document processing.
Within the past few years, a significant expansion of this technology prompted the advent of unstructured (semi-structured) document processing - a method of extracting critical data found in inconsistent locations on the same form type. This technology gives our industry incredible opportunities to address the specific pains felt by the vertical markets it supports - such as insurance, mortgages, health care, or the accounts payable department of virtually any organization. Each of these, and countless others, has critical data located on complex forms that standard, template-based forms processing tools do not handle very well.
Keywords, not templates, are used to locate and extract the required data on each unstructured form type, regardless of where it may be found on the form. In fact, our friend from the conference may not realize that OCR-based technology is available to tackle the very problems those in the accounts payable field face each day. Unstructured forms processing can automatically capture the critical invoice data - such as the invoice date, amount due, terms, purchase order number and even detail line items - that businesses need entered into their AP systems.
It's also a very intelligent means of processing. By incorporating "fuzzy" logic into the equation, the technology seeks variants of the keywords found on the form type, such as "P.O. Number" and "PO #" for purchase order number data. Also, the more frequently a form type is processed, the more the technology "learns" where to find the data on the page and subsequently becomes faster in doing so. But the ability to do so with precision has emerged only in the recent past.
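The keyword idea can be illustrated with a small Python sketch. This is our own simplified illustration, not how any commercial product works internally: the variant lists, the field names, the 0.8 similarity cutoff and the "label: value" line format are all assumptions, and the fuzzy comparison uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever matching a real engine applies.

```python
import difflib
import re

# Hypothetical keyword variants for a few invoice fields.
KEYWORD_VARIANTS = {
    "po_number": ["P.O. Number", "PO #", "Purchase Order Number"],
    "invoice_number": ["Invoice Number", "Invoice No", "Invoice #"],
    "amount_due": ["Amount Due", "Total Due", "Balance Due"],
}


def normalize(label):
    """Lowercase a label and drop spaces and punctuation (keeping '#')."""
    return re.sub(r"[^a-z0-9#]", "", label.lower())


def match_field(label, cutoff=0.8):
    """Return the field whose keyword variant best resembles the label, or None."""
    target = normalize(label)
    best_field, best_score = None, 0.0
    for field, variants in KEYWORD_VARIANTS.items():
        for variant in variants:
            score = difflib.SequenceMatcher(None, target, normalize(variant)).ratio()
            if score > best_score:
                best_field, best_score = field, score
    return best_field if best_score >= cutoff else None


def extract_fields(lines):
    """Pull 'label: value' pairs out of OCR text lines, wherever they appear."""
    fields = {}
    for line in lines:
        if ":" not in line:
            continue
        label, value = line.split(":", 1)
        field = match_field(label)
        if field is not None and field not in fields:
            fields[field] = value.strip()
    return fields
```

Because the labels are compared by similarity rather than exact text, a line such as "P0 Number: 12345" (with a zero misread for the letter O) would still be recognized as a purchase order number in this sketch, and the position of the line on the page does not matter.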
That's not to say that unstructured forms processing itself is that new. Actually, it's been nearly a decade since the technology first emerged - albeit prematurely, perhaps. At that time, the systems supporting the technology were too costly and the results were sketchy at best. Understandably, organizations soon became skeptical that the technology had anything to offer them at all. Perhaps that unfortunate history and similar events prompted the remark we received.
Fortunately, the landscape has changed - and we need to proclaim that loud and clear. Particularly so for unstructured processing, because even some true believers in OCR doubt the technology has merit or staying power. But our experience has proven otherwise - we recently implemented our 41st successful unstructured processing solution. Additionally, we feel the market will gradually veer to unstructured processing solutions - the more businesses realize the pliability of the technology, the more they will seek ways to customize it to their way of doing business.
In Summary
I believe it is important for us to remember just how far our industry has come in just a few short years, and I think it is equally important that we bring all of this to the table when approaching our prospective clients. We need to recognize that many of them may not understand what we do, how well it works or why our solutions will make an enormous impact on their day-to-day workflow and yes, on their bottom line.
I truly believe our industry needs to educate the market on the multiple uses for a forms processing solution: efficiency improvement, cost effectiveness and compliance (think Sarbanes-Oxley, among others). And perhaps we need to expand our audience, as well. Perhaps we should reach out to the small- and medium-sized markets to ensure they too can benefit from a stable forms processing solution the way several organizations in larger markets already do.
Once we do, we'll see a vast improvement over the current 15 percent of qualified businesses taking advantage of a document and data capturing solution. I also believe that once we do, our friend at the recent conference will leave us a very different comment:
"OCR? Can't live without it!"
Source: AnyDoc Software, Inc., July/August 2005

Automate data capture and document processing. Erase hidden costs and improve productivity by eliminating manual data entry.
Anywhere your business uses paper documents is a good place for OCR for AnyDoc. Thousands of companies worldwide rely on OCR for AnyDoc to capture and process data from business documents. The software eliminates manual data entry, a process that drains profit and productivity. Because data entry often occurs across an organization, with each employee doing some portion of data entry, the costs can go unnoticed.
But with OCR for AnyDoc in place, the impact will be clear and powerful. OCR for AnyDoc captures data from nearly any document. Once information is extracted, your customized business rules are used to validate and normalize the data prior to human verification. Verified data is then delivered to your backend system, content management system and/or workflow for use in analysis, reporting and retrieval.
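As a rough illustration of what "validate and normalize" can mean in practice, here is a minimal Python sketch. The three rules, the field names and the accepted date formats are hypothetical examples of ours, not OCR for AnyDoc's actual rule set or configuration.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def normalize_amount(raw):
    """Turn text such as '$1,250.00' into a Decimal, or None if unreadable."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize_date(raw):
    """Convert a few assumed date layouts to ISO format, or None if unreadable."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def validate(fields):
    """Apply example business rules; return (normalized fields, problems found)."""
    problems = []
    normalized = {
        "invoice_number": fields.get("invoice_number", "").strip(),
        "invoice_date": normalize_date(fields.get("invoice_date", "")),
        "amount_due": normalize_amount(fields.get("amount_due", "")),
    }
    if not normalized["invoice_number"]:
        problems.append("missing invoice number")
    if normalized["invoice_date"] is None:
        problems.append("unreadable invoice date")
    if normalized["amount_due"] is None or normalized["amount_due"] <= 0:
        problems.append("amount due must be a positive number")
    return normalized, problems
```

Anything that ends up in the list of problems would be the natural candidate for a human verifier's attention; clean records could move on with less scrutiny.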
You can use OCR for AnyDoc to:
- Expedite documents into your workflow.
- Minimize manual data entry costs.
- Improve data accuracy.
- Eliminate manual sorting.
- Improve customer service.
- Evolve your solution as your business grows.
- Ensure secure, controlled access to sensitive documents.
For more information about OCR for AnyDoc and similar products, contact us.

AccuImage, LLC is a systems integrator that empowers its customers with solutions designed to extract the maximum value from their information at every point in the information lifecycle. Founded in 1996 and headquartered in Nashville, Tennessee, AccuImage specializes in the design, installation and support of document and content management systems, forms processing solutions, and electronic workflow systems. The company offers hardware and software from leading companies - AnyDoc Software, Böwe Bell+Howell, Canon, Captaris, Captovation, EMC Documentum, Fujitsu, Hewlett-Packard, IBM, Kodak, Kofax, Panasonic, Plasmon and Verity - as well as consulting, document conversion and professional services.
* Limited time offer. AccuImage may discontinue the offer at any time without prior notice. Offer available only to current and new subscribers to The AccuView. No purchase necessary. Following the two-hour complimentary consultation, additional consulting is available at AccuImage's regular professional services rates. Consultation may be conducted in person or over the phone, depending on location. Call for additional details.