by Tom Beasley

Information is undoubtedly growing at exponential rates, and as a result, it's becoming more and more crucial to eradicate information anarchy throughout the organization. This can be done not by limiting or reducing the volume of data, but by ensuring that all content is accessible, usable and in context. In fact, when quality reigns over quantity, the volume of data quickly becomes irrelevant. Content management systems help organize information under one roof - data is housed in one repository and can be easily and quickly accessed throughout the enterprise.

This issue of The AccuView addresses the importance of data quality. Having data is one thing … making it useful is another. Infusing your content with high levels of quality and organization is the key to content management success.

Sound like a daunting task? It doesn't need to be. Our professional services team is standing by to assist you with your information management projects. Our business analysts and technical consultants can work with you to discuss, design, develop and deploy a solution specific to your business needs. I invite you to contact us at 615.242.7226 for a consultation today.

Best regards,
Tom Beasley
tom.beasley@accuimagellc.com


The typical enterprise has an incredibly rich store of information, much of which is in digital form. Cataloging content, sharing multiple versions and copies of it, and repurposing it to make new content should be easy. But knowledge workers today spend 40 percent of their time just searching for information - often without success. The results are inefficiency, poor decisions made because good information is unavailable or inaccessible, and valuable content that is underused or must be recreated.

Organizations need a system to gain control of information and put the power of content management at the fingertips of knowledge workers so they can work faster and smarter, alone or in teams, for the benefit of the organization's top and bottom lines. They need a solution to provide knowledge workers with access to all the information they are entitled to see - not a subset of information as defined by a relational database, siloed application, or desktop productivity tool.

The system should provide a way to quickly determine which content is most contextually relevant to the task at hand and which can be filtered out or put in perspective. And the more valuable the content is to your business - as defined by its strategic relevance, timeliness or other yardstick - the more important it is to make this content immediately available.

When content matters, enterprise content management is the solution. That's why organizations are investing in enterprise content management systems to:

  • Sift through content to get it under control and determine its relevance.
  • Make content accessible through searches and from within collaborative processes.
  • Deliver content to end users in a way that increases productivity.

Enterprise Content Management in the Real World

Your organization likely has invested heavily to create and store content. The ability to access content easily and intelligently can translate to economic benefits such as getting your products to market faster, improving customer retention and lowering your cost of doing business.

As part of managing content electronically, the content management system is able to grasp the meaning within documents, paragraphs, sentences, scanned and digitized copies, and visual images, classifying each element according to business rules so as to analyze its value to the organization. As a result, a content management system can automatically make the right information available to users, in the right format, in the right place at the right time.

A comprehensive enterprise content management system analyzes content through a series of classification engines with no human intervention required - other than establishing the powerful and flexible ground rules which the engines use to classify content as it enters the system. The outcome of this process is that content is automatically tagged and categorized, improving its ability to be searched, retrieved, delivered and repurposed.

A similar sequence of procedures can be applied to the recognition of non-verbal content such as videos, reproductions or scanned images such as a driver's license that is part of an accident file, or files created in specialized applications such as a graphic layout program. This content can then be associated with related content so that a user will see all relevant information residing in the system - with nothing left out simply because it has an external or unrecognizable format. And, as long as all content is housed in a common repository, any of it can be served up across the enterprise whether to a client interface or as infrastructure integrated with other applications.

One of the great benefits of an enterprise-class content management system is that it is able to expand its functionality and contents in huge increments over time. Instead of being overwhelmed by the volume of new information (the typical scenario in most businesses today), a content management system helps the enterprise become smarter and more agile as it grows.

[Source: EMC]


Market pressures are forcing companies to invest millions of dollars in business intelligence (BI) architectures, enterprise content management (ECM) systems and decision reporting tools. But often companies ignore the component that is most vital for success: clear, consistent and reliable data. Data is the building block of information. If the data in the operational systems and data warehouses is not complete, accurate and consistent, attempts to benefit from the organization's information are going to fail and can even cause damage.

Consider a bank's data warehouse that imports data from various operational systems and contains two records for the same customer. One record lists a negative balance of -$1,000 for the customer, while the other shows that the customer also has a positive balance of $1.5M. A branch manager who sees the first record but not the second can unwittingly cause the bank to lose this customer. This is a case of record duplication: the system fails to recognize that both records belong to the same customer.
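The bank scenario above can be sketched in a few lines of Python. This is only an illustration: the record fields, the sample names and the matching rule (normalized name plus date of birth) are assumptions made for the example, and real customer matching usually needs fuzzier logic than an exact key.

```python
from collections import defaultdict

# Hypothetical records imported from different operational systems.
records = [
    {"source": "checking", "name": "Jane Q. Public ", "dob": "1970-01-01", "balance": -1_000.00},
    {"source": "investments", "name": "jane q. public", "dob": "1970-01-01", "balance": 1_500_000.00},
    {"source": "checking", "name": "John Doe", "dob": "1982-05-17", "balance": 250.00},
]

def match_key(record):
    """Build a matching key from the normalized name and date of birth (an assumed rule)."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"])

def unified_view(records):
    """Group records that share a matching key and total their balances."""
    customers = defaultdict(lambda: {"sources": [], "total_balance": 0.0})
    for record in records:
        entry = customers[match_key(record)]
        entry["sources"].append(record["source"])
        entry["total_balance"] += record["balance"]
    return dict(customers)

view = unified_view(records)
jane = view[("jane q. public", "1970-01-01")]
print(jane["sources"], jane["total_balance"])
```

With both records linked to one customer, the branch manager sees the overdrawn account alongside the large positive balance instead of the overdraft alone.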

Decisions Based on Deficient Data

Without high-quality data, decision makers must guess what they should know. Worse, they are liable to make decisions that appear to be informed but are in fact based on deficient data. To work with information and decision support systems that provide reliable and uniform data in real time, it is not enough to collect the data and feed it to the BI tools. First, it is important to gain an in-depth understanding of all aspects of the data and to explore possibilities for improvement before making the data available to a wider group of users. To prevent poor performance of the BI system, it is important to test the quality of the data and verify that it is complete, consistent, up-to-date and accurate.
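As a rough sketch of what such testing can look like, the Python below checks each record for completeness, consistency against a reference list, and freshness. The schema, the country list and the one-year staleness threshold are assumptions chosen for the example. Accuracy is deliberately left out, because it can only be verified against a trusted external source.

```python
from datetime import date

REQUIRED_FIELDS = ("customer_id", "name", "country", "updated")  # assumed schema
VALID_COUNTRIES = {"US", "CA", "GB"}  # assumed reference list

def quality_issues(row, today, max_age_days=365):
    """Return a list of data quality problems found in one record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if row.get(field) in (None, ""):
            issues.append(f"missing {field}")
    # Consistency: coded values must come from the shared reference list.
    country = row.get("country")
    if country and country not in VALID_COUNTRIES:
        issues.append(f"unknown country {country}")
    # Freshness: flag records not updated within the allowed window.
    updated = row.get("updated")
    if updated and (today - updated).days > max_age_days:
        issues.append("stale record")
    return issues

rows = [
    {"customer_id": 1, "name": "Acme", "country": "US", "updated": date(2024, 1, 10)},
    {"customer_id": 2, "name": "", "country": "XX", "updated": date(2020, 3, 1)},
]
today = date(2024, 6, 1)
report = {row["customer_id"]: quality_issues(row, today) for row in rows}
print(report)
```

A report like this can be produced before the data is opened to a wider group of users, so that known problems are surfaced rather than discovered in a BI dashboard.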

By strongly emphasizing data quality, organizations can accelerate the development of the architecture of their BI systems; reduce the number of repeat transformations and extractions from the warehouse; report to end users about data quality issues; and eventually increase profits and return on investment in the BI infrastructure.

A data cleansing process of this nature results in a significant reduction in the number of customers and vendors because it eliminates duplicates and outdated records. At times, the reduction in the number of customers, vendors and items reaches 70 percent. Automatic data cleansing tools are used in combination with data integration tools to transfer data from the operational systems to the data warehouse, saving time and costly development resources. An additional benefit of using automated systems is that the monitoring and quality of the data are improved significantly, and decisions are made based on a unified view of each customer as a single entity.

Many organizations are aware of problems having to do with transaction data in their systems, but do not know how to address the problems. By their nature, data migration and conversion processes reveal data quality problems. When data from various sources is integrated and subjected to new business requirements in the data warehouse, it becomes accessible to business users; in these situations, the completeness and reliability of the data gain paramount importance.

Data quality problems can be the result of several factors, including:

  • Data distributed among several platforms and legacy systems.
  • Extensive data redundancy between various application systems.
  • Lack of standards for data within the organization.
  • Deficient metadata or complete lack of metadata for legacy systems.

Maximizing Profits from Data

There are many tools for dealing with data quality by performing various operations such as tracing flaws and improving processes. Tools for handling data quality issues help organizations gain control over their data assets and derive optimal benefits from their BI infrastructure. These tools help maximize the organization's return on its investment in data infrastructure by:

  • Analyzing organizational data and ensuring that only "clean" and reliable data populates the warehouses and datamarts.
  • Revealing hidden business rules and verifying their validity.
  • Prioritizing data quality issues so that the organization invests in the areas that have the greatest influence.
  • Using business rules and data verification to cleanse the data while it is being transferred.
  • Monitoring and managing data over time to ensure that the active data cleansing programs provide consistent and measurable advantages.

By understanding the properties, advantages and deficiencies of the original data, the organization can prevent surprises, set expectations and reduce the need for corrections.

One of the most prevalent myths is that a new system or data warehouse will fix data problems originating with the legacy systems. Although a data transfer process transforms the data to better support the business, the transformation in itself does not guarantee cleaner data. With the right data cleansing, conversion and loading tools, and with data quality assurance tools, it is possible to maintain operational systems and data warehouses that contain reliable, uniform and useful data that serves the overall success of the business.

Printing paper and moving paper are both expensive and time-consuming. Companies that need to cut those costs and manage processes more effectively are simply converting documents that were being sent by mail into faxes or scanned document images. From the recipient's standpoint, the incoming documents are all unstructured or semi-structured. Whether image-based or data, the documents might look similar (all invoices), but the data elements do not contain understandable metatags. Therefore, each document must be examined and translated into a common format that the back-end IT processes can understand.

Capturing data from images using traditional forms processing is based on knowing the specific form layout so that you can build a template to locate which fields to capture, the rules to use for each field and any cross-field validations. The template also defines the associated output metadata for the fields. Traditional forms processing works well when the layout of forms is the same or where clear identifiers define the format. Examples include tax returns, credit card applications, medical claims, etc.
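The idea of a forms processing template can be sketched as follows. This Python example assumes that recognition has already produced the text found in each named zone of the form; the zone names, field patterns and the invoice layout are hypothetical, not taken from any particular product.

```python
import re
from decimal import Decimal

# Hypothetical template for one known form layout: where each field sits,
# the rule for validating it, and the cross-field checks to apply.
INVOICE_TEMPLATE = {
    "fields": {
        "invoice_number": {"zone": "top_right", "pattern": r"INV-\d{6}"},
        "subtotal": {"zone": "line_subtotal", "pattern": r"\d+\.\d{2}"},
        "tax": {"zone": "line_tax", "pattern": r"\d+\.\d{2}"},
        "total": {"zone": "line_total", "pattern": r"\d+\.\d{2}"},
    },
    "cross_checks": [
        (
            "total does not equal subtotal + tax",
            lambda d: Decimal(d["subtotal"]) + Decimal(d["tax"]) == Decimal(d["total"]),
        ),
    ],
}

def capture(zones, template):
    """Extract fields from recognized zone text and validate them against the template."""
    data, errors = {}, []
    for name, spec in template["fields"].items():
        text = zones.get(spec["zone"], "").strip()
        if re.fullmatch(spec["pattern"], text):
            data[name] = text
        else:
            errors.append(f"{name}: invalid value {text!r}")
    # Cross-field validation only makes sense once every field was captured.
    if not errors:
        for message, check in template["cross_checks"]:
            if not check(data):
                errors.append(message)
    return data, errors

good = {"top_right": "INV-004217", "line_subtotal": "100.00", "line_tax": "8.25", "line_total": "108.25"}
bad = dict(good, line_total="118.25")
print(capture(good, INVOICE_TEMPLATE))
print(capture(bad, INVOICE_TEMPLATE)[1])
```

The keys of the returned dictionary play the role of the output metadata the template defines. The sketch also shows the limitation: a form with a different layout needs its own template.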

Capturing data from unstructured, unknown layouts can rely on search engines, which hunt through unstructured text to identify and extract contextually relevant documents and phrases. However, creating understandable metadata for output into business processes requires business-specific rules, which means that the software must understand what the document is.

New intelligent document recognition technologies, originally developed for invoice processing and the electronic mailroom, use techniques from each of the above areas and eliminate the limitations. It is no longer necessary to know what the form layout looks like. It is no longer necessary to insert batch separators. It is no longer necessary to presort. Specific rules can make the data understandable. Intelligent document recognition has the ability to figure out what the document category is and apply the appropriate business rules.
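The classify-then-apply-rules flow described above can be illustrated with a deliberately simple Python sketch. It is not how any IDR product works internally: keyword counting stands in for trained recognition, and the categories, keyword lists and routing destinations are all assumptions made for the example.

```python
import re

# Assumed keyword lists per document category (a stand-in for trained recognition).
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "remit"},
    "claim": {"claim", "policy", "insured", "incident"},
}

# Assumed business rule per category: where the document is routed.
ROUTES = {
    "invoice": "accounts_payable",
    "claim": "claims_desk",
}

def classify(text):
    """Pick the category whose keywords appear most often; 'unknown' if none match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def process(text):
    """Classify a document, then apply the business rule for its category."""
    category = classify(text)
    return {"category": category, "route_to": ROUTES.get(category, "manual_review")}

print(process("Invoice 4217: amount due 108.25, please remit by June 30"))
print(process("Claim filed under policy 88 by the insured"))
print(process("Lunch menu"))
```

Because the category is decided from the content itself, documents can arrive unsorted and without batch separators, and anything unrecognized falls through to manual review.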

IDR, which is also called intelligent data capture, works a lot more like humans do, relying on training and an internal knowledge of the layout and content of generic form types to understand and extract required information and initiate workflows. That widens the range of forms that can be captured and reduces costs. More substantially, IDR turns capture into a set of tools that can interpret and extract data from all sorts of unstructured information.

The information can be input as scanned paper or as document-formatted information, whether it is data-centric, such as a Word or PDF file, or image-based. IDR typically combines multiple methods, including pattern recognition, OCR (optical character recognition), and other recognition and search engines, to locate and extract required information before applying business rules to it. IDR capture provides the ability to make sense of and help manage the unstructured, untagged information that is coming into the corporation. It can provide the front-end understanding needed to feed business process management and business intelligence applications, as well as traditional accounting and records management systems.

Capture is evolving into a critical business systems need that improves core business processes and competitiveness through its development of business rules-based document understanding. The capture market can be broken down into four sub-segments:

  • Ad Hoc and Desktop Scanning - used by office workers who want to convert paper documents into usable electronic documents on which they can work or collaborate. The devices used are slow-speed scanners or networked office digital copiers (MFPs).

  • Batch and Distributed Batch Scanning - used to get documents into a centralized document repository or used to classify and route them to a centralized point as quickly as possible.

  • Full-Text Capture OCR - converts textual documents, such as scanned magazine articles, into ASCII data that can be edited or managed or used to find documents.

  • Transaction Capture and Process Management - previously forms processing, similar to batch and distributed batch capture, but the output is data-centric and used to provide data for use in a business process.

In recent years, those sub-segments have each shown some interesting trends. The first three have grown at more than 20 percent, driven by a number of key issues that are coming together to cause market stress, including increased business velocity; the need to reduce costs; the need to optimize equipment usage; the availability of lower-cost distributed duplex desktop scanners; full-text search at the desktop; and image standardization and business acceptance. The transaction capture and process management sector is currently the largest segment, accounting for 34 percent of the overall market. The proven forms processing technology offers some major cost reductions over in-house data entry or even offshore processing, which is increasing in cost.

[Source: Matrix]


The highly skilled and experienced professional services group at AccuImage, LLC is dedicated to implementing professionally crafted imaging, workflow, document management and enterprise report management technologies to solve clients' business problems. We are committed to understanding your business needs and providing implementation and installation services to satisfy your unique requirements.

Installing a complete information management system is a complex undertaking, and we have a track record of success in helping clients build enterprise-specific systems. Building from a production-ready viewpoint, we employ a proven-successful methodology to implement solutions and applications. This process allows a logical progression through the tasks of gathering, analyzing, designing, documenting, coding, testing and installing your solution.

Some of the consulting services that we provide our clients include:

  • Complete project implementation services
  • Systems architecture
  • Business process analysis
  • Workflow design and development
  • Application development
  • Systems integration
  • Script and program development
  • Web-based development
  • Conversion of legacy documents and imaging data
  • Solution and application testing
  • Documentation
  • Solution administration
  • Solution installations
  • Release upgrades
  • Solution audits
  • Product evaluations
  • Mentoring

We focus on using our expertise to reduce the time and costs associated with deploying new technology in your organization. With our assistance, you will be able to concentrate on your business and technology direction, knowing that they are backed by a solid management foundation, and that we will be applying our considerable product, deployment and industry expertise on your behalf.

Our professional services team can design and implement a content management system that delivers content quality and business value, regardless of your existing or future data volume. Contact AccuImage at 615.242.7226 for a consultation today!


AccuImage, LLC is a systems integrator that empowers its customers with solutions designed to gain the maximum value from their information at every point in the information lifecycle. Founded in 1996 and headquartered in Nashville, Tennessee, AccuImage specializes in the design, installation and support of document and content management systems, forms processing solutions, and electronic workflow systems. The company offers hardware and software from leading companies - AnyDoc Software, Böwe Bell+Howell, Canon, Captaris, Captovation, EMC Documentum, Fujitsu, Hewlett-Packard, IBM, Kodak, Kofax, Panasonic, Plasmon and Verity - as well as consulting, document conversion and professional services.