Are you planning a new Document Imaging project and are unsure about how to proceed?
The following is a simple guideline for records managers and business owners who are considering having paper records digitized (paper files scanned to create digital images of the original paperwork) and who need a commonsense baseline for evaluating Document Imaging quotations from vendors. It highlights most of the things you need to consider and allow for, and it should help you when formulating budgets and negotiating with vendors.
What is the best and safest approach?
The best and safest approach is an end-to-end or ‘turnkey’ one where the vendor handles everything, takes responsibility for everything and provides a fixed price quote. For example:
- Doing all the preparation work involved to make your paper ‘scannable’.
- Capturing all contextual Metadata and the linking of same to the scanned images.
- Designing the best way to capture, index and organize the images and associated Metadata in your Electronic Document and Records Management System – EDRMS.
For peace of mind, you really want the vendor to handle everything required including the importing of all scanned images and Metadata into your EDRMS, so you end up with a working and ready-to-use solution where everything is easy to find.
Beware the bean counters
There will always be pressure from someone (usually the resident bean counter) for you and your staff to take on some of the workload to help lower the costs. I caution against this for two reasons. First, it absolves the vendor of some of the responsibility for the final product. Second, I have to assume that you and your staff are already busy in your usual day jobs, so taking on extra work isn’t always possible or wise.
What are the usual steps?
The vendor will (and should) want to analyze the data to be scanned and determine what preparation is required before providing a quote. The vendor will double check your estimated volumes and make recommendations based on the characteristics and properties of the data to be scanned. Most reputable vendors will be reluctant to provide you with a fixed price quotation until after the data inspection is completed. Below is an example (from local government) of what you could expect in a report following an initial data inspection:
- There is a need to back-capture Development Applications (DA);
- Each DA is stored in a cardboard file folder;
- There are 8,000 DA file folders, each containing on average 130 sheets of letter paper – totaling approximately 1,040,000 sheets of paper;
- Each DA file folder contains seven different document types;
- The images are required to be indexed via the file folder number (DA number) & document type (i.e. each file folder has to be scanned and indexed into 7 multi-page images – one for each document type);
- Most pages are single-sided, but some are duplex (double-sided);
- Documents are generally not stapled (approximately 11% are stapled) and don’t require repair (5% do require repair);
- Most pages are Letter size, but about 10% are smaller or larger;
- Most pages are white and monochrome but roughly 5% are colored; and
- Where possible documents are to be OCRed and converted to text-searchable PDF files.
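The figures in a report like this translate directly into workload numbers. The following is a minimal sketch of that arithmetic, not a vendor's actual method. It assumes every folder contains all seven document types, that the stapled and repair percentages apply to documents, and that the size and color percentages apply to sheets. The helper name `share` is illustrative only.

```python
# Figures taken from the sample inspection report above.
FOLDERS = 8_000
SHEETS_PER_FOLDER = 130       # average sheets per DA file folder
DOC_TYPES_PER_FOLDER = 7      # assumption: every folder holds all seven types

total_sheets = FOLDERS * SHEETS_PER_FOLDER
total_documents = FOLDERS * DOC_TYPES_PER_FOLDER  # one multi-page image per type


def share(count: int, percent: int) -> int:
    """Return `percent` percent of `count`, rounded to the nearest whole item."""
    return (count * percent + 50) // 100


summary = {
    "sheets of paper": total_sheets,
    "documents (multi-page images)": total_documents,
    "stapled documents (about 11%)": share(total_documents, 11),
    "documents needing repair (5%)": share(total_documents, 5),
    "non-Letter sheets (about 10%)": share(total_sheets, 10),
    "colored sheets (about 5%)": share(total_sheets, 5),
}

for label, value in summary.items():
    print(f"{label:32s} {value:>10,}")
```

Numbers like these give you a quick way to sanity-check the hours a vendor allows for staple removal, repair and special handling.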
Data preparation normally involves removing pages from a cardboard file folder (and replacing them afterwards), removing staples, smoothing paper, orienting paper, etc. The objective should be to organize the pages into documents and batches to facilitate faster scanning using automatic document feed scanners. The most important component of any scanning quote is the time estimate (duration) and data preparation time is a key component of this.
Data preparation costs are sometimes called ‘handling’ costs. You want a fixed cost quote from the vendor for handling costs, that is, the vendor takes the responsibility and risk, not you. The responsible vendor will do random sampling during the data inspection step to better understand the handling costs involved in your job.
Scanning is where all paper is captured initially as TIFF images or multi-page TIFF images. At this stage, the vendor may offer to convert all or some of the TIFF images to text via an OCR (Optical Character Recognition) process. Note that this is usually an option; do not assume your digitized pages will be searchable because TIFF images are not full-text searchable. There are additional steps required for images to be full-text searchable.
If full-text indexing is a requirement, make sure it is specified in your requirements document and included in the vendor’s quote. Note that if you do mandate full-text indexing, the final format of the digitized image won’t be TIFF; it will probably be PDF or, even better, PDF/A (an internationally recognized standard).
The time to scan each sheet of paper depends upon a few key factors like the quality of the original source document, whether it is single or double sided and its condition, i.e., wrinkled, folded, torn, stapled, etc. Expect a much higher cost when the quality of the source documents is poor.
OCRing the scanned images to create full-text searchable electronic documents
Note that the OCR process will lengthen the time taken to process any page and will increase the cost.
However, OCRing a document is usually an automated ‘background’, asynchronous process that consumes computer time and not much person time. It may double the time required to complete your work, but it should not double the costs.
Verification is where the vendor applies quality assurance processes to ensure that all pages have been properly scanned and that all scanned pages meet the minimum standard for readability. The vendor MUST be able to confirm that all pages have been scanned and at the agreed quality standard. Some form of quality control is mandatory in any scanning job and you need to ensure that you have specified quality control in your specification and that it is included as part of the vendor’s quote.
If you begin with 100,000 paper pages then you should end up with 100,000 scanned, indexed and readable images of pages in your EDRMS; this sounds simple, but it often is not so. Please think about the metrics required to ensure this level of quality control; you can’t afford to lose information. I have often had customers complain many months after outsourcing a document scanning job that they have just discovered that pages are missing or that some of the scanned pages are unreadable. Don’t let it happen to you.
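One simple metric is a per-batch reconciliation of pages counted in against images delivered. The sketch below is only an illustration under assumed names: `BatchCount`, `reconcile` and the batch figures are hypothetical, and a real job would take its counts from the vendor's batch sheets and your EDRMS.

```python
from dataclasses import dataclass


@dataclass
class BatchCount:
    """Counts for one scanning batch (hypothetical structure)."""
    batch_id: str
    pages_in: int        # page sides counted during preparation
    images_out: int      # images delivered into the EDRMS
    unreadable: int = 0  # images that failed the readability check


def reconcile(batches):
    """Return (batch_id, reason) for every batch that fails a check."""
    problems = []
    for b in batches:
        if b.images_out != b.pages_in:
            problems.append(
                (b.batch_id, f"{b.pages_in} pages in, {b.images_out} images out")
            )
        if b.unreadable:
            problems.append((b.batch_id, f"{b.unreadable} unreadable images"))
    return problems


# Made-up example data: one clean batch, one short batch, one with bad images.
batches = [
    BatchCount("B001", pages_in=500, images_out=500),
    BatchCount("B002", pages_in=500, images_out=497),
    BatchCount("B003", pages_in=480, images_out=480, unreadable=2),
]

for batch_id, reason in reconcile(batches):
    print(batch_id, reason)
```

If blank sides are dropped during scanning, they would need to be counted separately so that the totals still balance.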
Delivery and installation is where the vendor imports the digitized images into your EDRMS (maybe working with your EDRMS vendor) and creates all the links and Metadata necessary for efficient and appropriate searching. As mentioned previously, there is no point in having a huge database of scanned images if it is not searchable in a manner appropriate to each organization’s business processes.
Specify quote format
To ensure you are comparing apples to apples you need to detail how you want the costs expressed in your requirements document, your RFQ. For example, what will be the travel, expenses or transport costs? I would always suggest that you give each vendor a STANDARD COST SCHEDULE to complete with its response to ensure uniformity and so you can easily compare all the vendors’ quotes.
The Pricing Basis
You can either specify the breakdown of costs (see example below) or just ask for a fixed price per scanned page. Please don’t ask for a fixed price per document (I have seen this many times) because the vendor will then have to assume an average number of pages per document and this will lead to significant variations in the quotes. Obviously a ‘document’ can be from 1 to several hundred pages so it is not a standard unit of measurement.
Even when asking for a quote per ‘page’ you need to specify whether your ‘page’ is single or double-sided because a double-sided page takes at least twice as long to scan as a single-sided page.
Please also be aware of the issue of blank pages; you do not want to be charged for scanning blank pages. Most modern automatic document feed scanners and scanning software have a feature to ignore blank pages. This is especially important if your pages are a mix of single and double-sided.
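To see why the definition of a ‘page’ matters, here is a rough sketch of how the number of billable images moves with the share of double-sided sheets and with blank-page removal. The function name and the duplex shares are assumptions for illustration; the sample report earlier does not state a duplex percentage.

```python
def expected_images(sheets, duplex_share, blank_side_share=0.0):
    """Estimate billable images for a stack of sheets.

    duplex_share: fraction of sheets printed on both sides (0.0 to 1.0).
    blank_side_share: fraction of scanned sides dropped as blank.
    """
    sides = sheets * (1 + duplex_share)  # a duplex sheet yields two images
    return round(sides * (1 - blank_side_share))


sheets = 1_040_000  # sheet count from the sample inspection report
for duplex_share in (0.0, 0.25, 0.5):  # hypothetical duplex shares
    images = expected_images(sheets, duplex_share)
    print(f"{duplex_share:.0%} duplex -> {images:,} images")
```

The same stack of paper can produce very different image counts, so a per-image price only means something once the RFQ states how sides and blanks are counted.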
Contents of the quote
If you ask for a detailed breakdown, the vendor should detail all of the professional services and costs required including solution design, project setup, paper handling, scanning, capture, transport costs (if the job is being done offsite), etc.
If you ask for a simple fixed price per page the vendor will bundle all costs into a single figure such as a flat cost per page, e.g., 12 cents. If this is the case, you need to ensure that there are no exclusions, that is, no possible additional costs not included in the quote.
The following is a sample generic quote listing all components of the quote. In real life you are unlikely to get all of these line items unless you specifically mandate them in your request for quote document (RFQ).
- Data Inspection $150 per hour for 4 hours = $600
- Data Preparation $40 per hour for 120 hours = $4,800
- Scanning $40 per hour for 200 hours = $8,000
- Capture $150 per hour for 4 hours = $600
- Verification $150 per hour for 20 hours = $3,000
- Delivery and Installation $150 per hour for 4 hours = $600
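When comparing itemized quotes it helps to total each one the same way. As a small sketch, the line items above can be added up as follows (the variable names are illustrative):

```python
# (component, hourly rate in dollars, hours) from the sample quote above
line_items = [
    ("Data Inspection", 150, 4),
    ("Data Preparation", 40, 120),
    ("Scanning", 40, 200),
    ("Capture", 150, 4),
    ("Verification", 150, 20),
    ("Delivery and Installation", 150, 4),
]

total_hours = sum(hours for _, _, hours in line_items)
total_cost = sum(rate * hours for _, rate, hours in line_items)

for name, rate, hours in line_items:
    print(f"{name:26s} ${rate * hours:>7,}")
print(f"{'Total':26s} ${total_cost:>7,} over {total_hours} hours")
```

For this sample the total comes to $17,600 over 352 hours.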
Standard costs per page scanned
If you specify a single fixed price per scanned page in your RFQ, the quote will look like the following:
“Standard simplex, 200 dpi black and white, OCR creating text searchable PDF = $0.13 per image for a total cost of $20,500.00”
Time and Material Quotes
Sometimes, because you simply don’t have enough detail to prepare a detailed RFQ, you may be forced to accept time and material quotes. For example, “$30 per hour for as long as it takes.” No responsible executive is going to want to sign for an open-ended quote like this, so you need to include parameters in your acceptance to control and limit costs.
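One common control is a not-to-exceed ceiling on the total. The sketch below shows the idea using the $30 per hour rate from the example; the $12,000 cap and the function name are assumptions for illustration, not recommended values.

```python
def capped_cost(hourly_rate, hours_worked, not_to_exceed):
    """Cost of time-and-material work, limited by an agreed ceiling."""
    return min(hourly_rate * hours_worked, not_to_exceed)


RATE = 30        # dollars per hour, from the example quote above
CAP = 12_000     # hypothetical not-to-exceed amount agreed in writing

for hours in (300, 400, 500):
    print(f"{hours} hours -> ${capped_cost(RATE, hours, CAP):,}")
```

With these figures the vendor is paid for actual hours up to 400 hours, and nothing beyond the ceiling without a new approval.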
Alternatively, take my first suggestion and provide the vendors access to your store of file folders so they can adequately assess the problem and provide a fixed price quote.
If you really don’t know the extent of the problem, I highly recommend hiring someone first to assess, analyze and scope out the tasks involved BEFORE asking vendors for quotes. I assure you that this will save you money in the final analysis.
Final Inspection and Sign-Off
This is where you inspect the final product and approve the job for payment. Please make sure that inspection and sign-off steps are part of the requirement specification, your RFQ. Always ask the vendor to provide signed copies of its verification paperwork and also have your staff do random sampling to confirm that nothing has gone awry. This is IT, so things will go wrong. The simplest sanity test is pages in equals pages out.
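For the random sampling, a reproducible selection makes it easier to show afterwards which images were checked. The following is a minimal sketch; the function name, the ID format and the sample size of 50 are assumptions, and an appropriate sample size depends on your own quality requirements.

```python
import random


def pick_sample(image_ids, sample_size, seed=None):
    """Pick a random subset of image IDs for manual inspection.

    A fixed seed makes the selection repeatable for audit purposes.
    """
    ids = list(image_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


# Hypothetical image identifiers; in practice, export these from your EDRMS.
all_images = [f"IMG-{n:06d}" for n in range(1, 100_001)]
to_inspect = pick_sample(all_images, sample_size=50, seed=2024)
print(len(to_inspect), "images selected, e.g.", to_inspect[:3])
```

Each selected image would then be opened and compared against the original paper page.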
Another main consideration is whether the work will be done on your premises or at the vendor’s site. In most cases, because of the volumes of paper involved and the danger of lost data if data is shipped back and forth, it is preferable to do the actual scanning at your premises.
However, when this is not possible, the vendor will provide an alternative site, but additional costs may apply (e.g., transport costs, office rental, etc.).