This post is designed to be used as a guideline by the records manager or business owner who is planning to have paper records digitized. It includes most of the things you need to think about when allocating a contract for Document Imaging (i.e., scanning paper records to produce digital images). It suggests things you should allow for when formulating budgets and negotiating with vendors.
The safest approach is always an end-to-end one where the vendor handles everything and takes responsibility for the final product. That means doing all the preparation work needed to make your paper ‘scannable’, capturing all Metadata and linking it to the scanned images. It should also include importing the scanned images and Metadata into your document and image repository (e.g., your Electronic Document and Records Management System, or EDRMS).
There is little point in digitizing a mass of paper if the results are not easily and conveniently searchable using your preferred terminology or Taxonomy. A pile of DVDs in a corner won’t help anyone.
The usual processes involved are:
The vendor will want to inspect and analyze the data to be scanned and determine what preparation or ‘paper handling’ is required. The vendor will also want to double check your estimated volumes.
Most reputable vendors will be reluctant to provide you with a fixed price quotation until after the data inspection is completed.
Data preparation normally involves removing pages from a cardboard file folder, removing staples, smoothing paper, orienting paper, etc. The objective should be to organize the pages into documents and batches to facilitate faster scanning using high-speed automatic document feed scanners. The most important component of any scanning quote is the time estimate (duration), and data preparation time is a key part of it.
Scanning is where all paper is captured as TIFF images, with multi-page documents captured as multi-page TIFF images. You may also require the vendor to convert all or some of the content of the TIFF images to text via an OCR (Optical Character Recognition) process. Note that this is usually an option; do not assume your digitized pages will be full-text searchable, because TIFF images on their own are not. An additional OCR step is required.
If full-text indexing is a requirement, make sure it is specified in your requirements document and included in the vendor’s quote. If you do mandate full-text indexing, the final format of the digitized image won’t be TIFF; it will probably be PDF or, even better, PDF/A (an internationally recognized ISO standard for long-term archiving).
The time to scan each sheet of paper depends upon a few key factors: the quality of the original source document, whether it is single or double-sided, and its condition (wrinkled, folded, torn, stapled, etc.). Expect a much higher cost when the quality of the source documents is poor.
Verification – Scanning
This is where the vendor applies quality assurance processes to ensure that all pages have been properly scanned. This means the vendor should be able to confirm that all pages have been scanned at the agreed quality standard. Some form of quality control is mandatory in any scanning job and you need to ensure that you have specified quality control in your specification and that it is included as part of the vendor’s quote.
Capture is where the vendor imports the digitized images into your EDRMS and creates all the links and Metadata necessary for efficient and appropriate searching. As mentioned previously, there is no point in having a huge database of scanned images if it is not searchable in a manner appropriate to your organization’s business processes. Note that you will need to tell the vendor how you want the scanned images ‘organized’ or classified in your EDRMS.
Verification – Capture
This is where the vendor sanity-checks the capture process and confirms that all pages that have been scanned and captured have also been imported into your EDRMS as per specification. If you begin with 100,000 paper pages, you should end up with 100,000 scanned, indexed and readable images of pages in your EDRMS. This sounds simple, but often it is not. Please think about the metrics required to ensure this level of quality control; you can’t afford to lose information.
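One way to think about the metrics is a page-count reconciliation per batch: the count taken at preparation should match the count of scanned images, which should match the count imported into the EDRMS. The following is a minimal sketch only; the `BatchCounts` fields, batch identifiers and numbers are all hypothetical, and a real job would pull these counts from the vendor's batch reports and your EDRMS.

```python
from dataclasses import dataclass


@dataclass
class BatchCounts:
    """Page counts recorded for one batch at each stage (hypothetical fields)."""
    batch_id: str
    prepared: int   # pages counted during data preparation
    scanned: int    # images produced by the scanner
    imported: int   # images confirmed present in the EDRMS


def reconcile(batches):
    """Return (batch_id, stage, shortfall) for every stage where counts differ."""
    problems = []
    for b in batches:
        if b.scanned != b.prepared:
            problems.append((b.batch_id, "scanning", b.prepared - b.scanned))
        if b.imported != b.scanned:
            problems.append((b.batch_id, "import", b.scanned - b.imported))
    return problems


# Illustrative numbers only.
batches = [
    BatchCounts("B001", prepared=500, scanned=500, imported=500),
    BatchCounts("B002", prepared=480, scanned=478, imported=478),
    BatchCounts("B003", prepared=510, scanned=510, imported=505),
]

issues = reconcile(batches)
for batch_id, stage, shortfall in issues:
    print(f"{batch_id}: {shortfall} page(s) unaccounted for at {stage}")
```

Counting per batch rather than for the whole job makes it much easier to find where pages went missing.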
Final inspection and sign-off
This is where you inspect the final product and approve the job for payment. Please make sure that inspection and sign-off acceptance steps are part of the requirement specification. When doing so, ask the vendor to provide signed copies of its verification paperwork and also have your staff do random sampling to confirm that nothing has gone awry.
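The random sampling step can be as simple as drawing a list of image identifiers for your staff to open and check by eye. The sketch below assumes you can export a list of identifiers from your EDRMS; the `IMG-…` naming, the sample size of 400 and the fixed seed are hypothetical choices for illustration, not a recommended acceptance standard.

```python
import random


def pick_sample(image_ids, sample_size, seed=None):
    """Draw a simple random sample (without replacement) of image identifiers."""
    rng = random.Random(seed)
    size = min(sample_size, len(image_ids))
    return rng.sample(image_ids, size)


def defect_rate(inspected, defects):
    """Fraction of inspected images found to be missing, unreadable or mis-indexed."""
    return defects / inspected if inspected else 0.0


# Hypothetical export of 100,000 image identifiers from the EDRMS.
image_ids = [f"IMG-{n:06d}" for n in range(1, 100_001)]

# A fixed seed makes the sample reproducible, so the list can be attached to the sign-off.
sample = pick_sample(image_ids, 400, seed=42)
print(f"Inspect {len(sample)} images, starting with {sample[:3]}")
```

Agree with the vendor in advance what defect rate in the sample is acceptable and what happens if it is exceeded.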
Specify the quote format
To ensure you are comparing apples to apples, you need to detail in your requirements document how you want the costs expressed. For example, how should travel, expenses or transport costs be shown? I would always suggest that you give all vendors a standard cost schedule to complete with their responses, to ensure uniformity across quotes from different vendors.
You can either specify that you want a detailed breakdown of costs (see example below) or just a fixed price per scanned page. Please don’t ask for a fixed price per document (I have seen this many times) because the vendor will then have to assume an average number of pages per document and this will lead to significant variations in the quotes. Obviously a ‘document’ can be anything from one to several hundred pages, so it is not a standard unit of measurement.
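To see why a per-document price produces such variation, consider three vendors with identical per-page costs who each guess a different average document length. This is a small illustrative calculation only; the vendor names, the page assumptions and the 13 cents per page figure are hypothetical.

```python
def per_document_price_cents(cents_per_page, assumed_pages_per_document):
    """Per-document price a vendor would quote given its own page-count guess."""
    return cents_per_page * assumed_pages_per_document


CENTS_PER_PAGE = 13  # identical underlying cost for every vendor (illustrative)

# Each vendor's guess at the average number of pages per document (hypothetical).
assumed_pages = {"Vendor A": 5, "Vendor B": 12, "Vendor C": 40}

quotes = {
    vendor: per_document_price_cents(CENTS_PER_PAGE, pages)
    for vendor, pages in assumed_pages.items()
}

for vendor, cents in quotes.items():
    print(f"{vendor}: ${cents / 100:.2f} per document")
```

The quotes differ by a factor of eight even though the vendors' real costs are the same, which is why the page is the safer unit.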
Even when asking for a quote per ‘page’ you need to specify whether your ‘page’ is single or double-sided, because a double-sided page takes at least twice as long to scan as a single-sided page.
Please also be aware of the issue of blank pages; you do not want to be charged for scanning them. Most modern automatic document feed scanners have a feature to detect and skip blank pages. This is especially important if your pages are a mix of single and double-sided.
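The effect of double-sided sheets and blank sides on the bill can be checked with some simple arithmetic: count the sides scanned, subtract the blank sides that were dropped, and multiply by the per-image rate. The sketch below is illustrative only; the `Batch` structure, the sheet counts and the 13 cents per image rate are assumptions, and it presumes the vendor bills per non-blank image.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """One batch of paper as fed to the scanner (hypothetical structure)."""
    sheets: int        # physical sheets of paper
    duplex: bool       # True if both sides of each sheet are scanned
    blank_sides: int   # sides detected as blank and dropped


def billable_images(batch):
    """Number of non-blank images the batch produces."""
    sides = batch.sheets * (2 if batch.duplex else 1)
    if batch.blank_sides > sides:
        raise ValueError("blank_sides cannot exceed the number of sides scanned")
    return sides - batch.blank_sides


def job_cost_cents(batches, cents_per_image):
    """Total cost in cents when only non-blank images are charged."""
    return sum(billable_images(b) for b in batches) * cents_per_image


# Illustrative job: 1,000 single-sided sheets plus 1,000 double-sided sheets
# of which 350 reverse sides turn out to be blank.
batches = [
    Batch(sheets=1000, duplex=False, blank_sides=0),
    Batch(sheets=1000, duplex=True, blank_sides=350),
]

total_images = sum(billable_images(b) for b in batches)
total_cents = job_cost_cents(batches, cents_per_image=13)
print(f"{total_images} billable images, ${total_cents / 100:.2f}")
```

Here 2,000 sheets yield 2,650 billable images; without blank-side removal the same job would be billed as 3,000 images.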
Contents of the detailed quote
The vendor should detail all of the professional services and costs required including solution design, project setup, paper handling, scanning, capture, transport costs (if the job is being done offsite), etc.
The following is a sample generic quote listing all components of the quote. In real life, you are unlikely to get all of these line items unless you specifically ask for them.
- Data Inspection
- Data Preparation
- Delivery and Installation
The fixed price per page quote
If you ask for a simple fixed price per page the vendor will bundle all costs into a single figure such as a flat cost per page, e.g., 13 cents. If this is the case, you need to ensure that there are no exclusions, that is, no possible additional costs not included in the quote. For example:
“Standard simplex, 200 dpi black and white, OCR creating TIFF/PDF = $0.13 per image”
Another factor is whether the work will be done on your premises or at the vendor’s site. In most cases, because of the volumes of paper involved and the risk of losing records if paper is shipped back and forth, it is preferable to do the actual scanning at your premises. However, when this is not possible, the vendor will provide an alternative site, but additional costs may apply (e.g., transport costs, office rental, etc.).