Clear and Compact Storage

The easiest way to digitize printed documentation is by scanning. For archival storage, good image quality is important - this implies high spatial resolution (many pixels per inch) and good intensity resolution (many bits per pixel). However, both produce large files, which take up more disk space and take longer to download over the Internet. By picking the right scanning mode and storage format for each type of image, file size can be minimized without sacrificing much clarity.
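To see how quickly resolution and bit depth add up, the raw size of a scan can be estimated from its dimensions. The following is a rough sketch: the function name is made up for illustration, the page is assumed to be American letter size, and file headers and row padding are ignored.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw (uncompressed) size of a scan in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# A full letter-sized page (8.5" x 11") at several settings.
line_art_200 = uncompressed_size_bytes(8.5, 11, 200, 1)   # 467,500 bytes
line_art_300 = uncompressed_size_bytes(8.5, 11, 300, 1)   # 1,051,875 bytes
gray_300 = uncompressed_size_bytes(8.5, 11, 300, 8)       # 8,415,000 bytes
color_300 = uncompressed_size_bytes(8.5, 11, 300, 24)     # 25,245,000 bytes

for label, size in [("line art, 200 dpi", line_art_200),
                    ("line art, 300 dpi", line_art_300),
                    ("grayscale, 300 dpi", gray_300),
                    ("color, 300 dpi", color_300)]:
    print(f"{label:>20}: {size / 1_000_000:.2f} MB")
```

Before any compression, a 24-bit color scan is 24 times the size of a one-bit line art scan at the same resolution.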

All text and most line drawings are best scanned as "line art" with one bit per pixel. Standard-sized text can usually be scanned at 200 dpi (dots per inch), which corresponds to the fine FAX resolution. Intricate diagrams and small text should be scanned at 300 dpi, which corresponds to early laser printer resolution. The "brightness" setting is critical for line art scans. Try a few test scans, zoom in, and inspect each result carefully. If the scan is too light, information will be missing. If it is too dark, lines and letters will start "bleeding".
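The effect of the brightness setting can be illustrated with a simple model. The sketch below assumes the scanner turns grayscale readings into one-bit pixels with a single global threshold; real scanner software may do something more elaborate, and the pixel values here are invented for the example.

```python
def to_line_art(gray_rows, threshold):
    """Convert grayscale rows (0 = black, 255 = white) to 1-bit rows.

    A pixel becomes black (1) when it is darker than the threshold.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


def count_black(bit_rows):
    return sum(sum(row) for row in bit_rows)


# Hypothetical patch of a page: a solid stroke (20), a faint stroke (120),
# some paper shading (200), and clean white paper (250).
patch = [
    [250, 20, 250, 120, 200, 250],
    [250, 20, 250, 120, 200, 250],
    [250, 20, 250, 120, 200, 250],
    [250, 20, 250, 120, 200, 250],
]

too_light = to_line_art(patch, 100)  # faint stroke is lost
balanced = to_line_art(patch, 160)   # both strokes kept, shading dropped
too_dark = to_line_art(patch, 220)   # paper shading turns black

print(count_black(too_light), count_black(balanced), count_black(too_dark))
```

In this toy patch the light setting loses the faint stroke and the dark setting fills in the paper shading, which is the missing information and "bleeding" described above.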

If you use an image editing program such as Photoshop or equivalent, defects in the scan can be touched up. You can also crop white space around the image. Keep in mind that if you scan most of a page, most laser printers will not be able to print the entire image - they need a margin of up to 1/2" on each side. To handle this case, you can shrink the image so that it is no larger than 7.5" by 10" (for American "letter"-sized paper). If you do shrink the image, you may need to scan at high resolution (say, 600 dpi), do the shrink, then convert to 200 or 300 dpi.
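The arithmetic behind such a shrink can be sketched as follows. The helper names are hypothetical, and the 7.5" by 10" printable area is taken from the paragraph above; the example assumes a full letter-sized page scanned at 600 dpi and saved at 300 dpi.

```python
def fit_scale(width_in, height_in, max_w=7.5, max_h=10.0):
    """Scale factor (never above 1.0) that fits the image in the printable area."""
    return min(1.0, max_w / width_in, max_h / height_in)


def shrink_plan(width_in, height_in, scan_dpi, target_dpi):
    """Pixel dimensions of the original scan and of the shrunken final image."""
    scale = fit_scale(width_in, height_in)
    final_w_in = width_in * scale
    final_h_in = height_in * scale
    return {
        "scale": scale,
        "scanned_px": (round(width_in * scan_dpi), round(height_in * scan_dpi)),
        "final_px": (round(final_w_in * target_dpi), round(final_h_in * target_dpi)),
    }


plan = shrink_plan(8.5, 11, scan_dpi=600, target_dpi=300)
print(plan)
```

Here the width is the limiting dimension, so the page is scaled to about 88% and the 5100 by 6600 pixel scan ends up as roughly 2250 by 2912 pixels. Starting from the 600 dpi scan leaves the editing program extra pixels to work with when it resamples.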

The best current formats for saving line art are the CompuServe gif format and the png format. They have built-in data compression and handle sharp-edged images well. The jpeg format tends to make line art fuzzy. A tip for reducing the line art file size: eliminate black speckles or other noise. A clean white background compresses very well. A problem with gif is that an image saved as 300 dpi will often come out as 72 dpi when downloaded, since the gif format itself does not record the scan resolution. No pixels are lost, but the image is huge, which confuses some browsers. The most compatible all-around format is the Adobe Acrobat (pdf) format.
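The benefit of removing speckles can be demonstrated with a small experiment. This is only a sketch: it uses Python's zlib module (the same DEFLATE method that png uses; gif uses LZW instead, so exact numbers will differ), and the "page" is a synthetic one-bit bitmap with solid bars standing in for lines of text.

```python
import random
import zlib

WIDTH_BYTES = 200   # 1600 pixels per row at 1 bit per pixel
HEIGHT = 400        # rows


def make_page():
    """White page (0 bits) with a black bar every 20 rows, standing in for text."""
    rows = []
    for y in range(HEIGHT):
        if y % 20 < 4:
            rows.append(bytearray(b"\xff" * WIDTH_BYTES))
        else:
            rows.append(bytearray(WIDTH_BYTES))
    return rows


def add_speckles(rows, count, seed=0):
    """Set `count` randomly placed single pixels to black."""
    rng = random.Random(seed)
    for _ in range(count):
        y = rng.randrange(HEIGHT)
        x = rng.randrange(WIDTH_BYTES * 8)
        rows[y][x // 8] |= 0x80 >> (x % 8)
    return rows


def compressed_size(rows):
    return len(zlib.compress(b"".join(bytes(r) for r in rows), 9))


clean_size = compressed_size(make_page())
noisy_size = compressed_size(add_speckles(make_page(), 2000))

print("clean page:   ", clean_size, "bytes")
print("speckled page:", noisy_size, "bytes")
```

The speckled page should come out noticeably larger, even though the speckles cover only a tiny fraction of the pixels, because each stray dot interrupts a long run of identical data.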

Photographs should be scanned in a multi-bit per pixel grayscale format (or color, if needed). The best format for storing pictures (not line drawings) is the jpeg format. Experiment with the different compression settings available, and choose the most efficient one that does not compromise the image quality. Don't scan and save a black and white image in color - it increases the file size.

If you want to save a multi-page scanned document into a single file, the Adobe Acrobat pdf format is recommended. This requires buying the full Acrobat program, but the end result is a compact, professional digital document. The newer Acrobat readers (which are free) have nice options for printing oddball-sized images. Acrobat does its own data compression, so you can import uncompressed scans (such as tiff or bmp files) directly into Acrobat, and the result will be about as compact as a gif. Acrobat is also the medium of choice for distributing Postscript or EPS files.

When combining line-art images into a single pdf file, saving each page as a tiff (.tif) file with the page number as part of the file name saves time and allows multiple pages to be imported into Acrobat at one time. Acrobat 4.0 allows up to 50 images to be imported at once. For documents over 50 pages, make pdf files of chunks of 50 pages or less, then combine them into one pdf file.
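The naming and batching steps can be sketched in a few lines. The file stem "manual" and the three-digit zero padding are assumptions made for the example (padding keeps the files in page order when sorted by name); the batch size of 50 comes from the Acrobat 4.0 limit mentioned above.

```python
def page_filename(stem, page, digits=3):
    """Build a name like manual_007.tif; zero padding keeps pages in sorted order."""
    return f"{stem}_{page:0{digits}d}.tif"


def batches(items, size=50):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical 120-page document.
names = [page_filename("manual", page) for page in range(1, 121)]
groups = batches(names)

for number, group in enumerate(groups, start=1):
    print(f"chunk {number}: {group[0]} .. {group[-1]} ({len(group)} pages)")
```

Each chunk would be imported into its own pdf file, and the chunk files then combined in order.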

Multi-page scans can also be compressed into a single file by creating a .zip archive (in DOS/Windows), a .sit or .cpt file (in the Mac OS), or a tar archive in UNIX. These are not as convenient to the end user as an Acrobat file, since most decompressors dump the individual page files on the hard disk, leaving a mess to clean up later. Also, the file compression of these programs isn't needed, since gif or jpeg files are already compressed.
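The point about redundant compression is easy to check. The sketch below uses Python's zlib module as a stand-in for both the image format's built-in compression and the archive program's compression, which is an assumption; real gif, jpeg, and archive tools use a variety of methods. The input is synthetic, seeded pseudo-random data with a skewed byte distribution.

```python
import random
import zlib

# Synthetic, compressible data: some byte values are far more common than others.
rng = random.Random(0)
raw = bytes(rng.choices(b"aaaaaaaabbbbccde", k=200_000))

once = zlib.compress(raw, 9)    # stands in for the image format's own compression
twice = zlib.compress(once, 9)  # stands in for the archive program's compression

print("raw:             ", len(raw), "bytes")
print("compressed once: ", len(once), "bytes")
print("compressed twice:", len(twice), "bytes")
```

The first pass should shrink the data substantially, while the second pass should leave the size essentially unchanged. The archive is still useful for bundling pages into one file, just not for saving space.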

Those with a lot of time on their hands can scan technical documentation, use OCR (optical character recognition) to convert the text to ASCII characters, then use a page layout program or sophisticated word processor to essentially re-typeset the documentation. This requires skill, time, and good software, but is the most efficient way to store documentation.