Image restoration technology capable of making A3-sized PDFs using an A4 scanner

Fujitsu Laboratories Limited today announced the development of image restoration technology that can generate PDFs from multi-page A3 documents fed as a batch into an A4 scanner.

Typically, scanning both sides of a double-sided A3 document with an A4 scanner involves folding each sheet in half and manually feeding it into the scanner twice. With Fujitsu Laboratories' new technology, users simply cut the A3 sheets in half and scan them with the scanner's automatic feeder. The original A3 layout is then detected automatically from the scanned images, and composite A3 images are assembled. At the same time, image correction makes the boundaries between the left and right halves inconspicuous. As a result, A3 documents can be converted to PDFs on an A4 scanner with less than 20% of the manual effort previously required.

Details of this technology are being presented at the International Conference on Pattern Recognition (ICPR) 2012, beginning November 12 at the Tsukuba International Congress Center, and at the December Study Group of CVIM2012, beginning December 3 at Yokohama National University.

With the spread of the paperless office, more and more existing paper documents are being converted to PDFs for electronic storage. Compact desktop scanners can handle this conversion efficiently, but A4-sized models are the most widely used, and there has been no easy way to scan A3 documents with them. The typical approach has been to fold each A3 sheet in half and then carefully feed it by hand into a two-sided scanner (Figure 1). This requires considerable effort, particularly for multi-page documents.

Cutting an A3 document in half allows the scanner's automatic sheet feeder to be used for batch scanning, avoiding the effort of folding and manually feeding each sheet. However, this approach creates its own problems:

Batch-scanning a multi-page document means the left and right halves of each A3 sheet can easily end up intermixed in no particular order, making it difficult to reassemble the original A3 images.
As paper is fed into the scanner, sheets may slide around or be fed at slightly different speeds. Because the left and right halves are scanned separately, this creates mismatches in text and figures at the boundary between the two when the original is reassembled.

Fujitsu Laboratories has developed technology that restores multi-page A3 documents to their original A3 layout after they have been cut in half and batch-scanned. Key features of this technology are as follows.

1. Automatic estimation of image grouping to restore A3 document images

From the intermixed scans of the left and right halves of A3 documents, the technology automatically estimates which images belong together, recreating the original A3 page layout (Figure 2).
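The release does not say how the grouping is estimated. As a rough illustration only, one plausible cue is that ink crossing the cut should continue from the right edge of a left half into the left edge of its matching right half. The sketch below, written under that assumption, scores every ordered pair of scans by how closely those two edge columns agree and then pairs scans greedily. The names `edge_cost` and `pair_halves`, the plain-list image representation, and the greedy matching are all hypothetical choices, not Fujitsu's algorithm.

```python
from itertools import permutations

# Hypothetical representation: rows of grayscale pixels (0 = black ink, 255 = white paper).
Image = list[list[int]]


def edge_cost(left: Image, right: Image) -> float:
    """Mean absolute difference between the left image's last column and the
    right image's first column; a low cost suggests the cut runs between them."""
    if len(left) != len(right):
        return float("inf")
    diffs = [abs(lrow[-1] - rrow[0]) for lrow, rrow in zip(left, right)]
    return sum(diffs) / len(diffs)


def pair_halves(images: list[Image]) -> list[tuple[int, int]]:
    """Greedily group scans into (left_index, right_index) pairs, cheapest edge match first."""
    candidates = sorted(
        (edge_cost(images[i], images[j]), i, j)
        for i, j in permutations(range(len(images)), 2)
    )
    used: set[int] = set()
    pairs = []
    for _cost, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return sorted(pairs)


# Two toy A3 sheets (A and B), each cut into left and right halves, then shuffled.
a_left = [[255, 0], [255, 255], [255, 0], [255, 255]]
a_right = [[0, 128], [255, 128], [0, 128], [255, 128]]
b_left = [[64, 255], [64, 0], [64, 255], [64, 0]]
b_right = [[255, 200], [0, 200], [255, 200], [0, 200]]
scans = [b_right, a_left, a_right, b_left]

print(pair_halves(scans))  # [(1, 2), (3, 0)]
```

Greedy matching keeps the sketch short; a real system would likely need an optimal assignment and additional cues, because blank margins along the cut make many pairs look equally good.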

2. Correction of localized stretching in scanned images

This technology corrects localized stretching in scanned images, so that lines, text, and diagrams meet naturally at the boundary when the left- and right-side scans of an original A3 document are joined (Figure 3).
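The release does not describe how the stretching is measured or undone. One generic way to picture it is block matching along the cut: split the boundary into short runs of rows, find the vertical shift that best lines up the right half's edge with the left half's edge in each run, interpolate those shifts into a smooth per-row displacement, and resample the right half. The sketch below assumes exactly that, models only vertical displacement, and works on a single pixel column; the functions `block_shifts`, `per_row_shift`, and `unstretch` and the `block` and `max_shift` parameters are hypothetical and are not Fujitsu's method.

```python
Edge = list[int]         # pixel values down the cut edge, top to bottom
Image = list[list[int]]  # rows of grayscale pixels


def block_shifts(left_edge: Edge, right_edge: Edge, block: int = 4, max_shift: int = 3):
    """For each block of rows, find the integer shift s that best maps
    right_edge[y + s] onto left_edge[y] (sum of absolute differences)."""
    n = len(left_edge)

    def clamp(i: int) -> int:
        return min(max(i, 0), n - 1)

    centers: list[float] = []
    shifts: list[int] = []
    for start in range(0, n, block):
        rows = range(start, min(start + block, n))
        # Prefer the smallest |s| when two shifts fit equally well.
        best = min(
            range(-max_shift, max_shift + 1),
            key=lambda s: (sum(abs(left_edge[y] - right_edge[clamp(y + s)]) for y in rows), abs(s)),
        )
        centers.append((rows[0] + rows[-1]) / 2)
        shifts.append(best)
    return centers, shifts


def per_row_shift(n: int, centers: list[float], shifts: list[int]) -> list[float]:
    """Linearly interpolate block shifts to every row, giving a piecewise-linear warp."""
    out: list[float] = []
    for y in range(n):
        if y <= centers[0]:
            out.append(float(shifts[0]))
        elif y >= centers[-1]:
            out.append(float(shifts[-1]))
        else:
            k = next(i for i in range(len(centers) - 1) if centers[i] <= y <= centers[i + 1])
            t = (y - centers[k]) / (centers[k + 1] - centers[k])
            out.append(shifts[k] + t * (shifts[k + 1] - shifts[k]))
    return out


def unstretch(right_img: Image, left_edge: Edge) -> Image:
    """Resample the rows of the right half so its cut edge lines up with left_edge."""
    n = len(right_img)
    centers, shifts = block_shifts(left_edge, [row[0] for row in right_img])
    d = per_row_shift(n, centers, shifts)
    return [list(right_img[min(max(round(y + d[y]), 0), n - 1)]) for y in range(n)]


# Left half's edge, and a one-column right half that is aligned at the top but
# locally stretched in the middle, so its lower part sits 2 rows too low.
left = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
right = [[v] for v in [10, 20, 30, 40, 50, 60, 60, 60, 70, 80, 90, 100]]

print(block_shifts(left, [r[0] for r in right])[1])  # [0, 2, 2]
fixed = unstretch(right, left)
before = sum(a != b for a, b in zip(left, (r[0] for r in right)))
after = sum(a != b for a, b in zip(left, (r[0] for r in fixed)))
print(before, after)  # 6 4
```

On this toy column, rows 5 through 9 line up exactly after correction, while rows next to the block transition and the bottom border remain approximate. Smaller blocks track local stretching more closely but make each shift estimate noisier.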

This technology makes it possible to easily scan multi-page A3 documents with a compact A4-sized scanner. Compared with the previous approach of folding each A3 sheet in half and scanning it, this method requires less than 20% of the manual labor, and it produces composite A3 images with fewer boundary mismatches than existing methods.

Fujitsu Laboratories aims to build this functionality into A4-size scanners to further speed up image processing. The company will also continue developing technology that reproduces a document at its original size simply by scanning its separate parts, even for documents larger than A3 that have been cut into more than two pieces.