'Lensless' imaging through advanced machine learning for next-generation image sensing solutions

Recent advances in computing technology can simplify the camera by substituting computation for some parts of the optical system. With image reconstruction computing, the lens can be abandoned entirely, allowing for a lensless camera that is ultra-thin, lightweight, and low-cost. The lensless camera has been gaining traction recently. Thus far, however, the image reconstruction technique has not been well established, resulting in inadequate imaging quality and long computation times for the lensless camera.

Recently, researchers have developed a new image reconstruction method that improves computation time and provides high-quality images. Describing the initial motivation behind the research, a core member of the research team, Prof. Masahiro Yamaguchi of Tokyo Tech, says, "Without the limitations of a lens, the lensless camera could be ultra-miniature, which could allow new applications that are beyond our imagination." Their work has been published in Optics Letters.

The typical optical hardware of the lensless camera consists simply of a thin mask and an image sensor. The mask and the sensor can be fabricated together in established semiconductor manufacturing processes for future production. The mask optically encodes the incident light and casts patterns on the sensor, and the image is then reconstructed using a mathematical algorithm. Though the cast patterns are completely non-interpretable to the human eye, they can be decoded with explicit knowledge of the optical system.
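The encode-then-decode idea can be illustrated with a toy simulation. The sketch below is not the researchers' method: it assumes, purely for illustration, that the mask acts as a shift-invariant point spread function (so the sensor pattern is a circular convolution of the scene with the mask pattern), that the capture is noise-free, and that the mask pattern is known exactly. Decoding then uses a generic Tikhonov-regularized inverse filter. The names `simulate_capture`, `tikhonov_reconstruct`, and `reg` are hypothetical.

```python
import numpy as np

def simulate_capture(scene, mask_psf):
    """Toy forward model (an assumption): circular convolution of scene and mask."""
    H = np.fft.fft2(mask_psf)
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * H))

def tikhonov_reconstruct(measurement, mask_psf, reg=1e-8):
    """Decode the sensor pattern using explicit knowledge of the mask."""
    H = np.fft.fft2(mask_psf)
    Y = np.fft.fft2(measurement)
    # Regularized inverse filter: conj(H) * Y / (|H|^2 + reg)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + reg)
    return np.real(np.fft.ifft2(X))

rng = np.random.default_rng(0)

# A simple scene: one bright square on a dark background.
scene = np.zeros((64, 64))
scene[20:40, 25:45] = 1.0

# A random binary mask pattern, normalized so it preserves total brightness.
mask_psf = (rng.random((64, 64)) < 0.5).astype(float)
mask_psf /= mask_psf.sum()

measurement = simulate_capture(scene, mask_psf)   # looks nothing like the scene
recovered = tikhonov_reconstruct(measurement, mask_psf)

rel_error = np.linalg.norm(recovered - scene) / np.linalg.norm(scene)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In this idealized setting the scene is recovered almost exactly. A real capture adds sensor noise and deviates from the assumed model, which is one reason such simple model-based decoding can give inadequate image quality.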

A schematic of how the lensless imaging process works, from light collection through encoding the signal to post-processing with computing algorithms. Credit: Xiuxi Pan from Tokyo Tech

Vision Transformer (ViT) is a leading-edge machine learning technique that is better at global feature reasoning thanks to its novel structure of multistage transformer blocks with overlapped "patchify" modules. This structure allows it to efficiently learn image features in a hierarchical representation, so it can address the multiplexing property and avoid the limitations of conventional CNN-based deep learning, thereby allowing better image reconstruction. Credit: Xiuxi Pan from Tokyo Tech
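To make "overlapped patchify" concrete, here is a minimal sketch of splitting an image into overlapping patches, each flattened into a token. It shows only the slicing idea. The patch size, stride, and the function name `overlapped_patchify` are assumptions, and a real ViT module would typically also apply a learned projection to each patch, which is omitted here.

```python
import numpy as np

def overlapped_patchify(image, patch=8, stride=4):
    """Cut a 2D image into patches that overlap whenever stride < patch."""
    h, w = image.shape
    tokens = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            # Flatten each patch into one token vector.
            tokens.append(image[top:top + patch, left:left + patch].ravel())
    return np.stack(tokens)

image = np.arange(32 * 32, dtype=float).reshape(32, 32)
tokens = overlapped_patchify(image, patch=8, stride=4)
print(tokens.shape)  # (number of patches, pixels per patch)
```

Because the stride is smaller than the patch size, neighbouring tokens share pixels, so information near patch borders appears in more than one token.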

The lensless camera consists of a mask and an image sensor with a 2.5 mm separation distance. The mask is fabricated by chromium deposition in a synthetic-silica plate with an aperture size of 40×40 μm. Credit: Xiuxi Pan from Tokyo Tech

The targets are images displayed on an LCD screen (left two columns) and objects in the wild (right two columns; beckoning cat doll and stuffed bear). The first row shows the ground truth images displayed on the screen and the shooting scenes for in-the-wild objects. The second row shows the captured patterns on the sensor. The last three rows show the images reconstructed by the proposed, model-based, and CNN-based methods, respectively. The proposed method produces the highest-quality and most visually appealing images. Credit: Xiuxi Pan from Tokyo Tech