Fujitsu leverages AI to develop highly accurate recognition technology for strings of handwritten Chinese characters
Fujitsu today announced the development of an artificial intelligence model that recognizes handwritten character strings with high reliability. The results of this model represent the world's highest degree of accuracy in recognizing handwritten Chinese character strings. Recognition of individual handwritten Chinese characters using deep learning and other AI models has already surpassed human recognition capability. When such models are applied to strings of handwritten characters, however, they often fail to break the strings correctly into individual characters. To address this, the new Fujitsu-developed AI model ranks the degree of reliability in image recognition for handwritten strings of characters, assigning a high degree of reliability to correct characters and a low degree of reliability to portions that are not characters. By applying this model, character recognition mistakes have been reduced to less than half those of previous technology, greatly improving the efficiency of tasks such as digitization of handwritten texts.
This technology will be used as part of Human Centric AI Zinrai, the Fujitsu AI technology. Details of this technology were announced at the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR-2016), held on October 24 in China.
Character recognition is a field where the utilization of AI promises greater task efficiency. Fujitsu Laboratories has several decades of experience in research and development relating to character recognition, and has a large portfolio of technologies, such as machine translation, in the field of Japanese language processing. In September 2015, using AI technologies modeled on the workings of the human brain, Fujitsu announced its successful demonstration of the world's first technology to recognize individual handwritten Chinese characters at a rate exceeding that of a human. However, Chinese sentences are made up of strings of complex Chinese characters, and when individual characters are not clearly separated from one another, as is often the case in handwriting, it is difficult to recognize each character accurately.
Such technologies using AI start off with a supervised sample of characters, which enables the system to learn and remember the features of multiple character patterns that humans use when recognizing characters. Next, an image of a string of characters is divided into parts at the blank spaces. Because this also separates the radicals (the components that make up a Chinese character), a candidate region may consist of a single separated area (top row of Figure 1) or of parts from neighboring characters combined into one region (bottom row of Figure 1). The program then assumes each region represents an individual character, and outputs a candidate character recognition result and its degree of reliability, using a recognition algorithm based on its earlier learning. The closer the degree of reliability is to one, the more confident the program is in the candidate character. Finally, it outputs its recognition results by selecting, in order, the combination that has the highest average degree of reliability (bottom of Figure 1). With the previous technology, however, there were times when the system would output a high degree of reliability for images that were not characters, such as the component radicals, so that the system could not correctly separate characters.
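The selection step described above can be illustrated with a minimal Python sketch. It is not Fujitsu's implementation: the function name `best_segmentation`, the toy labels, and all reliability values are hypothetical, and the search is a plain exhaustive enumeration chosen for readability. It groups consecutive parts into candidate regions, averages the degree of reliability along each grouping, and keeps the best one. The first score table imitates the problem with the previous technology, where two component parts of the character "X" each receive a high degree of reliability.

```python
from typing import Dict, List, Tuple

# Hypothetical recognizer output for a toy string of four parts:
# (start, end) span of parts -> (candidate label, degree of reliability).
Scores = Dict[Tuple[int, int], Tuple[str, float]]


def best_segmentation(n_parts: int, scores: Scores, max_span: int = 2) -> Tuple[str, float]:
    """Try every grouping of consecutive parts; keep the highest average reliability."""
    if n_parts == 0:
        return "", 0.0
    best_text, best_avg = "", -1.0

    def walk(start: int, labels: List[str], rels: List[float]) -> None:
        nonlocal best_text, best_avg
        if start == n_parts:
            avg = sum(rels) / len(rels)
            if avg > best_avg:
                best_text, best_avg = "".join(labels), avg
            return
        for end in range(start + 1, min(start + max_span, n_parts) + 1):
            # Spans missing from the table are treated as unrecognizable.
            label, rel = scores.get((start, end), ("?", 0.0))
            walk(end, labels + [label], rels + [rel])

    walk(0, [], [])
    return best_text, best_avg


# Assumed scores: parts 0-1 form "X", parts 2-3 form "Y".
# The parts of "X" are misread as characters "I" and "J" with high reliability.
CONVENTIONAL: Scores = {
    (0, 1): ("I", 0.97), (1, 2): ("J", 0.96), (0, 2): ("X", 0.90),
    (1, 3): ("?", 0.10),
    (2, 3): ("K", 0.40), (3, 4): ("L", 0.50), (2, 4): ("Y", 0.92),
}
# Same table, but the non-character parts now receive low reliability.
LOW_FOR_NON_CHARACTERS: Scores = {**CONVENTIONAL, (0, 1): ("I", 0.05), (1, 2): ("J", 0.08)}

print(best_segmentation(4, CONVENTIONAL)[0])            # IJY (wrong split)
print(best_segmentation(4, LOW_FOR_NON_CHARACTERS)[0])  # XY
```

Exhaustive enumeration grows exponentially with the number of parts, so a practical system would more likely use dynamic programming over the same candidate regions.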
About the Technology
This Fujitsu-developed technology outputs a high degree of reliability only for proper characters. It does this by using a heterogeneous deep learning model, which, in addition to the supervised character samples used in conventional technology, uses a newly developed supervised sample of non-characters made up of radicals and of combinations of parts that do not make up characters. The features of the technology are as follows.
1. Effective learning technology with heterogeneous deep learning, including non-characters
In a heterogeneous deep learning model, two types of supervised samples are used: one for existing characters, and another for non-characters. The supervised non-character sample is created by dividing up characters and recombining the parts, which makes it far larger than the supervised character sample. By having the system remember the features of non-characters that can easily appear when neighboring parts are combined in Chinese sentences, Fujitsu developed technology that can learn effectively even with such an asymmetrical deep learning model (Figure 2a).
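The idea of building non-character samples by dividing and recombining characters can be sketched symbolically as follows. This is an illustration under stated assumptions rather than the published method: real samples would be images, whereas here each character is represented by a hypothetical left/right pair of part names, and `non_character_samples` is an invented helper. It collects isolated parts and the "right part of one character plus left part of the next" regions that appear when neighboring characters are merged by mistake.

```python
from typing import Dict, List, Tuple

# Hypothetical decomposition: character label -> (left part, right part).
Parts = Dict[str, Tuple[str, str]]


def non_character_samples(text: str, parts: Parts) -> List[Tuple[str, ...]]:
    """List part combinations from `text` that do not form a known character."""
    valid = set(parts.values())  # combinations that ARE characters
    samples: List[Tuple[str, ...]] = []
    # Isolated parts (this sketch assumes no part is itself a character).
    for ch in text:
        left, right = parts[ch]
        samples.append((left,))
        samples.append((right,))
    # Right part of one character joined with the left part of its neighbor.
    for prev, nxt in zip(text, text[1:]):
        combo = (parts[prev][1], parts[nxt][0])
        if combo not in valid:
            samples.append(combo)
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(samples))


toy_parts: Parts = {"A": ("a1", "a2"), "B": ("b1", "b2")}
for sample in non_character_samples("ABA", toy_parts):
    print(sample)
```

Even this toy case yields six non-character samples from two characters, which hints at why the non-character sample grows much faster than the character sample.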
2. Technology to correctly break down handwritten character strings based on degree of reliability
Fujitsu developed a technology that effectively separates a string of characters into individual characters (Figure 2b). Images of candidate areas are input into the trained heterogeneous deep learning model, which outputs a degree of reliability for both characters and non-characters: high for candidate areas that form characters and low for candidate areas that do not. An existing Chinese language processing model is then applied, and the final candidate sentence is output based on an analysis of whether the recognition candidates form a string of correct Chinese. Because the level of reliability for combinations of parts that do not form existing characters is lower than the level of reliability for actual characters, correct recognition results can be achieved with this technology by selecting the segment path with the highest degree of reliability, beginning with the start of the string of characters (Figure 3).
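One way a single model could yield a low degree of reliability for non-characters is sketched below. The layout is an assumption made for illustration, not a detail confirmed by Fujitsu: the model is taken to emit one logit per character class plus one extra logit for a "non-character" class, and the degree of reliability of a candidate area is taken to be the softmax probability of its best character class. The function names and logit values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_reliability(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return (best character label, its probability).

    Assumed layout: len(logits) == len(labels) + 1, with the final logit
    belonging to the non-character class. Probability mass assigned to that
    class lowers the reliability of every character candidate.
    """
    if len(logits) != len(labels) + 1:
        raise ValueError("expected one logit per label plus a non-character logit")
    char_probs = softmax(logits)[:-1]
    best = max(range(len(char_probs)), key=char_probs.__getitem__)
    return labels[best], char_probs[best]


labels = ["X", "Y"]
# A candidate area that looks like the character "X".
print(candidate_reliability([4.0, 0.0, 0.0], labels))
# A candidate area that looks like a stray radical: the non-character logit dominates.
print(candidate_reliability([1.0, 0.5, 5.0], labels))
```

Under this assumption the first candidate receives a degree of reliability above 0.9 and the second one below 0.05, so a path that passes through the stray radical is penalized during segmentation.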
When this technology was benchmarked against a database of handwritten Chinese released in 2010 by the Institute of Automation, Chinese Academy of Sciences (CASIA), which is used as a standard by academic societies, it achieved recognition accuracy of 96.3%, the highest achieved to date, surpassing previous technologies by 5%. As a result, this technology can greatly improve the efficiency of inputting handwritten text.
This technology is effective for languages that have no spacing between words, including Chinese, Japanese, and Korean. The recognition accuracy of free-form handwritten text in Japanese is expected to improve significantly when this technology is combined with Fujitsu Laboratories' long track record in language processing technology for Japanese. Fujitsu aims to bring this technology to Zinrai, Fujitsu's AI technology platform, in 2017, and to apply it in stages to a handwritten digital ledger system for Japan and other solutions.