Development of an automatic system for translating biomedical patents in real time

July 17, 2014, Universitat Politècnica de Catalunya
Screenshot of MOLTO translation system.

The Center for Language and Speech Technologies and Applications (TALP) of the Universitat Politècnica de Catalunya · BarcelonaTech (UPC), a member of CIT UPC, has developed the prototype of an automatic translation system for patents in the biomedical area. The system can be used to create multilingual documents with the same structure as the original patents, including images, formulae and other kinds of annotations. In addition, the system works in real time and can be incorporated in web applications.

The TALP UPC researchers' work, which has been carried out over three years, is part of a collaborative project called MOLTO within the European Union's Seventh Framework Programme. MOLTO has also involved research groups in Göteborg (Sweden), Helsinki (Finland), Utrecht (the Netherlands), Sofia (Bulgaria) and Zurich (Switzerland).

With the general objective of obtaining a multilingual system that can produce high-quality translations, MOLTO researchers have worked on three cases: the formulation of mathematical exercises, the description of objects in a museum and a model for translating patents, which is the case that TALP members have worked on directly.

As a general technique in the MOLTO project, the researchers used syntactic-semantic grammars created on the basis of specific domain ontologies (conceptual schemes that facilitate information exchange between systems). In turn, these components have been integrated into what is known as a Grammatical Framework (GF): the IT tool that makes automatic translations into different languages possible through a common abstract representation. To facilitate its use online, an Application Programming Interface (API) has been designed so that the tool can be included in any Web application.
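The idea of translating through a common abstract representation can be illustrated with a minimal Python sketch. It is not GF code and does not use the MOLTO API: the `Inhibits` tree, the three-entry lexicons and the sentence templates are all hypothetical, chosen only to show that a sentence is first mapped to a language-independent tree and then rendered in the target language.

```python
from dataclasses import dataclass
from itertools import product

# Toy abstract syntax (hypothetical): the language-independent meaning "X inhibits Y".
@dataclass(frozen=True)
class Inhibits:
    agent: str   # lexicon key, e.g. "aspirin"
    target: str  # lexicon key, e.g. "enzyme"

# Hypothetical per-language lexicons and templates (the "concrete" side).
LEXICON = {
    "eng": {"aspirin": "aspirin", "enzyme": "the enzyme"},
    "fre": {"aspirin": "l'aspirine", "enzyme": "l'enzyme"},
    "ger": {"aspirin": "Aspirin", "enzyme": "das Enzym"},
}
TEMPLATE = {
    "eng": "{agent} inhibits {target}",
    "fre": "{agent} inhibe {target}",
    "ger": "{agent} hemmt {target}",
}

def linearize(tree: Inhibits, lang: str) -> str:
    """Render an abstract tree as a sentence in the given language."""
    lex = LEXICON[lang]
    return TEMPLATE[lang].format(agent=lex[tree.agent], target=lex[tree.target])

def parse(sentence: str, lang: str) -> Inhibits:
    """Find the abstract tree whose linearization equals the sentence (brute force)."""
    keys = LEXICON[lang].keys()
    for agent, target in product(keys, keys):
        tree = Inhibits(agent, target)
        if linearize(tree, lang) == sentence:
            return tree
    raise ValueError(f"sentence not covered by the toy grammar: {sentence!r}")

def translate(sentence: str, src: str, tgt: str) -> str:
    """Translate by parsing to the abstract tree and linearizing in the target language."""
    return linearize(parse(sentence, src), tgt)

print(translate("aspirin inhibits the enzyme", "eng", "ger"))
```

Because every language is connected only to the abstract tree, adding a fourth language in this sketch means adding one lexicon and one template rather than a translator for each language pair.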

For patent translation, the researchers used hybrid techniques that combine the Grammatical Framework with statistical methods. GF produces grammatically correct translations, while statistical techniques (similar to those used by machine translation services such as Google Translate) make it possible to cover extensive domains such as biomedicine.
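One simple way to picture such a hybrid is a fallback scheme: use the grammar-based engine wherever it covers the input, and hand everything else to the statistical engine. The Python sketch below makes that assumption; the article does not describe how the MOLTO prototype actually combines the two, and the function names and the two toy "engines" (plain dictionary lookups) are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple

def hybrid_translate(
    segments: List[str],
    grammar_translate: Callable[[str], Optional[str]],
    statistical_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Translate each segment, preferring the grammar engine when it has coverage.

    Returns (translation, engine) pairs so the origin of each segment stays visible.
    """
    results = []
    for segment in segments:
        translation = grammar_translate(segment)  # None means "not covered"
        if translation is not None:
            results.append((translation, "grammar"))
        else:
            results.append((statistical_translate(segment), "statistical"))
    return results

# Toy stand-ins (assumptions): a small, precise grammar and a broad word table.
GRAMMAR_PHRASES = {"the compound": "le composé"}
WORD_TABLE = {"inhibits": "inhibe", "growth": "la croissance"}

def toy_grammar(segment: str) -> Optional[str]:
    return GRAMMAR_PHRASES.get(segment)

def toy_statistical(segment: str) -> str:
    # Unknown words are passed through unchanged.
    return " ".join(WORD_TABLE.get(word, word) for word in segment.split())

for text, engine in hybrid_translate(
    ["the compound", "inhibits", "growth"], toy_grammar, toy_statistical
):
    print(f"{engine:12s} {text}")
```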

In addition, the patents are part of a document retrieval system that initially could only search for documents in English. Special care has therefore been taken to create a method that maintains the complex arrangement of tags and semantic annotations present in the documents. Among other things, this means that the structure of chemical compounds described in biotechnology records is preserved and that documents can be searched for in the target language.
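A rough Python sketch of markup-preserving translation follows. It assumes XML-like tags and a hypothetical `chem` element that wraps chemical formulae; the actual patent markup and the method used in the prototype are not described in the article. Tags are copied through untouched, text inside protected elements is left as-is, and only the remaining text is passed to the translation function.

```python
import re
from typing import Callable

TAG = re.compile(r"(<[^>]+>)")
PROTECTED = {"chem"}  # hypothetical element names whose content must not be translated

def translate_preserving_markup(doc: str, translate_text: Callable[[str], str]) -> str:
    """Translate only the free text of a tagged document, keeping all tags in place."""
    out = []
    protected_depth = 0
    for piece in TAG.split(doc):  # the capturing group keeps the tags in the result
        if TAG.fullmatch(piece):
            name = piece.strip("</>").split()[0]
            # Self-closing tags have no content, so they do not change the depth.
            if name in PROTECTED and not piece.endswith("/>"):
                protected_depth += -1 if piece.startswith("</") else 1
            out.append(piece)
        elif protected_depth > 0 or not piece.strip():
            out.append(piece)  # protected content or pure whitespace
        else:
            out.append(translate_text(piece))
    return "".join(out)

# str.upper stands in for a real translation engine.
doc = '<claim>A salt <chem id="c1">NaCl</chem> for therapy.</claim>'
print(translate_preserving_markup(doc, str.upper))
```

Because the tag sequence of the output is identical to that of the input, annotations attached to those tags remain valid for indexing and search in the translated document.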

The result is the automatic translation of patents into English, French and German (the three official languages of the European Patent Office), with the added advantage that the translations can be carried out in real time. This is of great use in the task of multilingual database searches.
