Development of an automatic system for translating biomedical patents in real time

Screenshot of MOLTO translation system.

The Center for Language and Speech Technologies and Applications (TALP) of the Universitat Politècnica de Catalunya · BarcelonaTech (UPC), a member of CIT UPC, has developed a prototype automatic translation system for biomedical patents. The system can create multilingual documents with the same structure as the original patents, including images, formulae and other kinds of annotations. It also works in real time and can be incorporated into web applications.

The TALP UPC researchers' work, carried out over three years, is part of MOLTO, a collaborative project within the European Union's Seventh Framework Programme. MOLTO has brought together research groups in Göteborg (Sweden), Helsinki (Finland), Utrecht (the Netherlands), Sofia (Bulgaria) and Zurich (Switzerland).

With the general objective of building a multilingual system that can produce high-quality translations, MOLTO researchers have worked on three cases: the formulation of mathematical exercises, descriptions of museum objects, and a model for translating patents, which is the case that TALP members have worked on directly.

As a general technique in the MOLTO project, the researchers used syntactic-semantic grammars built from domain-specific ontologies (conceptual schemes that facilitate information exchange between systems). These components have in turn been integrated into the Grammatical Framework (GF), the software tool that makes automatic translation between different languages possible through a common abstract representation. To facilitate online use, an Application Programming Interface (API) has been designed so that the tool can be included in any web application.
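The idea of translating through a common abstract representation can be illustrated with a toy sketch. This is not GF code and not the MOLTO API: the `Treats` tree, the `LEXICON` and `TEMPLATE` tables, and the function names are all hypothetical, chosen only to show how one language-independent tree can be rendered in several languages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Treats:
    """Toy abstract syntax tree: language-independent concept identifiers."""
    drug: str
    condition: str


# Hypothetical per-language lexicons: concept identifier -> surface words.
LEXICON = {
    "en": {"aspirin": "aspirin", "fever": "fever"},
    "fr": {"aspirin": "l'aspirine", "fever": "la fièvre"},
    "de": {"aspirin": "Aspirin", "fever": "Fieber"},
}

# Hypothetical per-language sentence patterns (the "concrete syntax").
TEMPLATE = {
    "en": "{drug} treats {condition}",
    "fr": "{drug} traite {condition}",
    "de": "{drug} behandelt {condition}",
}


def linearize(tree: Treats, lang: str) -> str:
    """Render the abstract tree as a sentence in the given language."""
    words = LEXICON[lang]
    return TEMPLATE[lang].format(
        drug=words[tree.drug], condition=words[tree.condition]
    )


def parse(sentence: str, lang: str) -> Optional[Treats]:
    """Recover the abstract tree by brute force: try every candidate tree
    and keep the one whose linearization matches the sentence."""
    concepts = LEXICON[lang].keys()
    for drug in concepts:
        for condition in concepts:
            candidate = Treats(drug, condition)
            if linearize(candidate, lang) == sentence:
                return candidate
    return None


def translate(sentence: str, source: str, target: str) -> Optional[str]:
    """Translate by parsing into the shared tree, then linearizing."""
    tree = parse(sentence, source)
    return None if tree is None else linearize(tree, target)
```

Because every language maps to and from the same tree, adding a language in this sketch means adding one lexicon and one template rather than one translator per language pair.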

For patent translation, the researchers used hybrid techniques that combine the Grammatical Framework with statistical methods. GF produces grammatically correct translations, whilst the statistical techniques (similar to those used by machine translation services such as Google Translate) make it possible to cover broad domains such as biomedicine.
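One simple way to combine a precise but narrow grammar with a broad statistical component is a fallback: use the grammar when it covers the sentence, otherwise use the statistical translator. The article does not say how MOLTO combines the two, so the sketch below is an assumption for illustration only; `hybrid_translate`, the toy translators and their lookup tables are hypothetical stand-ins.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a grammar: exact coverage of a few sentences.
GRAMMAR_COVERAGE = {
    "the compound inhibits the enzyme": "le composé inhibe l'enzyme",
}

# Hypothetical stand-in for a statistical model: word-by-word lookup.
WORD_TABLE = {"the": "le", "compound": "composé", "is": "est", "stable": "stable"}


def toy_grammar(sentence: str) -> Optional[str]:
    """Return a translation only if the sentence is covered, else None."""
    return GRAMMAR_COVERAGE.get(sentence.lower())


def toy_statistical(sentence: str) -> str:
    """Always return something; unknown words pass through unchanged."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.lower().split())


def hybrid_translate(
    sentence: str,
    grammar: Callable[[str], Optional[str]],
    statistical: Callable[[str], str],
) -> Tuple[str, str]:
    """Prefer the grammar-based result; fall back to the statistical one.

    Returns the translation and the name of the component that produced it.
    """
    result = grammar(sentence)
    if result is not None:
        return result, "grammar"
    return statistical(sentence), "statistical"
```

Returning the name of the component alongside the translation makes it easy to see how much of a document the grammar actually covers.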

In addition, the patents are part of a document retrieval system that initially could only search for documents in English. Special care has therefore been taken to create a method that preserves the complex arrangement of tags and semantic annotations present in the documents. Among other things, this means that the structure of chemical compounds described in biotechnology records can be maintained, and that documents can be searched for in the target language.
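Preserving markup during translation can be sketched as follows: split the document into tags and text, translate only the text, leave the content of selected annotation elements untouched, and reassemble. This is a minimal illustration and not the method used in the project; the `chem` tag name, the `PROTECTED` set and the function name are assumptions, and `translate_text` stands for whatever translator is plugged in.

```python
import re
from typing import Callable

# Matches a single markup tag such as <p>, </p> or <chem id="c1">.
TAG = re.compile(r"(<[^>]+>)")
TAG_NAME = re.compile(r"</?\s*([\w:-]+)")

# Hypothetical annotation elements whose content must not be translated.
PROTECTED = {"chem"}


def translate_preserving_tags(
    marked_up: str, translate_text: Callable[[str], str]
) -> str:
    """Translate text segments while copying tags and protected content."""
    output = []
    protected_depth = 0  # > 0 while inside a protected element
    for part in TAG.split(marked_up):
        if TAG.fullmatch(part):
            match = TAG_NAME.match(part)
            name = match.group(1) if match else ""
            if name in PROTECTED and not part.endswith("/>"):
                protected_depth += -1 if part.startswith("</") else 1
            output.append(part)  # tags are copied verbatim
        elif protected_depth > 0 or not part.strip():
            output.append(part)  # protected or whitespace-only text
        else:
            output.append(translate_text(part))
    return "".join(output)
```

For example, passing `str.upper` as a stand-in translator changes the running text but leaves both the tags and the compound inside the `chem` element as they were.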

The result is the automatic translation of patents into English, French and German (the three official languages of the European Patent Office), with the added advantage that the translations can be carried out in real time. This is of great use in multilingual database searches.

Provided by Universitat Politècnica de Catalunya

Citation: Development of an automatic system for translating biomedical patents in real time (2014, July 17) retrieved 23 November 2024 from https://phys.org/news/2014-07-automatic-biomedical-patents-real.html
