New technology for machine translation now available

January 22, 2019, Netherlands Organisation for Scientific Research (NWO)

A new methodology to improve machine translation has become available this month through the University of Amsterdam. The DatAptor project, funded by NWO/STW, improves machine translation systems by selecting the data sets on which they are trained.

The methodology is used in the application Matching Data, offered by TAUS, an important think tank in the field of translation. This application tackles a big challenge within digital translation: for a good translation, it is necessary to train the translation machine with reliable sources and datasets that contain the relevant type of words. Translating a legal text, for example, requires a completely different vocabulary and a different type of translation than a newspaper report.

Successful implementation

In 2013, the DatAptor project, supervised by Professor Khalil Sima'an of the UvA Institute for Logic, Language and Computation, received funding from Technology Foundation STW (now: NWO Domain Applied and Engineering Sciences) to address this problem. The research results of the DatAptor project have now been successfully implemented by think tank TAUS, which offers the new technology under the name Matching Data.

On the TAUS weblog, Sima'an says: "Our dream was to make the world wide web itself the source of all data selections. But we decided to start more modest and make the very large TAUS Data repository our hunting field first. In DatAptor we learned that every domain is a mixture of many subdomains. The combinatorics of subdomains in a very large repository harbors a wealth of new, untapped selections. Therefore, if the user provides a Query corpus representing their domain of interest, the Matching Data method is likely to find a suitable selection in the repository."


More information: Data-Powered Domain-Specific Translation Services On Demand (DatAptor)

Provided by: Netherlands Organisation for Scientific Research (NWO)




