
Machine Translation: A Short Overview


This story is an overview of the field of Machine Translation. It introduces several highly cited works and well-known applications, but I’d like to encourage you to share your opinion in the comments. The aim of this story is to provide a good starting point for someone new to the field. It covers the three main approaches to machine translation as well as several challenges of the field. Hopefully, the literature mentioned in the story presents both the history of the problem and the state-of-the-art solutions.

Machine translation (MT) is the task of translating a text from a source language to its counterpart in a target language. There are many challenging aspects of MT: 1) the large variety of languages, alphabets and grammars; 2) translating a sequence (e.g. a sentence) into another sequence is harder for a computer than working with numbers only; 3) there is no single correct answer (e.g. when translating from a language without gender-dependent pronouns, he and she can map to the same word).

Machine translation is a relatively old task: there have been projects aiming at automatic translation since the 1970s. Over the years, three major approaches emerged:

  • Rule-based Machine Translation (RBMT): 1970s-1990s
  • Statistical Machine Translation (SMT): 1990s-2010s
  • Neural Machine Translation (NMT): 2014-

Photo by Gerd Altmann on Pixabay

Rule-based Machine Translation

A rule-based system requires expert knowledge of both the source and the target language to develop the syntactic, semantic and morphological rules that achieve the translation.

The Wikipedia article on RBMT includes a basic example of rule-based translation from English to German. The translation requires an English-German dictionary, a rule set for English grammar and a rule set for German grammar.

An RBMT system contains a pipeline of Natural Language Processing (NLP) tasks, including tokenisation, part-of-speech tagging and so on. Most of these steps have to be performed for both the source and the target language.
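To make the pipeline concrete, here is a minimal, purely illustrative sketch in Python. The tiny dictionary, the function names and the single generation rule are all invented for this example; a real RBMT system encodes thousands of expert-written rules and full morphological analysis.

```python
# A toy rule-based English -> German pipeline (illustrative only; the
# dictionary and rules are invented, real systems have thousands of rules).

DICTIONARY = {"i": "ich", "see": "sehe", "the": "den", "dog": "Hund"}

def tokenise(sentence: str) -> list[str]:
    # Analysis: lower-case, strip final punctuation, split on whitespace.
    return sentence.lower().rstrip(".!?").split()

def transfer(tokens: list[str]) -> list[str]:
    # Lexical transfer: look each word up in the bilingual dictionary.
    return [DICTIONARY.get(tok, tok) for tok in tokens]

def generate(tokens: list[str]) -> str:
    # Generation: a real system would apply target-language grammar rules
    # (agreement, word order, noun capitalisation) here; we only restore
    # sentence casing.
    return " ".join(tokens).capitalize() + "."

print(generate(transfer(tokenise("I see the dog."))))  # Ich sehe den hund.
```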

RBMT examples

SYSTRAN is one of the oldest machine translation companies. It translates from and to around 20 languages. SYSTRAN was used for the Apollo-Soyuz project (1973) and by the European Commission (1975) [1], and it powered Google’s language tools until 2007. See more in its Wikipedia article or on the company’s website. With the emergence of SMT, SYSTRAN started using statistical models, and recent publications show that they are experimenting with the neural approach as well [2]. The OpenNMT toolkit is also the work of the company’s researchers [3].

Apertium is open-source RBMT software released under the terms of the GNU General Public License. It supports 35 languages and is still under development. It was originally designed for languages closely related to Spanish [4]. The image below illustrates Apertium’s pipeline.

Apertium pipeline — Photo by Darkgaia1 on Wikipedia

GramTrans is a collaboration between a company based in Denmark and a company based in Norway, and it offers machine translation for the Scandinavian languages [5].

Advantages

  • No bilingual text required
  • Domain-independent
  • Total control (a new rule can be written for every situation)
  • Reusability (existing rules for a language can be transferred when pairing it with new languages)

Disadvantages

  • Requires good dictionaries
  • Manually set rules (requires expertise)
  • The more rules there are, the harder the system is to deal with

Statistical Machine Translation

This approach uses statistical models based on the analysis of bilingual text corpora. It was first proposed in 1955 [6], but it only gained real interest after 1988, when the IBM Watson Research Center started working on it [7, 8].

The idea behind statistical MT is the following:

Given a sentence T in the target language, we seek the sentence S from which the translator produced T. We know that our chance of error is minimized by choosing that sentence S that is most probable given T. Thus, we wish to choose S so as to maximize Pr(S|T).

A Statistical Approach to Machine Translation, 1990 [8]

Using Bayes’ theorem, we can transform this maximisation problem into maximising the product of Pr(S) and Pr(T|S), where Pr(S) is the language model probability of S (how likely S is to be a well-formed sentence in the target language) and Pr(T|S) is the translation probability of T given S. In other words, we are seeking the most likely translation by weighing how well a candidate explains the source against how fluent it is in context.

S* = argmax_S Pr(S|T) = argmax_S Pr(S) · Pr(T|S)

Therefore, an SMT system requires three components: 1) a language model (what is the correct word given its context?); 2) a translation model (what is the best translation of a given word?); 3) a method to find the right order of the words.
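Here is a hedged sketch of that decision rule on a toy example. The probability tables and the candidate list are invented for illustration; a real system estimates these probabilities from a bilingual corpus and searches an enormous hypothesis space rather than a hand-written list.

```python
# Noisy-channel scoring: pick the target sentence S maximising
# Pr(S) * Pr(T|S) for the observed source sentence T.
# All probabilities below are invented for illustration.

language_model = {  # Pr(S): how fluent is the candidate?
    "she is buying a house": 0.020,
    "she buying is a house": 0.001,
}
translation_model = {  # Pr(T|S): how well does S explain the source T?
    ("elle achète une maison", "she is buying a house"): 0.3,
    ("elle achète une maison", "she buying is a house"): 0.3,
}

def decode(source: str, candidates: list[str]) -> str:
    return max(candidates,
               key=lambda s: language_model[s] * translation_model[(source, s)])

print(decode("elle achète une maison",
             ["she is buying a house", "she buying is a house"]))
# -> "she is buying a house": the translation model ties, but the
#    language model prefers the fluent word order.
```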

In the previous paragraphs, we used both the sentence and the word as the unit of translation. The most widely used model lies somewhere in between: it is called phrase-based translation. For example, the English phrase “is buying” is translated to the single French word “achète”.
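A phrase-based system stores such many-to-many correspondences in a phrase table. The following sketch shows a greedy phrase-table lookup; the table entries are invented, and real decoders also score alternative segmentations and reorderings instead of taking the first greedy match.

```python
# Greedy left-to-right phrase-table lookup (invented table, no reordering).

PHRASE_TABLE = {
    ("she",): ["elle"],
    ("is", "buying"): ["achète"],  # two English words -> one French word
    ("a", "house"): ["une", "maison"],
}
MAX_PHRASE_LEN = 2

def translate(tokens: list[str]) -> list[str]:
    out, i = [], 0
    while i < len(tokens):
        # Prefer the longest matching source phrase at position i.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_TABLE:
                out += PHRASE_TABLE[phrase]
                i += n
                break
        else:  # unknown word: pass it through unchanged
            out.append(tokens[i])
            i += 1
    return out

print(" ".join(translate("she is buying a house".split())))
# -> elle achète une maison
```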

Non-factored and factored translation — Figures from Moses: Open Source Toolkit … [9]

SMT examples

Moses [9] is the best-known open-source SMT toolkit, and Google Translate relied on statistical models before the company’s switch to the neural approach [13].

Advantages

  • Less manual work from linguistic experts
  • One SMT suitable for more language pairs
  • Fewer out-of-dictionary problems: with the right language model, the translation is more fluent

Disadvantages

  • Requires bilingual corpus
  • Specific errors are hard to fix
  • Less suitable for language pairs with big differences in word order

Neural Machine Translation

The neural approach uses neural networks to achieve machine translation. Compared to the previous models, NMT systems can be built as a single network instead of a pipeline of separate tasks.

In 2014, sequence-to-sequence (seq2seq) models were introduced, opening new possibilities for neural networks in NLP. Before seq2seq, neural networks required the sequence input to be transformed into fixed-size, computer-ready numbers (one-hot encodings, embeddings). With seq2seq, training a network directly on pairs of input and output sequences became possible [10, 11].

LSTM-based seq2seq model: a) training phase, b) prediction phase
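As a concrete illustration, below is a minimal LSTM encoder-decoder sketched in PyTorch (my choice of framework here; the cited papers predate it). The vocabulary sizes, dimensions and the random batch are placeholders: the sketch demonstrates the shape of the architecture, not a trained translation system.

```python
# Minimal LSTM encoder-decoder (seq2seq) in PyTorch; sizes are placeholders.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab: int, tgt_vocab: int, emb: int = 32, hid: int = 64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.project = nn.Linear(hid, tgt_vocab)

    def forward(self, src: torch.Tensor, tgt_in: torch.Tensor) -> torch.Tensor:
        # a) Encode the whole source sentence into the final (h, c) state.
        _, state = self.encoder(self.src_emb(src))
        # b) Decode from that state with teacher forcing: the gold target,
        #    shifted right, is fed in during training.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.project(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (8, 15))     # batch of 8 source sentences
tgt_in = torch.randint(0, 1200, (8, 12))  # shifted gold target sentences
print(model(src, tgt_in).shape)           # torch.Size([8, 12, 1200])
```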

NMT advanced quickly: after only a few years of research, these models outperformed SMTs [12]. With the improved results, many translation providers switched to neural models, including Google [13] and Microsoft.

A problem with neural networks arises when the training data is unbalanced: the model cannot learn from rare samples as well as from frequent ones. With languages this is a common situation, as many words occur only a handful of times even in a corpus the size of Wikipedia. Training a model that is not biased towards frequent words (e.g. words occurring on every Wikipedia page) is therefore challenging. One paper proposes a solution using a post-processing step that translates these rare words with a dictionary [14].
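A hedged sketch of that post-processing idea: whenever the network emits an unknown-word token, look up the source word it is aligned to in a dictionary. The <unk> convention, the alignment map and the dictionary are assumptions made for this illustration; Luong et al. [14] obtain the word alignments from the training procedure itself.

```python
# Post-editing rare words: replace each <unk> in the output with a
# dictionary translation of the source word it is aligned to.
# The alignment map and dictionary are assumed to be given here.

RARE_WORD_DICT = {"Zugzwang": "zugzwang"}

def replace_unks(source: list[str], output: list[str],
                 alignment: dict[int, int]) -> list[str]:
    fixed = []
    for i, token in enumerate(output):
        if token == "<unk>" and i in alignment:
            src_word = source[alignment[i]]  # aligned source word
            # Fall back to copying the source word if it is not in the dict.
            fixed.append(RARE_WORD_DICT.get(src_word, src_word))
        else:
            fixed.append(token)
    return fixed

src = "er ist im Zugzwang".split()
out = ["he", "is", "in", "<unk>"]
print(" ".join(replace_unks(src, out, {3: 3})))  # he is in zugzwang
```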

Recently, Facebook researchers introduced an unsupervised MT model, with both SMT and NMT variants, that requires only large monolingual corpora rather than a bilingual one [15]. The main bottleneck of the previous approaches was the lack of large parallel corpora to train on, and this model shows promise for resolving that issue.

NMT examples

OpenNMT [3] is an open-source toolkit for neural machine translation, and the production translation systems of Google [13] and Microsoft are built on the neural approach.

Advantages

  • End-to-end models (no pipeline of specific tasks)

Disadvantages

  • Requires bilingual corpus
  • Rare word problem

Translation quality of statistical and neural MT models by Google — Figure by Google

Summary

In this story, we covered the three main approaches to machine translation, collecting many important publications alongside notable applications. The story traced the history of the field and gathered the literature on state-of-the-art models. I hope that it is a good start for someone new to the field.

If you think that something is missing, feel welcome to share it with me!

References

[1] Toma, P. (1977, May). Systran as a multilingual machine translation system. In Proceedings of the Third European Congress on Information Systems and Networks, Overcoming the Language Barrier (pp. 569–581).

[2] Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., … & Enoue, S. (2016). SYSTRAN’s pure neural machine translation systems. arXiv preprint arXiv:1610.05540.

[3] Klein, G., Kim, Y., Deng, Y., Senellart, J., & Rush, A. M. (2017). OpenNMT: Open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810.

[4] Corbí Bellot, A. M., Forcada, M. L., Ortiz Rojas, S., Pérez-Ortiz, J. A., Ramírez Sánchez, G., Sánchez-Martínez, F., … & Sarasola Gabiola, K. (2005). An open-source shallow-transfer machine translation engine for the Romance languages of Spain.

[5] Bick, E. (2007). Dan2eng: Wide-coverage Danish-English machine translation. In B. Maegaard (Ed.), Proceedings of Machine Translation Summit XI (pp. 37–43). Copenhagen, Denmark.

[6] Weaver, W. (1955). Translation. Machine Translation of Languages, 14, 15–23.

[7] Brown, P., Cocke, J., Della Pietra, S., Della Pietra, V., Jelinek, F., Mercer, R., & Roossin, P. (1988, August). A statistical approach to language translation. In Proceedings of the 12th Conference on Computational Linguistics, Volume 1 (pp. 71–76). Association for Computational Linguistics.

[8] Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J. D., … & Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.

[9] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., … & Dyer, C. (2007, June). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Demo and Poster Sessions (pp. 177–180).

[10] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104–3112).

[11] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

[12] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

[13] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Klingner, J. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

[14] Luong, M. T., Sutskever, I., Le, Q. V., Vinyals, O., & Zaremba, W. (2014). Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206.

[15] Lample, G., Ott, M., Conneau, A., Denoyer, L., & Ranzato, M. A. (2018). Phrase-based & neural unsupervised machine translation. arXiv preprint arXiv:1804.07755.


Learn NMT with BERT stories

  1. BLEU-BERT-y: Comparing sentence scores
  2. Visualisation of embedding relations (word2vec, BERT)
  3. Machine Translation: A Short Overview
