
Machine Translation: A Short Overview


This story is an overview of the field of Machine Translation. It introduces several highly cited works and well-known applications, but I’d like to encourage you to share your opinion in the comments. The aim of this story is to provide a good starting point for someone new to the field. It covers the three main approaches to machine translation as well as several challenges of the field. Hopefully, the literature mentioned in the story presents both the history of the problem and the state-of-the-art solutions.

Machine translation (MT) is the task of translating a text from a source language to its counterpart in a target language. There are many challenging aspects of MT: 1) the large variety of languages, alphabets and grammars; 2) translating a sequence (e.g. a sentence) into another sequence is harder for a computer than working with numbers only; 3) there is no single correct answer (e.g. when translating from a language without gender-dependent pronouns, he and she can map to the same word).

Machine translation is a relatively old task: there have been projects aiming at automatic translation since the 1970s. Over the years, three major approaches emerged:

  • Rule-based Machine Translation (RBMT): 1970s-1990s
  • Statistical Machine Translation (SMT): 1990s-2010s
  • Neural Machine Translation (NMT): 2014-

Photo by Gerd Altmann on Pixabay

Rule-based Machine Translation

A rule-based system requires expert knowledge of both the source and the target language to develop the syntactic, semantic and morphological rules that achieve the translation.

The Wikipedia article on RBMT includes a basic example of rule-based translation from English to German. The translation requires an English-German dictionary, a rule set for English grammar and a rule set for German grammar.

An RBMT system contains a pipeline of Natural Language Processing (NLP) tasks, including tokenisation, part-of-speech tagging and so on. Most of these steps have to be performed for both the source and the target language.
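To make the pipeline concrete, here is a minimal, purely illustrative sketch in Python. The tiny dictionary, the function names and the single generation rule are all invented for this example; a real RBMT system encodes thousands of expert-written rules and full morphological analysis.

```python
# A toy rule-based English -> German pipeline (illustrative only; the
# dictionary and rules are invented, real systems have thousands of rules).

DICTIONARY = {"i": "ich", "see": "sehe", "the": "den", "dog": "Hund"}

def tokenise(sentence: str) -> list[str]:
    # Analysis: lower-case, strip final punctuation, split on whitespace.
    return sentence.lower().rstrip(".!?").split()

def transfer(tokens: list[str]) -> list[str]:
    # Lexical transfer: look each word up in the bilingual dictionary.
    return [DICTIONARY.get(tok, tok) for tok in tokens]

def generate(tokens: list[str]) -> str:
    # Generation: a real system would apply target-language grammar rules
    # (agreement, word order, noun capitalisation) here; we only restore
    # sentence casing.
    return " ".join(tokens).capitalize() + "."

print(generate(transfer(tokenise("I see the dog."))))  # Ich sehe den hund.
```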

RBMT examples

SYSTRAN is one of the oldest machine translation companies. It translates from and to around 20 languages. SYSTRAN was used for the Apollo-Soyuz project (1973) and by the European Commission (1975) [1], and it powered Google’s language tools until 2007. See more in its Wikipedia article or on the company’s website. With the emergence of SMT, SYSTRAN started using statistical models, and recent publications show that they are experimenting with the neural approach as well [2]. The OpenNMT toolkit is also the work of the company’s researchers [3].

Apertium is open-source RBMT software released under the terms of the GNU General Public License. It supports 35 languages and is still under development. It was originally designed for languages closely related to Spanish [4]. The image below illustrates Apertium’s pipeline.

Apertium pipeline — Photo by Darkgaia1 on Wikipedia

GramTrans is a collaboration between a company based in Denmark and a company based in Norway, and it offers machine translation for the Scandinavian languages [5].

Advantages

  • No bilingual text required
  • Domain-independent
  • Total control (a new rule can be written for every situation)
  • Reusability (existing rules for a language can be transferred when pairing it with new languages)

Disadvantages

  • Requires good dictionaries
  • Manually set rules (requires expertise)
  • The more rules there are, the harder the system is to deal with

Statistical Machine Translation

This approach uses statistical models based on the analysis of bilingual text corpora. It was first proposed in 1955 [6], but it only gained real interest after 1988, when the IBM Watson Research Center started working on it [7, 8].

The idea behind statistical MT is the following:

Given a sentence T in the target language, we seek the sentence S from which the translator produced T. We know that our chance of error is minimized by choosing that sentence S that is most probable given T. Thus, we wish to choose S so as to maximize Pr(S|T).

A Statistical Approach to Machine Translation, 1990 [8]

Using Bayes’ theorem, we can transform this maximisation problem into maximising the product of Pr(S) and Pr(T|S), where Pr(S) is the language model probability of S (how likely S is to be a well-formed sentence in the target language) and Pr(T|S) is the translation probability of T given S. In other words, we are seeking the most likely translation by weighing how well a candidate explains the source against how fluent it is in context.

S* = argmax_S Pr(S|T) = argmax_S Pr(S) · Pr(T|S)

Therefore, an SMT system requires three components: 1) a language model (what is the correct word given its context?); 2) a translation model (what is the best translation of a given word?); 3) a method to find the right order of the words.
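Here is a hedged sketch of that decision rule on a toy example. The probability tables and the candidate list are invented for illustration; a real system estimates these probabilities from a bilingual corpus and searches an enormous hypothesis space rather than a hand-written list.

```python
# Noisy-channel scoring: pick the target sentence S maximising
# Pr(S) * Pr(T|S) for the observed source sentence T.
# All probabilities below are invented for illustration.

language_model = {  # Pr(S): how fluent is the candidate?
    "she is buying a house": 0.020,
    "she buying is a house": 0.001,
}
translation_model = {  # Pr(T|S): how well does S explain the source T?
    ("elle achète une maison", "she is buying a house"): 0.3,
    ("elle achète une maison", "she buying is a house"): 0.3,
}

def decode(source: str, candidates: list[str]) -> str:
    return max(candidates,
               key=lambda s: language_model[s] * translation_model[(source, s)])

print(decode("elle achète une maison",
             ["she is buying a house", "she buying is a house"]))
# -> "she is buying a house": the translation model ties, but the
#    language model prefers the fluent word order.
```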

In the previous paragraphs, we used both the sentence and the word as the unit of translation. The most widely used model lies somewhere in between: it is called phrase-based translation. For example, the English phrase “is buying” is translated to the single French word “achète”.
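A phrase-based system stores such many-to-many correspondences in a phrase table. The following sketch shows a greedy phrase-table lookup; the table entries are invented, and real decoders also score alternative segmentations and reorderings instead of taking the first greedy match.

```python
# Greedy left-to-right phrase-table lookup (invented table, no reordering).

PHRASE_TABLE = {
    ("she",): ["elle"],
    ("is", "buying"): ["achète"],  # two English words -> one French word
    ("a", "house"): ["une", "maison"],
}
MAX_PHRASE_LEN = 2

def translate(tokens: list[str]) -> list[str]:
    out, i = [], 0
    while i < len(tokens):
        # Prefer the longest matching source phrase at position i.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_TABLE:
                out += PHRASE_TABLE[phrase]
                i += n
                break
        else:  # unknown word: pass it through unchanged
            out.append(tokens[i])
            i += 1
    return out

print(" ".join(translate("she is buying a house".split())))
# -> elle achète une maison
```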

Non-factored and factored translation — Figures from Moses: Open Source Toolkit … [9]

SMT examples

Moses [9] is the best-known open-source SMT toolkit, and Google Translate relied on statistical models before the company’s switch to the neural approach [13].

Advantages

  • Less manual work from linguistic experts
  • One SMT suitable for more language pairs
  • Fewer out-of-dictionary problems: with the right language model, the translation is more fluent

Disadvantages

  • Requires bilingual corpus
  • Specific errors are hard to fix
  • Less suitable for language pairs with big differences in word order

Neural Machine Translation

The neural approach uses neural networks to achieve machine translation. Compared to the previous models, NMT systems can be built as a single network instead of a pipeline of separate tasks.

In 2014, sequence-to-sequence (seq2seq) models were introduced, opening new possibilities for neural networks in NLP. Before seq2seq, neural networks required the sequence input to be transformed into fixed-size, computer-ready numbers (one-hot encodings, embeddings). With seq2seq, training a network directly on pairs of input and output sequences became possible [10, 11].

LSTM-based seq2seq model: a) training phase, b) prediction phase
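As a concrete illustration, below is a minimal LSTM encoder-decoder sketched in PyTorch (my choice of framework here; the cited papers predate it). The vocabulary sizes, dimensions and the random batch are placeholders: the sketch demonstrates the shape of the architecture, not a trained translation system.

```python
# Minimal LSTM encoder-decoder (seq2seq) in PyTorch; sizes are placeholders.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab: int, tgt_vocab: int, emb: int = 32, hid: int = 64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.project = nn.Linear(hid, tgt_vocab)

    def forward(self, src: torch.Tensor, tgt_in: torch.Tensor) -> torch.Tensor:
        # a) Encode the whole source sentence into the final (h, c) state.
        _, state = self.encoder(self.src_emb(src))
        # b) Decode from that state with teacher forcing: the gold target,
        #    shifted right, is fed in during training.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.project(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (8, 15))     # batch of 8 source sentences
tgt_in = torch.randint(0, 1200, (8, 12))  # shifted gold target sentences
print(model(src, tgt_in).shape)           # torch.Size([8, 12, 1200])
```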

NMT advanced quickly: after only a few years of research, these models outperformed SMTs [12]. With the improved results, many translation providers switched to neural models, including Google [13] and Microsoft.

A problem with neural networks arises when the training data is unbalanced: the model cannot learn from rare samples as well as from frequent ones. With languages this is a common situation, as many words occur only a handful of times even in a corpus the size of Wikipedia. Training a model that is not biased towards frequent words (e.g. words occurring on every Wikipedia page) is therefore challenging. One paper proposes a solution using a post-processing step that translates these rare words with a dictionary [14].
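A hedged sketch of that post-processing idea: whenever the network emits an unknown-word token, look up the source word it is aligned to in a dictionary. The <unk> convention, the alignment map and the dictionary are assumptions made for this illustration; Luong et al. [14] obtain the word alignments from the training procedure itself.

```python
# Post-editing rare words: replace each <unk> in the output with a
# dictionary translation of the source word it is aligned to.
# The alignment map and dictionary are assumed to be given here.

RARE_WORD_DICT = {"Zugzwang": "zugzwang"}

def replace_unks(source: list[str], output: list[str],
                 alignment: dict[int, int]) -> list[str]:
    fixed = []
    for i, token in enumerate(output):
        if token == "<unk>" and i in alignment:
            src_word = source[alignment[i]]  # aligned source word
            # Fall back to copying the source word if it is not in the dict.
            fixed.append(RARE_WORD_DICT.get(src_word, src_word))
        else:
            fixed.append(token)
    return fixed

src = "er ist im Zugzwang".split()
out = ["he", "is", "in", "<unk>"]
print(" ".join(replace_unks(src, out, {3: 3})))  # he is in zugzwang
```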

Recently, Facebook researchers introduced an unsupervised MT model, with both SMT and NMT variants, that requires only large monolingual corpora rather than a bilingual one [15]. The main bottleneck of the previous approaches was the lack of large parallel corpora to train on, and this model shows promise for resolving that issue.

NMT examples

OpenNMT [3] is an open-source toolkit for neural machine translation, and the production translation systems of Google [13] and Microsoft are built on the neural approach.

Advantages

  • End-to-end models (no pipeline of specific tasks)

Disadvantages

  • Requires bilingual corpus
  • Rare word problem

Translation quality of statistical and neural MT models by Google — Figure by Google

Summary

In this story, we covered the three main approaches to machine translation, collecting many important publications alongside notable applications. The story traced the history of the field and gathered the literature on state-of-the-art models. I hope that it is a good start for someone new to the field.

If you think that something is missing, feel welcome to share it with me!

References

[1] Toma, P. (1977, May). Systran as a multilingual machine translation system. In Proceedings of the Third European Congress on Information Systems and Networks, Overcoming the Language Barrier (pp. 569–581).

[2] Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., … & Enoue, S. (2016). SYSTRAN’s pure neural machine translation systems. arXiv preprint arXiv:1610.05540.

[3] Klein, G., Kim, Y., Deng, Y., Senellart, J., & Rush, A. M. (2017). OpenNMT: Open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810.

[4] Corbí Bellot, A. M., Forcada, M. L., Ortiz Rojas, S., Pérez-Ortiz, J. A., Ramírez Sánchez, G., Sánchez-Martínez, F., … & Sarasola Gabiola, K. (2005). An open-source shallow-transfer machine translation engine for the Romance languages of Spain.

[5] Bick, E. (2007). Dan2eng: Wide-coverage Danish-English machine translation. In B. Maegaard (Ed.), Proceedings of Machine Translation Summit XI (pp. 37–43). Copenhagen, Denmark.

[6] Weaver, W. (1955). Translation. Machine Translation of Languages, 14, 15–23.

[7] Brown, P., Cocke, J., Della Pietra, S., Della Pietra, V., Jelinek, F., Mercer, R., & Roossin, P. (1988, August). A statistical approach to language translation. In Proceedings of the 12th Conference on Computational Linguistics, Volume 1 (pp. 71–76). Association for Computational Linguistics.

[8] Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J. D., … & Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.

[9] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., … & Dyer, C. (2007, June). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Demo and Poster Sessions (pp. 177–180).

[10] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104–3112).

[11] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

[12] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

[13] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Klingner, J. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

[14] Luong, M. T., Sutskever, I., Le, Q. V., Vinyals, O., & Zaremba, W. (2014). Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206.

[15] Lample, G., Ott, M., Conneau, A., Denoyer, L., & Ranzato, M. A. (2018). Phrase-based & neural unsupervised machine translation. arXiv preprint arXiv:1804.07755.


Learn NMT with BERT stories

  1. BLEU-BERT-y: Comparing sentence scores
  2. Visualisation of embedding relations (word2vec, BERT)
  3. Machine Translation: A Short Overview
