Translation algorithm by Google
The analyzing network "reads" the sentence not only from left to right but also from right to left, which lets it capture the full context.
Alongside it, an attention module is formed; with its help, the second stream (the decoder) weighs the importance of individual semantic fragments.
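To make these two ideas concrete, here is a minimal sketch, assuming toy dimensions and simple running averages in place of the real recurrent layers (nothing below is Google's actual code): a bidirectional "reading" pass over the sentence, followed by an attention step that weighs each position against a decoder query.

```python
import math

# Toy sketch (not GNMT's implementation): a bidirectional "reading" pass
# plus a simple dot-product attention step. Running means stand in for
# recurrent layers; all sizes are illustrative.

def encode_bidirectional(embeddings):
    """Each position gets a forward summary (running mean of everything
    read so far, left to right) and a backward one (right to left)."""
    dim = len(embeddings[0])

    def running_means(seq):
        out, total = [], [0.0] * dim
        for i, vec in enumerate(seq, 1):
            total = [t + v for t, v in zip(total, vec)]
            out.append([t / i for t in total])
        return out

    forward = running_means(embeddings)
    backward = running_means(embeddings[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]  # both views, concatenated

def attention(query, states):
    """Softmax over dot-product similarity between the query and each state."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * state[d] for w, state in zip(weights, states))
               for d in range(len(states[0]))]
    return weights, context

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, 2-dim embeddings
states = encode_bidirectional(sentence)            # each state is 4-dim
weights, context = attention([1.0, 0.0, 0.0, 1.0], states)
print([round(w, 3) for w in weights])              # weights sum to 1
```

The attention weights tell the decoder which source positions matter most for the fragment it is currently producing.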
In the neural system, the smallest element is not a word but a fragment of a word. This lets the system spend computing power not on word forms but on the context and meaning of the sentence. GNMT uses a vocabulary of about 32,000 such fragments. According to the developers, this gives high translation speed and accuracy without excessive computational cost.
Fragment-level analysis greatly reduces the risk of mistranslating words and phrases with varied suffixes, prefixes, and endings.
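To make the fragment idea concrete, here is a minimal sketch of one simple segmentation strategy: greedy longest-match against a tiny hand-picked vocabulary. GNMT's real wordpiece vocabulary is learned from data and far larger; the `##` continuation marker and every entry below are assumptions for illustration.

```python
# Illustrative sketch (not GNMT's actual wordpiece model): greedy
# longest-match segmentation against a toy subword vocabulary.
# '##' marks a piece that continues a word rather than starting one.

VOCAB = {"trans", "##lat", "##ion", "un", "##break", "##able", "##s"}

def segment(word, vocab=VOCAB):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):  # try the longest piece first
            piece = word[start:end]
            key = piece if start == 0 else "##" + piece
            if key in vocab:
                pieces.append(key)
                start = end
                break
        else:
            return None  # no vocabulary piece matches this fragment
    return pieces

print(segment("translation"))    # -> ['trans', '##lat', '##ion']
print(segment("unbreakables"))   # -> ['un', '##break', '##able', '##s']
```

Because suffixes and prefixes become their own pieces, a rare inflected form still decomposes into fragments the model has seen many times.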
Because the system learns from data, the neural network can translate with high accuracy even concepts absent from conventional dictionaries: slang, jargon, and neologisms.
And that is not all: the neural network can also work letter by letter, for example when transliterating proper names from one alphabet to another.
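As a minimal sketch of letter-by-letter transliteration, here is a small hand-written Cyrillic-to-Latin mapping. The table and function are illustrative assumptions, not Google's actual mapping, which is learned rather than hard-coded.

```python
# Illustrative sketch (not Google's system): character-by-character
# transliteration of a Cyrillic proper name using a partial, hand-made table.

CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ч": "ch", "ш": "sh",
}

def transliterate(name):
    """Map each character through the table; pass unknown characters through."""
    out = [CYR_TO_LAT.get(ch, ch) for ch in name.lower()]
    return "".join(out).capitalize()

print(transliterate("Пушкин"))  # -> "Pushkin"
```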
Statistics: has it really gotten better?
Two years have passed since the launch of GNMT, so its results can now be assessed.
Why only now? Because the neural system starts without a preinstalled database, and it takes time for it to build and refine its translation methods.
For example, setting up a machine translation model based on statistical methods takes one to three days, while training a neural model of the same size takes more than three weeks.
Notably, as the database grows, the processing time of a statistical model increases arithmetically, while that of a neural network increases geometrically: the larger the database, the wider the time gap.
And given that Google Translate works with 103 languages (about 10,000 language pairs), it is clear that meaningful conclusions can only be drawn now.
In November 2016, after training was fully complete and the system was officially launched, Google presented a detailed analysis of GNMT's results. It shows that the gains in translation accuracy are modest: about 10% on average.
The most popular language pairs, such as Spanish-English or French-English, showed the largest gains, with an accuracy score of 85–87%.
In 2017, Google ran large-scale surveys of Google Translate users, asking them to rate three translation variants: statistical machine, neural, and human. Here the results were more interesting: for some language pairs, neural translation turned out to be very close to human translation.
As you can see, translation quality for the English-Spanish and French-English pairs is practically human. This is not surprising, since these were the pairs on which the algorithms were most deeply trained.
The same results, plotted, make the difference from standard machine translation easy to see.
Other language pairs
With other language pairs the situation is less rosy, although there is no large-scale research on them. Still, while neural translation works quite well between structurally similar languages, for radically different language systems (for example, Japanese and Russian) it remains noticeably inferior to human translation.
It should be noted that the developers were not trying to achieve maximum translation accuracy when launching the neural network: that would require complex heuristic structures and would greatly slow the system down. Instead, they sought a balance between accuracy and speed, and in our subjective opinion they found it. Reference: Wiki