Statistical Machine Translation

Posted on | January 7, 2008 | 2 Comments

Last October, Google Translate switched completely to its own home-grown translation software, which takes the Statistical Machine Translation approach.

Until then, Google Translate had been using SYSTRAN as its underlying translation engine, the same software Babel Fish uses. I used to get the same translation results from both engines, so I did not pay much attention to Google Translate.

With the traditional rules-based approach, linguists have to do a lot of work to define vocabularies and grammars. With the Statistical Machine Translation approach, billions of words of text are fed into the engine, both original texts and their human translations. Statistical learning techniques are then applied to build a translation model. Very good results have reportedly been achieved in research evaluations.
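To give a feel for what "learning a translation model from texts and their human translations" can mean, here is a toy sketch in Python. It is not Google's system, whose internals I do not know. It is a simplified word-level model in the style of the classic IBM Model 1, trained with expectation-maximization. The three-sentence corpus and the names `CORPUS`, `train_word_model` and `best_translation` are all made up for illustration.

```python
from collections import defaultdict

# Made-up toy parallel corpus: (French source words, English human translation).
CORPUS = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une maison".split(), "a house".split()),
]


def train_word_model(corpus, iterations=20):
    """Estimate t[(e, f)], an approximation of P(English word e | French word f).

    Simplified IBM Model 1 style EM: no NULL word, no word order, no phrases.
    """
    target_vocab = {e for _, tgt in corpus for e in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference at all

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e being generated by f
        total = defaultdict(float)  # expected count of f generating anything

        # E-step: share each English word among the French words in its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in corpus:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share

        # M-step: re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})

    return t


def best_translation(t, f):
    """Return the English word with the highest learned probability for f."""
    candidates = {e: p for (e, f2), p in t.items() if f2 == f}
    return max(candidates, key=candidates.get)


model = train_word_model(CORPUS)
for french_word in ["la", "une", "maison", "fleur"]:
    print(french_word, "->", best_translation(model, french_word))
```

No dictionary or grammar rule is written by hand here. The model only sees which words appear together in sentence pairs, and the repeated re-estimation gradually concentrates the probability on consistent pairings. A real engine works with vastly more text and much richer models, so treat this only as an illustration of the idea.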

Here is the original French text, quoted from my post last year on machine translation humour:

Les Chinois qui ont dû payer une taxe d’entrée à leur arrivée au Canada ont reçu jeudi les excuses officielles du gouvernement canadien.

Here is the translation from the Fish:

The Chinese who had to pay a tax of entry to their arrival in Canada received Thursday the official excuses of the Canadian government.

Here is the translation from the new Google Translate engine:

The Chinese who had to pay an entrance fee upon their arrival in Canada have received formal apology Thursday from the Canadian government.

Not bad at all.

Comments

2 Responses to “Statistical Machine Translation”

  1. Ilya L
    January 14th, 2008 @ 3:38 pm

    Following your links I learned more about the machine translation. Thank you Edwin.

  2. Edwin
    January 14th, 2008 @ 3:46 pm

    You are welcome, Ilya.
