An Inside Look at Google Translate

Have you ever wondered what drives the phenomenon that is Google Translate? Google has come a long way from being known for a single service, its search engine, to bridging the gap between languages with its new technology.

Google’s machine translation (MT) team is led by Franz Josef Och, who has been the driving force behind much of the company’s progress on the technology. The following are a few questions from a recent interview with Och.

How did Google figure out so early that it was going to be important to be able to translate the Web?

When I joined Google, I actually talked to Larry [Page] about that on the phone, because I was concerned about why Google would do MT — it’s a search engine company. He emphasized that it’s really core to the mission of Google, and not just a side thing where if times get hard, then MT will [fall by the wayside]. But people are very serious at Google about the mission and trying to achieve it.

How close are you to making that a reality?

It’s a hard question. In some sense, I believe we’ve made progress, and this is an exciting time for MT in the research community at large, but also here at Google. MT gets a lot more traction, more people are using it and it gets integrated into many different products. But on the other hand, there’s obviously still a lot of work ahead of us. What we’re doing is working on the core quality of machine translation.

How often do you add new languages to Google Translate?

Since October 2007, about every quarter or two, we’ve added a significant new number of languages.

Where do you get all of those translations from?

When we started, there were standard test sets provided by the Linguistic Data Consortium, which provides data for research and academic institutes. Then there are places like the United Nations, which have all their documents translated into the six official languages of the United Nations. And there’s a vast pool of documents available there in the database, which has been very useful because the translation quality has been very good.

But then otherwise, it’s kind of ‘the Web,’ where all the documents on the Web that are translated contribute to learning translation for our algorithms. On the Web, the quality of the translation might not always be so good, so it’s a very interesting and challenging research problem in itself to find all the translations and learn from the potentially noisy translations out there. Our algorithms basically mine everything that’s out there.

Read the full interview, and let us know what you think about Google Translate.