Google is getting more multilingual, but will it get the nuance?

LIMA, Peru (AP) — About 10 million people speak Quechua, but trying to automatically translate emails and text messages into the most spoken indigenous language family in the Americas has long been nearly impossible.

That changed on Wednesday, when Google added Quechua and a variety of other languages to its digital translation service.

The internet giant says new artificial intelligence technology is allowing it to dramatically expand Google Translate’s repertoire of world languages. It added 24 this week, including Quechua and other indigenous South American languages such as Guarani and Aymara. It also added a number of widely spoken African and South Asian languages that had been missing from popular tech products.

“We looked at languages with very large underserved populations,” Google researcher Isaac Caswell told reporters.

The news, announced at the California company’s annual I/O technology showcase, is likely to be celebrated in many corners of the world. But it will also likely draw criticism from people who have been frustrated by previous tech products that failed to grasp the nuances of their language or culture.

Quechua was the lingua franca of the Inca Empire, which stretched from what is now southern Colombia to central Chile. Its status began to decline following the Spanish conquest of Peru more than 400 years ago.

Adding it to Google’s recognized languages is a big win for Quechua language activists like Luis Illaccanqui, a Peruvian who created the Qichwa 2.0 website, which includes dictionaries and resources for learning the language.

“It will help put Quechua and Spanish on equal status,” said Illaccanqui, who was not involved in Google’s project.

Illaccanqui, whose last name in Quechua means “you are lightning,” said the translator will also help keep the language alive among a new generation of young people and teenagers “who speak Quechua and Spanish at the same time and are fascinated by social media.”

Caswell called the news a “tremendous technological leap forward” because until recently it was not possible to add a language if researchers could not find enough text online (such as digital books, newspapers or social media posts) for their AI systems to learn from.

US tech giants have not had a strong track record of making their language technology work well outside wealthier markets, a problem that has also made it harder for them to detect dangerous misinformation on their platforms. Until this week, Google Translate was offered in European languages such as Frisian, Maltese, Icelandic and Corsican, each with fewer than a million speakers, but not in East African languages such as Oromo and Tigrinya, which have millions of speakers.

The new languages will be rolled out this week. Google’s voice assistant will not yet understand them, so for now they are limited to text-to-text translation. Google said it was working on adding voice recognition and other features, such as the ability to translate a sign by pointing a camera at it.

Those features will be important for widely spoken languages like Quechua, especially in healthcare, because many Peruvian doctors and nurses who speak only Spanish work in rural areas and “are unable to understand patients who speak mainly Quechua,” Illaccanqui said.

“The next frontier, or challenge, is to work on speech,” said Arturo Oncevay, a Peruvian machine translation researcher at the University of Edinburgh who co-founded a research coalition to improve indigenous language technology across the Americas. “The indigenous languages of the Americas are traditionally oral.”

In its announcement, Google warned that the quality of translations in the newly added languages “still lags far behind” other languages it supports, such as English, Spanish and German, and noted that the models “will make mistakes and show their own biases.” But the company only added languages if its AI systems reached a certain threshold of proficiency, Caswell said.

“If there’s a significant number of cases where it’s very wrong, then we won’t include it,” he said. “Even if 90% of the translations are perfect, but 10% are nonsense, it’s a bit too much for us.”

Google said its products now support 133 languages. The latest 24 are the largest batch to be added since Google incorporated 16 new languages in 2010. What made the expansion possible is what Google calls a “zero-shot” or “zero-resource” machine translation model, one that learns to translate into another language without ever seeing an example of it.

Facebook and Instagram’s parent company, Meta, introduced a similar concept called Universal Speech Translator last year.

Google’s model works by training a “single gigantic neural AI model” on about 100 data-rich languages, then applying what it learned to hundreds of other languages it doesn’t know, Caswell said. “Imagine being a great polyglot and just starting to read novels in another language. You can start to piece together what that might mean based on your knowledge of language in general,” he said.

He said the new group ranges from smaller languages like Mizo, spoken in northeast India by around 800,000 people, to more widely spoken languages like Lingala, spoken by around 45 million people across central Africa.

More than 15 years ago, in 2006, Microsoft captured attention in South America with a software feature translating familiar Microsoft menus and commands into Quechua. But that was before the current wave of AI advances in real-time translation.

Américo Mendoza-Mori, a language specialist at Harvard University who speaks Quechua, said Google’s attention is bringing needed visibility to the language in places like Peru, where Quechua speakers still lack access to many public services in their language. The survival of many of these languages “will depend on their use in digital contexts,” he said.

Another language specialist, Roberto Zariquiey, said he was skeptical of Google’s ability to create an effective language revitalization tool for Quechua, Aymara or Guarani without closer involvement of community groups in the region.

“Languages are deeply linked to lives, cultures, ethnic groups and political organizations,” said Zariquiey, a linguist at the Pontifical Catholic University of Peru. “That should be taken into account.”


New languages added are: Assamese, Aymara, Bambara, Bhojpuri, Dhivehi, Dogri, Ewe, Guarani, Ilocano, Konkani, Krio, Lingala, Luganda, Maithili, Meiteilon (Manipuri), Mizo, Oromo, Quechua, Sanskrit, Sepedi, Sorani Kurdish, Tigrinya, Tsonga and Twi.


O’Brien reported from Providence, Rhode Island.
