In automated text translation, Facebook (now Meta) leads the way. On July 8, 2022, the social media giant, which also owns Instagram and WhatsApp, announced that it had developed an artificial intelligence program capable of translating 200 languages: double the previous record, held by a Microsoft algorithm. Meta's model, called NLLB-200 (for "No Language Left Behind"), covers rare languages such as Lao (spoken in Laos) and Kamba, Fulani and Lingala (spoken in Africa). These languages were barely, if at all, handled by earlier translation software, yet they are used by millions of speakers. At least 45 million people speak Lingala in the Democratic Republic of the Congo (DRC), the Republic of the Congo, the Central African Republic and South Sudan. Yet, Meta notes, there are only 3,260 Wikipedia articles in Lingala, compared with, for example, 2.5 million articles in Swedish, which is spoken by "only" 10 million people in Sweden and Finland! That is the whole point of NLLB-200: to give better access to web content to the billions of people who until now were shut out by the language barrier.

If Meta is releasing its program as open source, allowing any actor to appropriate it, it is because the company founded by Mark Zuckerberg has a direct interest in developing this type of service: Facebook performs nearly 20 billion translations every day on its news feed. Antoine Bordes, director of Meta's Paris artificial intelligence center (FAIR, Facebook Artificial Intelligence Research), the lab whose researchers were at the forefront of developing NLLB-200, answers Sciences et Avenir's questions about the creation of the model, the culmination of six years of research into AI-assisted translation. He also explains how it could one day fit into the metaverse.
“A trillion parameters!”
Sciences et Avenir: The NLLB-200 program relies on a supercomputer. Why?
Antoine Bordes: To provide automatic translation in 200 languages, you have to process an enormous amount of data. About a trillion parameters! That is huge, even though the data falls into three categories.
What are these three data types?
First, and this is the smallest share, there is translation data. All the publicly available translations between the languages we cover were fed into our model. These are texts translated by humans.
Then there is “monolingual” text, which has not been translated and exists only in its original version. These may be texts in English, French, Italian… but also in Zulu, Assamese or Bengali (languages spoken in South Africa, India and Bangladesh respectively, editor’s note). We go looking for them on the web to feed the model, which also raises the question of determining what language a given text is written in!
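Identifying the language of crawled text is a classification problem in its own right; production systems use trained classifiers over large corpora. As a deliberately naive illustration only (the stop-word profiles and function below are invented for this sketch, not Meta's pipeline), one can score a text against small sets of frequent words per language:

```python
# Toy language identifier: score text against small stop-word profiles.
# Real systems learn these statistics from data rather than hard-coding them.
PROFILES = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "french": {"le", "la", "et", "est", "de", "dans"},
}

def identify(text):
    """Return the profile language sharing the most stop words with the text."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in PROFILES.items()}
    return max(scores, key=scores.get)

print(identify("the model is trained on the web"))   # english
print(identify("le chat est dans la maison"))        # french
```

A scheme this crude fails on short or mixed-language text, which is why real web-scale pipelines rely on learned character-level classifiers instead.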
Finally, the third type of data comes from a process created by Meta whose code is open source: Laser 3 scours the web for sentences in different languages that mean the same thing. These are translated texts whose matches were found automatically. Let me insist: these are not translations written by human translators, but “approximately parallel texts” that our system evaluates, concluding: “Yes, these say the same thing.” Think of press coverage: when an international event occurs, it is reported worldwide. The articles written by journalists will therefore contain sentences or expressions that appear across languages, and Laser 3 can match them up.
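The general idea behind this kind of bitext mining is to embed sentences from different languages into a shared vector space and pair up those whose vectors are close. A minimal sketch, assuming sentence embeddings have already been computed by some multilingual encoder (the function, toy vectors and threshold below are invented for illustration and are not Laser 3's actual code):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mine_parallel(src, tgt, threshold=0.9):
    """Pair each source sentence with its best-scoring target sentence,
    keeping the pair only if similarity clears the threshold."""
    pairs = []
    for s_text, s_vec in src.items():
        best_text, best_score = None, -1.0
        for t_text, t_vec in tgt.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_text, best_score = t_text, score
        if best_score >= threshold:
            pairs.append((s_text, best_text, round(best_score, 3)))
    return pairs

# Toy embeddings standing in for a real multilingual sentence encoder.
english = {"The president arrived today.": [0.9, 0.1, 0.2],
           "I like coffee.": [0.1, 0.9, 0.1]}
french = {"Le président est arrivé aujourd'hui.": [0.88, 0.12, 0.22],
          "La météo est belle.": [0.2, 0.1, 0.9]}

for src_sent, tgt_sent, score in mine_parallel(english, french):
    print(src_sent, "<->", tgt_sent, score)
```

Only the two sentences about the president end up paired: their vectors are nearly parallel, while the unrelated sentences fall below the threshold, which is exactly how mined "approximately parallel" pairs differ from human-verified translations.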
“Data on the scale of the multilingual web”
So the NLLB-200 model is based on real human translations, monolingual texts, and approximately parallel texts?
Yes, data on the scale of the multilingual web. Hence the need for a huge computer, so that the model can swallow it all, digest it, learn from it and, in the end, translate not only into more languages but also with better quality. Indeed, for languages already covered by automatic translation, we measured an average improvement of 44%.
This type of assessment, which validates the quality of a translation, is carried out with an evaluation tool called FLORES-200. But that tool was also developed by Facebook's AI lab. Aren't you both judge and jury?
First of all, developing our own evaluation tool was not a desire on our part but a necessity: no such tool covering rare languages existed. And it is hard to build, because you need to find speakers who can translate. We had to hire translators working in two separate teams: one translates the text, the other evaluates the resulting translation. It is a way of eliminating bias. But indeed, Meta is developing the validation tool as well as the model: aren't we grading our own work? The question is legitimate. Our answer is to make everything open source. Everything is published; that is our final arbiter. Our scientific paper is enormous, full of details, but it is accessible. Meta is genuinely open to improvements and criticism from a purely scientific standpoint, in terms of reproducibility and peer review.
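Benchmarks like FLORES-200 pair human reference translations with automatic scoring: a system's output is compared to the reference, typically via n-gram overlap metrics such as chrF. As a rough illustration of the principle (this simplified character n-gram F-score is an assumption for the sketch, not FLORES-200's exact metric):

```python
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_like(hypothesis, reference, n=3):
    """Simplified character n-gram F-score, in the spirit of chrF:
    harmonic mean of n-gram precision and recall against the reference."""
    hyp = char_ngrams(hypothesis, n)
    ref = char_ngrams(reference, n)
    if not hyp or not ref:
        return 0.0
    overlap = sum((hyp & ref).values())
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A near-miss translation scores high but below a perfect match.
print(chrf_like("the cat sat on the mat", "the cat sat on a mat"))
```

An identical hypothesis and reference score 1.0, while unrelated strings score near 0; a percentage improvement like the 44% cited above is computed by comparing such scores between an old system and a new one on the same benchmark sentences.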
“The Internet will evolve from 2D to 3D”
How does automatic translation fit into the scenario of the metaverse, this evolution of the Internet driven by Facebook and Meta?
It is absolutely central. We need an inclusive metaverse that is a source of opportunity for as many people as possible, and one of the levers is overcoming the language barrier. The metaverse must be polyglot at its core. Imagine taking part in a virtual meeting in sub-Saharan Africa: with automatic translation, you could talk to everyone without even speaking the language most of the participants use. The metaverse must create an environment in which we can speak, exchange views and hold dialogue, giving a chance to people who do not speak English well; after all, 80% of the information available on the web is in that language!
NLLB-200 focuses on written text, but Meta has another, related project: Universal Speech Translator, real-time voice-to-voice translation that keeps the accents, the pauses, the prosody… everything that forms the basis of a conversation. The two projects are fully complementary, and their teams work closely together: on the one hand we are expanding the number of languages covered by automatic translation; on the other, we are tackling speech-to-speech. Eventually, no doubt, we will be able to handle all 200 languages voice-to-voice in real time. That will be the metaverse.
Meta is very determined on this subject… but aren't you the only ones who believe in it?
I don't think so. Look at the recent Vivatech show: the Meta booth was far from the only one talking about the metaverse! As for the idea that we risk being alone there, I don't believe it at all. Many players think the Internet will evolve from 2D to 3D, with an immersive dimension, whether for office work, entertainment or play. We are convinced that this is the future of the Internet and, in particular, the future of communication, social presence and online relationships. We believe in it, and we are heading there.