There are thousands of languages spoken on Earth. Meta wants to understand them all, and is slowly working to achieve that goal.
The company’s artificial intelligence unit built a model, NLLB-200, that it claims is the first to translate across 200 languages with state-of-the-art quality, validated through extensive evaluations for each language.
The team has also created an evaluation dataset, FLORES-200, and measured NLLB-200’s performance in each language to confirm the translations are of high quality. NLLB-200 exceeds the previous state of the art by an average of 44%, the company notes.
Meta said it will open-source the NLLB-200 model and publish research tools so others can extend this work to more languages and build more inclusive technologies. Meta AI is also providing up to $200,000 in grants to nonprofit organizations for real-world applications of NLLB-200.
For Meta, the effort is tied to the metaverse and to people’s ability to communicate beyond their native language.
Meta AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world’s languages.
Many of these languages, such as Kamba and Lao, have been poorly supported or not supported at all by even the best existing translation tools.
Fewer than 25 African languages are currently supported by widely used translation tools, according to the group. NLLB-200 supports 55 African languages.
Researchers believe NLLB will support more than 25 billion translations served every day on Facebook News Feed, Instagram, and Meta’s other platforms.
Modeling techniques and learnings from the NLLB research are also being applied to translation systems used by Wikipedia editors.
Meta partnered with the Wikimedia Foundation, the nonprofit organization that hosts Wikipedia and other free knowledge projects, to help improve translation systems on Wikipedia.
Wikipedia versions exist in more than 300 languages, but most have far fewer articles than the 6 million available in English.
For example, there are around 3,260 Wikipedia articles in Lingala, a language spoken by 45 million people in the Democratic Republic of the Congo, Republic of the Congo, Central African Republic, and South Sudan. Contrast that with a language like Swedish, which has 10 million speakers in Sweden and Finland and more than 2.5 million articles.
Wikipedia editors are now using the technology behind NLLB-200 to translate articles in more than 20 low-resource languages, including 10 that previously were not supported by any machine translation tools on the platform.