Microsoft and Nvidia break records with a neural network that mimics human language

Artificial intelligence, in its largest and most ambitious applications, requires significant computing power, hence the need to network powerful computers together and pool their capacity. In an unprecedented feat, Microsoft and the chip and graphics card maker Nvidia have created a vast artificial intelligence capable of mimicking human language more convincingly than ever.

In their joint development, the two technology giants quickly ran into that very problem: the more ambitious an AI project is, the more it is limited by technological infrastructure rather than by theory. Added to this are the cost and time required to set up a system of this magnitude. In other words, these obstacles currently place severe limits on the development of such AI.

The project consisted of developing a gigantic neural network with more than 530 billion parameters! This system, called Megatron-Turing Natural Language Generation (MT-NLG), thus has more than three times as many parameters as OpenAI’s revolutionary GPT-3 neural network, considered until now the largest of its kind.

Unfortunately, an energy-hungry, expensive and time-consuming project

Since we are talking about cost and time, it should be noted from the outset that this development required more than a month of work on a supercomputer equipped with about 4,500 very powerful (and therefore expensive) graphics cards of the kind generally used to run high-end neural networks.

When OpenAI released GPT-3 last year, it amazed researchers with its ability to generate fluent streams of text. It did so using 175 billion parameters, values stored in the computer that play a role loosely comparable to the synapses between neurons in the brain, as well as vast amounts of publicly accessible text from which it learned language patterns. Since then, Microsoft has acquired an exclusive license to use GPT-3.

But the company wanted to go bigger and better. When Microsoft and Nvidia tested MT-NLG on a series of language tasks, such as predicting the word that follows a section of text and extracting logical information from a text, they found that it was slightly better than GPT-3 at completing sentences accurately and imitating common-sense reasoning. On a benchmark where the AI must predict the last word of a sentence, GPT-3 achieved 86.4% accuracy, while the new AI achieved 87.2%.

Evolution of the size (in billions of parameters) of different advanced NLP (natural language processing) models over time. © Nvidia

This small difference would simply be due to the greater number of parameters. And it’s far from cheap… “It costs millions of dollars to train one of these models because the computing resources required for this purpose increase rapidly with the size of the model,” explains Bryan Catanzaro of Nvidia.

MT-NLG was trained using Nvidia’s Selene supercomputer, which consists of 560 high-performance servers, each equipped with eight 80 GB A100 Tensor Core graphics processing units (GPUs). Each of these 4,480 graphics cards, a class of hardware originally designed for video games but also extremely good at processing large amounts of data during AI training, currently sells for thousands of dollars. Although the research team did not use the full power of the computer, it took more than a month to train the AI.
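The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the article; the variable names are illustrative, and the aggregate memory figure assumes decimal gigabytes (1 GB = 10^9 bytes).

```python
# Figures quoted in the article for Nvidia's Selene supercomputer.
SERVERS = 560
GPUS_PER_SERVER = 8
MEMORY_PER_GPU_GB = 80  # assumption: decimal gigabytes

# Total number of GPUs in the machine.
total_gpus = SERVERS * GPUS_PER_SERVER

# Aggregate GPU memory across the whole machine.
total_memory_gb = total_gpus * MEMORY_PER_GPU_GB

print(f"Total GPUs: {total_gpus:,}")
print(f"Aggregate GPU memory: {total_memory_gb:,} GB "
      f"(about {total_memory_gb / 1000:.1f} TB)")
```

This reproduces the 4,480 graphics cards mentioned in the text, which is the "about 4,500" figure cited earlier.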

Even once trained, the neural network still requires 40 of these GPUs to run, with each request taking one to two seconds to process. This constant expansion of scale means that AI research is now, to some extent, an engineering problem: how to divide the work efficiently and distribute it across large amounts of hardware.
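A rough back-of-the-envelope estimate helps show why a model of this size cannot run on a single card. The sketch below assumes each parameter is stored as a 16-bit number (2 bytes), which the article does not state, and counts only the memory needed to hold the weights, ignoring working memory and any other overhead. The names are illustrative.

```python
import math

# Figures from the article.
PARAMETERS = 530 * 10**9          # "more than 530 billion parameters"
GPU_MEMORY_BYTES = 80 * 10**9     # 80 GB per GPU (decimal gigabytes assumed)
INFERENCE_GPUS = 40               # GPUs reported as needed to run the model

# ASSUMPTION: 2 bytes per parameter (16-bit storage); not stated in the article.
BYTES_PER_PARAMETER = 2

# Memory needed just to hold the weights.
weights_bytes = PARAMETERS * BYTES_PER_PARAMETER

# Smallest number of GPUs whose combined memory could hold the weights alone.
min_gpus_for_weights = math.ceil(weights_bytes / GPU_MEMORY_BYTES)

print(f"Weights alone: {weights_bytes / 10**9:,.0f} GB")
print(f"Minimum GPUs to hold the weights: {min_gpus_for_weights}")
print(f"GPUs reported for running the model: {INFERENCE_GPUS}")
```

Under these assumptions the weights alone exceed the memory of a dozen cards, so the model has to be split across many GPUs. The article does not explain why 40 are used rather than this lower bound; the gap is not something this estimate can account for.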

When scale reaches the cost ceiling …

Catanzaro says that scale has been the dominant force in machine learning for decades. “It’s absolutely true that better algorithms help, and it’s 100% true that more data and better data absolutely help, but I think the scale of computing has really been the driving force behind a lot of advancements in this area,” he says.

Of course, many researchers are reluctant to rely solely on scaling and would prefer a more elegant solution, especially since the benchmark measurements show only small improvements. Other researchers, however, believe that simply scaling up these systems brings significant advances in the way AIs reason and extract nuanced information.

Samuel Bowman of New York University believes that the current benchmarks for assessing the quality of AI language processing are nearing the end of their useful life, and that researchers are looking for new metrics that can be used to assess an AI’s language and even its reasoning. These same researchers are also waiting “nervously” to see whether scaling can continue to bring improvements or whether it will hit a ceiling, he says, as the cost of research in this area is rising rapidly.

“These are undoubtedly some of the most expensive projects in this area, but whether they are too expensive depends on how their potential is perceived,” he explains. “If you see them as steps toward some pretty useful form of artificial intelligence, and you see it as desirable, then it’s easy to imagine that much larger budgets are justified.”

“The quality and results we have achieved today are a huge step forward in realizing all the promises of natural language AI. The innovations from DeepSpeed and Megatron-LM will benefit the development of current and future AI models and will make training large AI models cheaper and faster,” the researchers write in the Nvidia statement. The infrastructure behind these new models could therefore also help make future AI models faster and less power-hungry, and eventually allow their size to be reduced.
